Closed adithram closed 6 years ago
Could you post the contents of unknown_labels and known_labels?
Also, what happens if you switch to a different query strategy? Does the same error occur?
Thanks.
known_labels:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
unknown_labels is a massive list of NoneType objects.
With RandomSampling, I have no issues with the make_query() function, although I still have to debug the following error with the ideal_labeler:
Traceback (most recent call last):
  File "active-learning.py", line 145, in <module>
    lbl = ideal_labeler.label(combined_dataset.data[query_id][0])
  File "/Users/ytk326/anaconda2/lib/python2.7/site-packages/libact/labelers/ideal_labeler.py", line 33, in label
    for x in self.X])[0][0]]
IndexError: index 0 is out of bounds for axis 0 with size 0
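The traceback line `for x in self.X])[0][0]]` suggests that the labeler finds a label by matching the queried feature vector against its stored rows, and "index 0 is out of bounds for axis 0 with size 0" is what numpy raises when an empty result is indexed. A minimal sketch of that failure mode, assuming this matching behaviour (the names `X`, `y` and `lookup_label` are hypothetical and this is not libact's actual code):

```python
import numpy as np

# Hypothetical stand-in for the labeler's stored features and labels.
X = np.array([[0.0, 1.0], [2.0, 3.0]])
y = np.array([1, 0])

def lookup_label(feature):
    """Return the label of the stored row equal to `feature`."""
    matches = np.where([np.array_equal(x, feature) for x in X])[0]
    if matches.size == 0:
        # Indexing matches[0] here would raise the IndexError from the traceback.
        raise KeyError("feature vector not found among the stored rows")
    return y[matches[0]]

print(lookup_label(np.array([2.0, 3.0])))   # a matching row exists

try:
    lookup_label(np.array([9.0, 9.0]))      # no matching row
except KeyError as exc:
    print("lookup failed:", exc)
```

If this is what happens here, the queried feature vector is probably not identical (in values or dtype) to any row the labeler was built from.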
Which numpy version are you using?
According to issue #37, np.where seems to behave differently before version 1.11.0. Maybe try updating numpy first?
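A quick way to answer the version question is to print `np.__version__` and compare it with 1.11.0. The helper `version_tuple` below is a hypothetical convenience for the comparison, not part of numpy:

```python
import re

import numpy as np

def version_tuple(version):
    """Turn '1.10.4' into (1, 10, 4), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

print("numpy version:", np.__version__)
if version_tuple(np.__version__) < (1, 11, 0):
    print("older than 1.11.0; consider upgrading numpy")
```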
If you are able to modify the source code, it would be helpful to know whether there is any str in the variables X, y, and weight at
https://github.com/ntucllab/libact/blob/master/libact/query_strategies/hintsvm.py#L151
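One way to look for a stray string is to scan each variable and report every entry that is not a real number. This is a sketch only: `find_non_numeric` is a hypothetical helper, and the sample values for `X`, `y` and `weight` are made up for illustration.

```python
import numbers

def find_non_numeric(name, values):
    """Return (name, index, value) for each entry that is not a real number."""
    problems = []
    for i, item in enumerate(values):
        # A row of X is a sequence; a label or weight is a single scalar.
        if isinstance(item, (str, bytes)) or not hasattr(item, "__iter__"):
            entries = [item]
        else:
            entries = list(item)
        for value in entries:
            if not isinstance(value, numbers.Real):
                problems.append((name, i, value))
    return problems

# Made-up data with one string left in a feature row.
X = [[0.1, 0.2], [0.3, "0.4"]]
y = [1, 0]
weight = [1.0, 1.0]

for name, values in (("X", X), ("y", y), ("weight", weight)):
    for problem in find_non_numeric(name, values):
        print(problem)
```

Printing the index of each offending entry makes it easier to trace the value back to the step that produced it.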
I just caught my bug regarding the buffer dtype mismatch. There was a small corner case in my feature connection that was leaving a string in place.
Additionally, while writing this, I've been playing around with hintsvm.py, and there were two places where I made some modifications:
I removed the tolist() method from the lines looping through labeled_pool and unlabeled_pool:
Original:
x.tolist() for x in labeled_pool
x.tolist() for x in unlabeled_pool
Error:
Traceback (most recent call last):
  File "active-learning.py", line 148, in <module>
    query_id = hinted_svm_qs.make_query()
  File "/Users/ytk326/anaconda2/lib/python2.7/site-packages/libact/query_strategies/hintsvm.py", line 149, in make_query
    X = [x.tolist() for x in labeled_pool] +\
AttributeError: 'list' object has no attribute 'tolist'
Changed to:
x for x in labeled_pool
x for x in unlabeled_pool
Any thoughts regarding this?
I think I previously assumed that labeled_pool and unlabeled_pool were numpy arrays.
Maybe I should add np.asarray to make sure these two variables are indeed numpy arrays.
So you suggest storing the dataset as a numpy array prior to passing it to the query strategy function (rather than removing the tolist() function calls)?
@adithram I think you can work around it like this for now, and I'll discuss with @skgg where to make the change to the Dataset object.
Hi @adithram,
Currently, HintSVM works only when the inputs are float64 numpy arrays, due to the Cython implementation. For now you can convert the lists to numpy arrays to make it work. We will solve this problem soon. Thanks for reporting this.
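The workaround described above, converting list-based features to a float64 numpy array before building the dataset, might look like the following sketch. The sample `features` are made up, and the step that passes `X` on to the Dataset object is left out because its exact requirements are what this thread is still working out.

```python
import numpy as np

# Made-up feature rows stored as plain Python lists.
features = [[0.5, 1.0, 2.0], [1.5, 0.0, 3.0], [2.5, 4.0, 1.0]]

# An explicit dtype makes the conversion to float64 visible.
X = np.asarray(features, dtype=np.float64)

print(X.dtype, X.shape)
# Each row is now an ndarray, so a call such as x.tolist() is available.
print(hasattr(X[0], "tolist"))
```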
I can't find the implementation of get_unlabeled_entries() or get_labeled_entries(), so I am not positive about this, but isn't the requirement dependent on the output of those functions?
In other words, are you suggesting that I modify the output of those functions to return float64 numpy arrays? Or is simply creating a dataset using two float64 numpy arrays enough to force the desired behavior to occur?
The implementation is here https://github.com/ntucllab/libact/blob/master/libact/base/dataset.py#L159.
The problem here seems to be that the Dataset object does not guarantee that the data is a numpy array with dtype=float64.
I've started a PR to fix this: #122. @skgg, please check it after it passes CI. @adithram, please let us know if this patch solves your problem.
UPDATE: I created the datasets using two float64 numpy arrays, but still received the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-14-9b39a471ec5b> in <module>()
21 for i in range(num_queries):
22 print i
---> 23 query_id = hinted_svm_qs.make_query()
24 lbl = ideal_labeler.label(combined_dataset.data[query_id][0])
25 combined_dataset.update(query_id, lbl)
/Users/ytk326/anaconda2/lib/python2.7/site-packages/libact/query_strategies/hintsvm.py in make_query(self)
154 np.array(y, dtype=np.float64),
155 np.array(weight, dtype=np.float64),
--> 156 np.array([x.tolist() for x in unlabeled_pool], dtype=np.float64),
157 self.svm_params)
158
AttributeError: 'list' object has no attribute 'tolist'
Additionally, some unusual behavior: after removing the tolist() method, the AttributeError is still raised:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-9b39a471ec5b> in <module>()
21 for i in range(num_queries):
22 print i
---> 23 query_id = hinted_svm_qs.make_query()
24 lbl = ideal_labeler.label(combined_dataset.data[query_id][0])
25 combined_dataset.update(query_id, lbl)
/Users/ytk326/anaconda2/lib/python2.7/site-packages/libact/query_strategies/hintsvm.py in make_query(self)
154 np.array(y, dtype=np.float64),
155 np.array(weight, dtype=np.float64),
--> 156 np.array([x for x in unlabeled_pool], dtype=np.float64),
157 self.svm_params)
158
AttributeError: 'list' object has no attribute 'tolist'
Is it okay that my features are constructed as:
[numpy ndarray]
of [numpy ndarray]
of [float64]
In other words, I have a list of feature vectors, where each feature vector is constructed of various float64 values.
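To check which structure the features really have, it can help to summarize the row types, row lengths and dtypes before handing them to the library. A minimal sketch, where `describe_features` is a hypothetical helper and the sample rows are made up:

```python
import numpy as np

def describe_features(rows):
    """Summarize row types, lengths and dtypes without building one big array."""
    row_types = {type(row).__name__ for row in rows}
    lengths = {len(row) for row in rows}
    dtypes = {str(np.asarray(row).dtype) for row in rows}
    return row_types, lengths, dtypes

# An ndarray of ndarrays: every row reports the same type, length and dtype.
uniform = np.array([[0.1, 0.2], [0.3, 0.4]])
print(describe_features(uniform))

# A list mixing an ndarray with a plain list: the 'list' row has no tolist().
mixed = [np.array([0.1, 0.2]), [0.3, 0.4]]
print(describe_features(mixed))
```

More than one entry in any of the three sets points to rows that were built inconsistently.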
Hi @adithram ,
It seems the modification did not take effect. The AttributeError should not occur once tolist() has actually been removed. You need to edit the source code and then reinstall the package for the changes to apply. We will fix it as soon as possible. Thanks for reporting this.
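One possible explanation for a traceback that shows the edited line while still raising the old error is that the running interpreter or notebook kernel holds the module as it was imported before the edit, since tracebacks read source lines from the file on disk. A sketch for checking which file is loaded and whether it contains a given snippet (`source_contains` is a hypothetical helper):

```python
import importlib
import inspect

def source_contains(module_name, needle):
    """Return the file a module is loaded from and whether its source has `needle`."""
    module = importlib.import_module(module_name)
    source = inspect.getsource(module)
    return module.__file__, needle in source

# Demonstrated on a standard-library module so the sketch runs anywhere.
# For this thread the call would be
# source_contains("libact.query_strategies.hintsvm", "tolist()").
path, found = source_contains("json", "def dumps")
print(path, found)
```

If the reported path is the edited file and the snippet is gone, restarting the interpreter or kernel before rerunning should pick up the change.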
This problem seems to be fixed.
Actually raising this issue again. Posted as a comment on a closed issue - wasn't sure if the notification system worked the same way with comments on closed issues.
I am currently having the following issue in the context of a binary classification problem. I have a set of data that I would like to use active learning to label as either anomalous or non-anomalous based on a small set of labelled data.
Is there a specific format that we have to follow for the features that we feed into the Dataset() function? Or perhaps my understanding of binary active learning problems is incorrect, or my implementation has a significant programming flaw? Any help is appreciated.
Code: