ntucllab / libact

Pool-based active learning in Python
http://libact.readthedocs.org/
BSD 2-Clause "Simplified" License

ValueError: setting array element with sequence #171

Closed jessicamegane closed 1 year ago

jessicamegane commented 4 years ago

Hello, this is my first time using active learning, so this is probably a beginner's mistake, but I get a ValueError, as mentioned in the title, when calling model.predict_proba(trn_ds). It is raised in the validation.py file from sklearn:

File "/home/user/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 448, in check_array
    array = array.astype(np.float64)
ValueError: setting an array element with a sequence.

Maybe I'm calling it when I'm not supposed to, or passing in the wrong split of the dataset. I get the same error with other datasets. I've also tried Python 2.7 and used this example: https://libact.readthedocs.io/en/latest/examples/plot.html ; I just added the line model.predict_real(trn_ds) inside the for loop, after training the model.

PS: I've also noticed that calling make_query() on the query strategy I defined trains a model on the dataset. Is it supposed to?

Thanks in advance!

yangarbiter commented 4 years ago

Can you provide more detail on the code and the full error message? It looks like, when you declared the Dataset object, you passed in something that cannot be converted into a NumPy array.

For some query strategies, it is normal to train a model when making a query.
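
As a rough illustration of why a query strategy may need a trained model: least-confidence uncertainty sampling (the method='lc' setting used in the code later in this thread) scores each unlabeled example by the model's highest class probability and asks for the example where that value is lowest, so a model has to be fit on the current labels first. The sketch below is not libact's code; the function name and the probability values are made up for illustration.

```python
def least_confident(probabilities, unlabeled_ids):
    """Return the unlabeled index whose highest class probability is smallest."""
    return min(unlabeled_ids, key=lambda i: max(probabilities[i]))


# Hypothetical class probabilities predicted for a pool of four examples.
pool_probabilities = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
unlabeled = [1, 2, 3]  # example 0 is already labeled, so it is never queried

ask_id = least_confident(pool_probabilities, unlabeled)
print(ask_id)  # prints 3: its top probability (0.51) is the lowest
```

After each new label the probabilities change, which is why the model is retrained on every query in this kind of strategy.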

jessicamegane commented 4 years ago

The dataset is like this:

1 1:48 2:30.46 3:59 4:177.39 5:5.62
1 1:48 2:30.46 3:58 4:176.78 5:3.37
1 1:48 2:30.46 3:57 4:158.75 5:3.37
1 1:48 2:30.46 3:60 4:137.71 5:3.37

The code:

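import numpy
from sklearn.model_selection import train_test_split

from libact.base.dataset import Dataset, import_libsvm_sparse
from libact.labelers import IdealLabeler
from libact.models import LogisticRegression
from libact.query_strategies import UncertaintySampling

# DATASET_FILEPATH, TEST_SIZE and N_LABELED are constants defined elsewhere in the script (not shown).

def split_train_test():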
    X, y = import_libsvm_sparse(DATASET_FILEPATH).format_sklearn()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE)
    trn_ds = Dataset(X_train, numpy.concatenate([y_train[:N_LABELED], [None] * (len(y_train) - N_LABELED)]))
    tst_ds = Dataset(X_test, y_test)
    fully_labeled_trn_ds = Dataset(X_train, y_train)
    return trn_ds, tst_ds, y_train, fully_labeled_trn_ds

if __name__ == "__main__":
    trn_ds, tst_ds, y_train, fully_labeled_trn_ds = split_train_test()
    lbr = IdealLabeler(fully_labeled_trn_ds)
    quota = len(y_train) - N_LABELED
    qs = UncertaintySampling(trn_ds, method='lc', model=LogisticRegression())
    model = LogisticRegression()

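    for _ in range(quota):
        ask_id = qs.make_query()    # index of the example the strategy wants labeled
        X, _ = zip(*trn_ds.data)
        lb = lbr.label(X[ask_id])   # look up the true label for that example
        trn_ds.update(ask_id, lb)   # add the new label to the training set
        model.train(trn_ds)
        model.predict_real(trn_ds)  # the line added to the example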

The output:

/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/feature_extraction/text.py:17: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Mapping, defaultdict
Traceback (most recent call last):
  File "main.py", line 98, in <module>
    run(trn_ds, lbr, model, qs, quota)
  File "main.py", line 61, in run
    model.predict_proba(trn_ds)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/libact/models/logistic_regression.py", line 40, in predict_proba
    return self.model.predict_proba(feature, *args, **kwargs)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/linear_model/logistic.py", line 1340, in predict_proba
    return super(LogisticRegression, self)._predict_proba_lr(X)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/linear_model/base.py", line 338, in _predict_proba_lr
    prob = self.decision_function(X)
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/linear_model/base.py", line 300, in decision_function
    X = check_array(X, accept_sparse='csr')
  File "/home/jessicamegane/.local/lib/python3.7/site-packages/sklearn/utils/validation.py", line 448, in check_array
    array = array.astype(np.float64)
ValueError: setting an array element with a sequence.

Thank you!

yangarbiter commented 4 years ago

I think the problem is model.predict_real(trn_ds): you should not pass in the Dataset object. Pass in an array of features instead, for example model.predict_real(trn_ds.get_entries()[0])
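
The last frame of the traceback is NumPy casting the input to float64, and that cast fails for anything that is not a regular numeric array. The snippet below reproduces the same message without libact or scikit-learn. The helper name is made up, and the ragged nested list is only a stand-in for such an input, not a description of what a Dataset contains internally.

```python
import numpy as np


def to_float64(data):
    """Cast the input to a float64 array, the step that fails in the traceback."""
    return np.asarray(data, dtype=object).astype(np.float64)


# A regular 2-D feature matrix (rows are examples, columns are features) converts fine.
features = [[48.0, 30.46, 59.0], [48.0, 30.46, 58.0]]
matrix = to_float64(features)
print(matrix.shape)

# A ragged input cannot be cast, because each element is itself a sequence.
ragged = [[48.0, 30.46, 59.0], [48.0]]
try:
    to_float64(ragged)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Passing only the feature matrix, as suggested above, gives the model the regular 2-D array it expects.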

jessicamegane commented 4 years ago

Yes, it worked. I didn't know that it would predict for every example in the dataset; I thought it would return only the prediction for the example we had just queried. I have another question: how do I know which example each row in the array of probabilities corresponds to? (I think it's ordered by highest probability.)

yangarbiter commented 4 years ago

Do you mean that you want the predicted probability of the queried example? You can try the following, which returns an array containing only the prediction for that example:

model.predict_real([X[ask_id]])
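
To illustrate the row correspondence asked about above: assuming predict_real behaves like scikit-learn estimators, it returns one row of scores per input row, in input order, so row i belongs to X[i] and nothing is sorted by confidence. The toy scorer below is a hypothetical stand-in for the trained model, with made-up features and a made-up decision rule, used only to show the indexing.

```python
def toy_predict_real(features):
    """Hypothetical stand-in for model.predict_real: one [negative, positive] score pair per row."""
    scores = []
    for row in features:
        margin = sum(row) - 100.0  # made-up decision value
        scores.append([-margin, margin])
    return scores


X = [[48.0, 59.0], [30.0, 58.0], [60.0, 57.0]]  # made-up feature rows
ask_id = 1  # pretend this index came from qs.make_query()

all_scores = toy_predict_real(X)            # one row per example, same order as X
single = toy_predict_real([X[ask_id]])      # wrap the example in a list to get one row

print(len(all_scores) == len(X))
print(single[0] == all_scores[ask_id])
```

Both lookups give the same scores, so indexing the full result with ask_id is an alternative to calling the model again on a single example.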