modAL-python / modAL

A modular active learning framework for Python
https://modAL-python.github.io/
MIT License

use different query strategies #41

Closed. alexv1247 closed this issue 5 years ago.

alexv1247 commented 5 years ago

I am using Keras/TensorFlow models with this framework and the ActiveLearner class. As soon as I try to change the query strategy, various errors occur.

import numpy as np
from modAL.models import ActiveLearner
from modAL.expected_error import expected_error_reduction

learner = ActiveLearner(
    estimator=classifier,
    query_strategy=expected_error_reduction,
    X_training=x_initial_training,
    y_training=y_initial_training,
)
prescore = learner.score(x_test, y_test)
n_queries = 50
postscore = np.zeros(shape=(n_queries, 1))
for idx in range(n_queries):
    print('Query no. %d' % (idx + 1))
    query_idx, query_instance = learner.query(x_pool)
    learner.teach(
        X=x_pool[query_idx],
        y=y_pool[query_idx],
        only_new=True,
        epochs=10,
        validation_data=(x_val, y_val),
    )
    # remove queried instances from pool
    x_pool = np.delete(x_pool, query_idx, axis=0)
    y_pool = np.delete(y_pool, query_idx, axis=0)
    postscore[idx, 0] = learner.score(x_test, y_test)

What do I have to change to use the different strategies? The training input has a 3D shape. So far I have tried all the uncertainty methods, of which only the default selection worked. Now I am trying the expected error reduction strategy, but errors occur there as well.

I am afraid the 3D shape of the training data is breaking all the other algorithms, but an LSTM requires this kind of shape.

cosmic-cortex commented 5 years ago

Can you post the error messages here? Without them, I cannot tell for sure.

Does your model work without modAL? Can you train it with your data? Because I don't think the 3D shape is a problem for modAL, since the data interacts with the model only. (I have tried other 3D shapes for image classification problems, they work fine.)

alexv1247 commented 5 years ago

This is the error message for uncertainty_batch_sampling:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-c64fd087a5a2>", line 1, in <module>
    runfile('C:/Users/alexv/PycharmProjects/Active_Learning/active_learning_types/standard_modAL.py', wdir='C:/Users/alexv/PycharmProjects/Active_Learning/active_learning_types')
  File "C:\Program Files\JetBrains\PyCharm 2019.1.1\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2019.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/alexv/PycharmProjects/Active_Learning/active_learning_types/standard_modAL.py", line 44, in <module>
    query_idx, query_instance = learner.query(x_pool, n_instances=20)
  File "C:\ProgramData\Anaconda3\lib\site-packages\modAL\models\base.py", line 194, in query
    query_result = self.query_strategy(self, *query_args, **query_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\modAL\batch.py", line 197, in uncertainty_batch_sampling
    n_instances=n_instances, metric=metric, n_jobs=n_jobs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\modAL\batch.py", line 150, in ranked_batch
    metric=metric, n_jobs=n_jobs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\modAL\batch.py", line 82, in select_instance
    n_labeled_records, _ = X_training.shape
ValueError: too many values to unpack (expected 2)
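The last frame of this traceback unpacks `X_training.shape` into exactly two names, which can only succeed for a 2D array. The following is a minimal sketch that reproduces the same kind of error with plain NumPy, assuming a 3D array like an LSTM input. The shape used here is made up for illustration and is not taken from the thread.

```python
import numpy as np

# Hypothetical 3D training array: (samples, timesteps, features).
X_training = np.zeros((5, 10, 6))

try:
    # Same statement as in the last traceback frame; it assumes two dimensions.
    n_labeled_records, _ = X_training.shape
except ValueError as err:
    message = str(err)

# Reading only the first entry works for any number of dimensions.
n_labeled_records = X_training.shape[0]

print(message)
print(n_labeled_records)
```

Indexing `shape[0]` is one generic way to count samples without assuming 2D input; whether the library uses this approach is not stated in the thread.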

alexv1247 commented 5 years ago

This is the error for expected_error_reduction:

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-c64fd087a5a2>", line 1, in <module>
    runfile('C:/Users/alexv/PycharmProjects/Active_Learning/active_learning_types/standard_modAL.py', wdir='C:/Users/alexv/PycharmProjects/Active_Learning/active_learning_types')
  File "C:\Program Files\JetBrains\PyCharm 2019.1.1\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2019.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/alexv/PycharmProjects/Active_Learning/active_learning_types/standard_modAL.py", line 44, in <module>
    query_idx, query_instance = learner.query(x_pool, n_instances=20)
  File "C:\ProgramData\Anaconda3\lib\site-packages\modAL\models\base.py", line 194, in query
    query_result = self.query_strategy(self, *query_args, **query_kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\modAL\expected_error.py", line 64, in expected_error_reduction
    X_new = data_vstack((learner.X_training, x.reshape(1, -1)))
  File "C:\ProgramData\Anaconda3\lib\site-packages\modAL\utils\data.py", line 22, in data_vstack
    return np.concatenate(blocks)
ValueError: all the input arrays must have same number of dimensions

Since this is the same code and the same data I used for the default query strategy, I don't know how to tackle this error.
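In the expected_error_reduction traceback, `x.reshape(1, -1)` flattens a single pool instance to 2D before it is stacked onto `learner.X_training`. A small sketch with plain NumPy shows why that fails for 3D training data. The array shapes are hypothetical, and the alternative reshape at the end is only an illustration, not necessarily how the library handles it.

```python
import numpy as np

# Hypothetical shapes: 5 labelled sequences of 10 timesteps with 6 features.
X_training = np.zeros((5, 10, 6))
x = np.ones((10, 6))  # one candidate instance taken from the pool

# x.reshape(1, -1) has shape (1, 60), which is 2D, so it cannot be
# concatenated with the 3D training array.
try:
    np.concatenate((X_training, x.reshape(1, -1)))
    failed = False
except ValueError:
    failed = True

# Adding only a leading sample axis keeps the remaining dimensions intact.
X_new = np.concatenate((X_training, x.reshape(1, *x.shape)))

print(failed)
print(X_new.shape)
```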

cosmic-cortex commented 5 years ago

What are the type and shape of your training data, especially x_initial_training and x_pool? The problem seems to be with those. For batch sampling, it seems that x_initial_training is actually a 1D array. With expected error reduction, the problem can be the same if the shapes of these arrays differ. Can you check these?
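One quick way to run such a check is sketched below. The `describe` helper is a hypothetical name introduced only for illustration, and the arrays are placeholders to be replaced with the real x_initial_training and x_pool.

```python
import numpy as np

def describe(name, array):
    """Return a one-line summary of an array's dtype, dimensions and shape."""
    array = np.asarray(array)
    return f"{name}: dtype={array.dtype}, ndim={array.ndim}, shape={array.shape}"

# Placeholder arrays; substitute the real training set and pool here.
x_initial_training = np.zeros((4, 10, 6))
x_pool = np.zeros((8, 10, 6))

print(describe("x_initial_training", x_initial_training))
print(describe("x_pool", x_pool))

# Everything after the sample axis should match between the two arrays.
same_trailing_shape = x_initial_training.shape[1:] == x_pool.shape[1:]
print(same_trailing_shape)
```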

alexv1247 commented 5 years ago

x_pool is a NumPy array with a shape of (31982, 10, 6) and type float. x_initial_training is a NumPy array with a shape of (636, 10, 6) and type float.

cosmic-cortex commented 5 years ago

I'll try to figure out what went wrong soon. Not sure I can look into this during the weekend, but I'll fix this by the end of next week!

cosmic-cortex commented 5 years ago

Quick update: the bug is definitely in modAL, I am preparing a fix, it will be ready soon!

alexv1247 commented 5 years ago

Thanks a lot for your effort!


cosmic-cortex commented 5 years ago

The fix is in! Now these query strategies work with multidimensional data. You can update your local installation by installing directly from the master branch:

pip install git+https://github.com/modAL-python/modAL.git

Let me know if there is a problem!

A small note: expected error reduction will only work with scikit-learn models, since it requires cloning and retraining the classifier, which might not work with Keras.
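To illustrate what "cloning and retraining" means for a scikit-learn estimator, here is a minimal sketch using `sklearn.base.clone`. The data, the choice of `LogisticRegression` and the hypothesised label are all assumptions made for the example; this is not the library's implementation of expected error reduction.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Made-up labelled data and one made-up candidate from the pool.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 4))
y_train = np.array([0, 1] * 10)
x_candidate = rng.normal(size=(1, 4))

fitted = LogisticRegression().fit(X_train, y_train)

# clone() copies the hyperparameters only, so the copy starts out unfitted.
fresh = clone(fitted)
is_unfitted = not hasattr(fresh, "coef_")

# Retrain the copy with the candidate added under a hypothesised label.
X_aug = np.vstack([X_train, x_candidate])
y_aug = np.append(y_train, 1)
fresh.fit(X_aug, y_aug)

print(is_unfitted)
print(fresh.coef_.shape)
```

The original estimator stays untouched, which is the property a strategy needs when it retrains once per candidate and per possible label.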