Closed XNarno closed 2 weeks ago
Hi @XNarno,
Thank you for reporting this! This is an error I haven't seen before. I suspect this is an issue that numpy has with Windows 10.
I will further investigate this. If you want to try the notebook, you could execute it in Google Colab for now.
This is (partly) a peculiarity of Windows 10 and numpy.
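For context, here is a minimal sketch of what I believe is going on. The elided frames in the traceback do not show the exact call, so the bound `HIGH` and the helper `draw_seed` below are hypothetical and not taken from small-text. On Windows, the legacy `np.random.randint` defaults to the C `long` dtype, which is 32-bit there, so an upper bound beyond the int32 range raises this `ValueError`. Passing `dtype=np.int32` explicitly reproduces the failure on any platform, and `dtype=np.int64` avoids it:

```python
import numpy as np

# Hypothetical upper bound that does not fit into a signed 32-bit integer.
HIGH = 2**32


def draw_seed(dtype):
    """Draw one random integer in [0, HIGH) using the given integer dtype."""
    rng = np.random.RandomState(0)
    return int(rng.randint(HIGH, dtype=dtype))


# Forcing int32 mimics the Windows default (C long is 32-bit there).
try:
    draw_seed(np.int32)
    int32_error = None
except ValueError as exc:
    int32_error = str(exc)

# An explicit 64-bit dtype behaves the same on every platform.
seed = draw_seed(np.int64)

print("int32 attempt:", int32_error)
print("int64 attempt succeeded:", 0 <= seed < HIGH)
```

This only illustrates the general mechanism; the actual change on the `v1.4.x` branch may look different.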
Either way, it will be fixed in small-text 1.4.1.
I have a fix. Unfortunately, I don't have a system where I could test this. Whenever you have a moment, could you please let me know if the fix is working?
You can install from the v1.4.x branch directly with:
pip install git+https://github.com/webis-de/small-text.git@v1.4.x
Bug description
The tutorial fails at the "initialize_active_learner" step in the "Setting up the Active Learner" section.
Steps to reproduce
Running the third notebook https://github.com/webis-de/small-text/blob/main/examples/notebooks/03-active-learning-with-setfit.ipynb
Expected behavior
The cell runs without raising this error.
Environment:
Python version: 3.12.4
small-text version: 1.4.0
small-text integrations (e.g., transformers): 4.42.3
PyTorch version (if applicable): /
OS: Windows 10 Enterprise
Installation (pip, conda, or from source): pip in a conda env
CUDA version (if applicable): /
Additional information
The error message:
`ValueError Traceback (most recent call last)
Cell In[26], line 27
     22 active_learner.initialize_data(x_indices_initial, y_initial)
     24 return x_indices_initial
---> 27 initial_indices = initialize_active_learner(active_learner, train.y)
     28 labeled_indices = initial_indices

Cell In[26], line 22
     19 #x_indices_initial = x_indices_initial.astype(int)
     20 y_initial = y_train_int[x_indices_initial]
---> 22 active_learner.initialize_data(x_indices_initial, y_initial)
     24 return x_indices_initial

File c:\Users\XXX\AppData\Local\anaconda3\envs\APIClassifier\Lib\site-packages\small_text\active_learner.py:154, in PoolBasedActiveLearner.initialize_data(self, indices_initial, y_initial, indices_ignored, indices_validation, retrain)
    151 self.indices_ignored = np.empty(shape=(0), dtype=int)
    153 if retrain:
--> 154 self._retrain(indices_validation=indices_validation)

File c:\Users\XXX\AppData\Local\anaconda3\envs\APIClassifier\Lib\site-packages\small_text\active_learner.py:393, in PoolBasedActiveLearner._retrain(self, indices_validation)
    390 dataset.y = self.y
    392 if indices_validation is None:
--> 393 self._clf.fit(dataset, **self.fit_kwargs)
    394 else:
    395 indices = np.arange(self.indices_labeled.shape[0])
...
File numpy\random\mtrand.pyx:780, in numpy.random.mtrand.RandomState.randint()

File numpy\random\_bounded_integers.pyx:1423, in numpy.random._bounded_integers._rand_int32()

ValueError: high is out of bounds for int32`