To reduce memory usage, the greedy coreset is computed in batches. The number and size of the batches are currently derived from the wrong set.
So far the method has failed silently but gracefully, producing batches of a different size than expected. The exception is when the number of unlabeled indices is smaller than the batch size, in which case it raises an error similar to the following:
<...>
  File "/path/to/site-packages/small_text/query_strategies/coresets.py", line 131, in sample
    return greedy_coreset(embeddings, indices_unlabeled, indices_labeled, n,
  File "/path/to/site-packages/small_text/query_strategies/coresets.py", line 79, in greedy_coreset
    dist = dist_func(batch, x[indices_s], normalized=normalized)
  File "/path/to/site-packages/small_text/query_strategies/coresets.py", line 25, in _euclidean_distance
    return pairwise_distances(a, b, metric='euclidean')
  File "/path/to/site-packages/sklearn/metrics/pairwise.py", line 2195, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "/path/to/site-packages/sklearn/metrics/pairwise.py", line 1765, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "/path/to/site-packages/sklearn/metrics/pairwise.py", line 310, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File "/path/to/site-packages/sklearn/metrics/pairwise.py", line 165, in check_pairwise_arrays
    X = check_array(
  File "/path/to/site-packages/sklearn/utils/validation.py", line 969, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 768)) while a minimum of 1 is required by check_pairwise_arrays.
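The zero-sample batch in the message above can be illustrated without small-text itself. The following is a minimal sketch, not the library's code: it assumes that the batch count is derived from the size of the full dataset while only the unlabeled rows are split, and that the split behaves like `numpy.array_split` (which may return empty parts). The helpers `split_into_batches` and `batch_sizes` are hypothetical names introduced only for this illustration.

```python
import math


def split_into_batches(items, num_batches):
    """Split items into num_batches nearly equal parts.

    Parts may be empty when num_batches exceeds len(items), which mirrors
    how numpy.array_split distributes elements.
    """
    base, extra = divmod(len(items), num_batches)
    batches, start = [], 0
    for i in range(num_batches):
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


def batch_sizes(num_total, num_unlabeled, batch_size, use_unlabeled_count):
    """Return the size of each batch of unlabeled items.

    Hypothetical helper: the batch count is based either on the whole
    dataset (assumed buggy variant) or on the unlabeled subset (assumed fix).
    """
    basis = num_unlabeled if use_unlabeled_count else num_total
    num_batches = max(1, math.ceil(basis / batch_size))
    unlabeled = list(range(num_unlabeled))
    return [len(b) for b in split_into_batches(unlabeled, num_batches)]


# Silent case: batches are smaller than the requested size of 100.
silent_buggy = batch_sizes(1000, 900, 100, use_unlabeled_count=False)
silent_fixed = batch_sizes(1000, 900, 100, use_unlabeled_count=True)
print(silent_buggy)  # [90, 90, 90, 90, 90, 90, 90, 90, 90, 90]
print(silent_fixed)  # [100, 100, 100, 100, 100, 100, 100, 100, 100]

# Failing case: fewer unlabeled items than the batch size yields empty batches.
error_buggy = batch_sizes(1000, 5, 100, use_unlabeled_count=False)
error_fixed = batch_sizes(1000, 5, 100, use_unlabeled_count=True)
print(error_buggy)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(error_fixed)  # [5]
```

Under these assumptions, an empty batch passed to a pairwise distance computation would account for the `shape=(0, 768)` array reported in the traceback, while the silent case only changes the batch sizes.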
Steps to reproduce
--
Environment:
small-text version: 1.3.x, 2.0.0-dev
Additional information
--