webis-de / small-text

Active Learning for Text Classification in Python
https://small-text.readthedocs.io/
MIT License

EGL throws error #58

Closed emilysilcock closed 4 months ago

emilysilcock commented 6 months ago

Bug description

I'm trying to run some experiments with Expected Gradient Length. I've used the code from your sample notebook exactly, except that I've swapped

query_strategy = PredictionEntropy() for query_strategy = ExpectedGradientLength(num_classes)
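
For context, the relevant part of my setup looks roughly like this (abbreviated; the import path for ExpectedGradientLength is written from memory, and clf_factory and train stand in for the notebook's classifier factory and training set):

    from small_text import PoolBasedActiveLearner
    from small_text.integrations.pytorch.query_strategies import ExpectedGradientLength

    # previously: query_strategy = PredictionEntropy()
    query_strategy = ExpectedGradientLength(num_classes)

    active_learner = PoolBasedActiveLearner(clf_factory, query_strategy, train)
    indices_queried = active_learner.query(num_samples=20)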

When calling active_learner.query, this gives the error AttributeError: 'SequenceClassifierOutput' object has no attribute 'softmax'

The full traceback is below.

Environment:

Python version: I'm running on Colab, so currently 3.10, but I've replicated with 3.7 as well
small-text version: 1.3.3
small-text integrations (e.g., transformers): Transformers
PyTorch version (if applicable): 2.2.1+cu121

Installation (pip, conda, or from source): pip install small-text[transformers]==1.3.3

Additional information

The full traceback is:

AttributeError                            Traceback (most recent call last)
[<ipython-input-18-680c16aeec82>](https://localhost:8080/#) in <cell line: 23>()
     23 for i in range(num_queries):
     24     # ...where each iteration consists of labelling 20 samples
---> 25     indices_queried = active_learner.query(num_samples=20)
     26 
     27     # Simulate user interaction here. Replace this for real-world usage.

5 frames
[/usr/local/lib/python3.10/dist-packages/small_text/active_learner.py](https://localhost:8080/#) in query(self, num_samples, representation, query_strategy_kwargs)
    193 
    194         representation = self.dataset if representation is None else representation
--> 195         self.indices_queried = self.query_strategy.query(self._clf,
    196                                                          representation,
    197                                                          indices[self.mask],

[/usr/local/lib/python3.10/dist-packages/small_text/query_strategies/base.py](https://localhost:8080/#) in query(self, clf, datasets, indices_unlabeled, indices_labeled, y, n, *args, **kwargs)
     48                                        f'but single-label data was encountered')
     49 
---> 50             return super().query(clf, datasets, indices_unlabeled, indices_labeled, y,
     51                                  *args, n=n, **kwargs)
     52 

[/usr/local/lib/python3.10/dist-packages/small_text/integrations/pytorch/query_strategies/strategies.py](https://localhost:8080/#) in query(self, clf, dataset, indices_unlabeled, indices_labeled, y, n, pbar)
     64         with pbar_context as pbar:
     65             for i, (dataset, *_) in enumerate(dataset_iter):
---> 66                 self.compute_gradient_lengths(clf, criterion, gradient_lengths, offset, dataset)
     67 
     68                 batch_len = dataset.size(0)

[/usr/local/lib/python3.10/dist-packages/small_text/integrations/pytorch/query_strategies/strategies.py](https://localhost:8080/#) in compute_gradient_lengths(self, clf, criterion, gradient_lengths, offset, x)
     95         clf.model.zero_grad()
     96 
---> 97         self.compute_gradient_lengths_batch(clf, criterion, x, gradients, all_classes)
     98         self.aggregate_gradient_lengths_batch(batch_len, gradient_lengths, gradients, offset)
     99 

[/usr/local/lib/python3.10/dist-packages/small_text/integrations/pytorch/query_strategies/strategies.py](https://localhost:8080/#) in compute_gradient_lengths_batch(self, clf, criterion, x, gradients, all_classes)
    107         output = clf.model(x)
    108         with torch.no_grad():
--> 109             sm = F.softmax(output, dim=1)
    110 
    111         for j in range(self.num_classes):

[/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py](https://localhost:8080/#) in softmax(input, dim, _stacklevel, dtype)
   1856         dim = _get_softmax_dim("softmax", input.dim(), _stacklevel)
   1857     if dtype is None:
-> 1858         ret = input.softmax(dim)
   1859     else:
   1860         ret = input.softmax(dim, dtype=dtype)

AttributeError: 'SequenceClassifierOutput' object has no attribute 'softmax'
chschroeder commented 6 months ago

The EGL strategy is tailored to the KimCNN and has only been shown to be effective for this model. You are using a transformer model (judging from the SequenceClassifierOutput). (I will improve the documentation and add this restriction.)
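
For context on the error itself: Hugging Face sequence classification models return a SequenceClassifierOutput wrapper rather than a raw tensor, so the softmax call in the strategy fails on the wrapper object. Unwrapping the logits, roughly as below, would resolve that particular AttributeError, although other parts of the strategy also assume the KimCNN interface:

    # sketch of an adaptation in compute_gradient_lengths_batch (untested)
    output = clf.model(x)
    logits = output.logits if hasattr(output, 'logits') else output
    with torch.no_grad():
        sm = F.softmax(logits, dim=1)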

This could be adapted to work for transformers. I have investigated this myself, although I was a little less experienced at the time; querying based on gradients / gradient length did not seem to be effective at all for transformer models. Since it didn't really work, I gave up on the idea.

If I had to try this again, I would try the EGL variant that operates on the gradient of the last layer. If you are interested, I also have some old code for this, but if you are looking for a well-working query strategy, I would advise against it.
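
For reference, the last-layer variant would look roughly like this (an untested sketch, not my old code; it assumes a model that returns raw logits and scores one example per call):

    import torch
    import torch.nn.functional as F

    def egl_last_layer(model, last_layer, x, num_classes):
        # x is a single example (batch of size 1); returns its EGL score
        logits = model(x)                              # shape: (1, num_classes)
        probs = F.softmax(logits.detach(), dim=1)[0]

        score = 0.0
        for c in range(num_classes):
            # pretend the example belongs to class c and measure the resulting
            # gradient on the last (classification) layer only
            target = torch.tensor([c], dtype=torch.long, device=logits.device)
            loss = F.cross_entropy(logits, target)
            grads = torch.autograd.grad(loss, last_layer.parameters(),
                                        retain_graph=True)
            grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
            # expected gradient length: norm weighted by the predicted probability
            score += (probs[c] * grad_norm).item()
        return score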

emilysilcock commented 6 months ago

Interesting! That's good to know. This paper has some success with an implementation of EGL on text, but I'm not sure how it differs - gradient-based methods are really not my specialty; I'm just comparing against them as a benchmark.

chschroeder commented 6 months ago

That's right, this may be one of the few exceptions. I may have expressed this imprecisely, since this is a counterexample to my statement and indeed uses it. It seems slightly superior to random sampling, so technically it is working, but would you use it judging from that paper? It is considerably more expensive, but does not really improve upon the simpler strategies. In general, the original paper seems to be where EGL is the most successful.

My own negative results on EGL might also be due to my benchmark datasets, which were mostly balanced. This is exactly the setting in which Ein-Dor et al. report results that are not statistically significant for EGL.

I know this paper and I remember thinking about this as well. It seems strange that they do not provide an implementation for EGL, although they published their code for the other strategies. Still, if you look at similar papers from that time (e.g., Yuan et al., 2020; Margatina et al., 2021; or Zhang et al., 2021), EGL is no longer a common point of comparison. I'm not saying this is a good thing; in a perfect world we wouldn't be as compute-limited as we currently are, and evaluations could include many more combinations.

emilysilcock commented 4 months ago

Thanks for this - this was really helpful!