Closed emilysilcock closed 4 months ago
The EGL strategy is tailored to the KimCNN and has only been shown to be effective for that architecture. You are using a transformer model (judging from the `SequenceClassifierOutput`). (I will improve the documentation and add this restriction.)
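The reported error follows from how transformer models return their predictions: the forward pass yields a `SequenceClassifierOutput` container whose `.logits` field holds the actual tensor, so tensor methods like `.softmax()` are not available on the container itself. The snippet below illustrates this with a minimal stand-in class (a simplified stub, not the real `transformers` class):

```python
from dataclasses import dataclass

# Minimal stand-in for transformers' SequenceClassifierOutput (illustration
# only): the real class is a container whose .logits field holds the tensor.
@dataclass
class SequenceClassifierOutputStub:
    logits: list  # in the real class this is a torch.Tensor

output = SequenceClassifierOutputStub(logits=[[0.2, 1.5, -0.3]])

# The container itself has no tensor methods, hence the AttributeError:
print(hasattr(output, "softmax"))  # False

# A query strategy written for raw logit tensors would instead need to
# unwrap the container first, e.g. output.logits.softmax(dim=-1) in PyTorch.
```

A strategy implemented against the KimCNN's raw tensor output will therefore fail as soon as it receives the wrapped transformer output.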
This could be adapted to work for transformers. I have investigated this myself, although I was a little less experienced at the time; querying based on gradients / gradient length did not seem to be effective at all for transformer models. Since it didn't really work, I gave up on the idea.
If I had to try that again, I would start with the EGL variant that operates on the gradient of the last layer. If you are interested, I also have some old code for this, but if you are looking for a well-performing query strategy, I would advise against it.
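To make the last-layer idea concrete, here is a minimal sketch of what such a score could look like. Everything below is illustrative rather than the library's implementation: for a softmax classifier trained with cross-entropy, the gradient of the loss with respect to the final linear layer's weights for a hypothetical label `c` is the outer product `(p - e_c) ⊗ h`, whose Frobenius norm factorises into `||p - e_c|| · ||h||`, so no autograd is needed.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def expected_gradient_length_last_layer(hidden, logits):
    """EGL restricted to the final linear layer (sketch).

    For softmax + cross-entropy, the gradient w.r.t. the last layer's
    weights under hypothetical label c is (p - e_c) outer hidden; its
    Frobenius norm is ||p - e_c|| * ||hidden||. EGL weights each
    hypothetical label by its predicted probability p_c.
    """
    p = softmax(logits)
    h_norm = math.sqrt(sum(x * x for x in hidden))
    egl = 0.0
    for c, p_c in enumerate(p):
        residual = [p_j - (1.0 if j == c else 0.0) for j, p_j in enumerate(p)]
        egl += p_c * math.sqrt(sum(r * r for r in residual)) * h_norm
    return egl

# An uncertain prediction yields a larger expected gradient than a confident one:
print(expected_gradient_length_last_layer([1.0, 0.5], [0.1, 0.0, 0.05]))
print(expected_gradient_length_last_layer([1.0, 0.5], [4.0, -4.0, 0.0]))
```

For a transformer, `hidden` would be the pooled representation fed into the classification head; whether this restricted variant actually helps for transformers is exactly the open question discussed above.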
Interesting! That's good to know. This paper has some success with an implementation of EGL on text, but I'm not sure how it differs - gradient-based methods are really not my speciality; I'm just comparing against it as a benchmark.
That's right, this may be one of the few exceptions. I may have expressed this imprecisely, since this is a counterexample to my statement and does indeed use EGL. It seems slightly superior to random sampling, so technically it is working, but would you use it judging from that paper? It is considerably more expensive, yet does not really improve upon the simpler strategies. In general, the original paper seems to be where EGL is most successful.
My own negative results on EGL might also be due to my benchmark datasets, which were mostly balanced. This is exactly the setting where Ein-Dor et al. report results that are not statistically significant for EGL.
I know this paper and I remember thinking about this as well. It seems strange that they do not provide an implementation for EGL, although they published their code for the other strategies. Still, if you look at similar papers from that time (e.g., Yuan et al., 2020; Margatina et al., 2021; or Zhang et al., 2021), EGL is no longer a common point of comparison. I'm not saying this is a good thing; in a perfect world we wouldn't be as compute-limited as is currently the case, and evaluations could include many more combinations.
Thanks for this - this was really helpful!
Bug description
I'm trying to run some experiments with Expected Gradient Length. I've used the code from your sample notebook exactly, except that I've swapped

`query_strategy = PredictionEntropy()`

for

`query_strategy = ExpectedGradientLength(num_classes)`

When calling `active_learner.query`, this gives the error `AttributeError: 'SequenceClassifierOutput' object has no attribute 'softmax'`.
The full traceback is below.
Environment:
- Python version: I'm running on Colab, so currently 3.10, but I've replicated with 3.7 as well
- small-text version: 1.3.3
- small-text integrations (e.g., transformers): Transformers
- PyTorch version (if applicable): 2.2.1+cu121
- Installation (pip, conda, or from source): `pip install small-text[transformers]==1.3.3`
Additional information
The full traceback is: