AndreasKarasenko opened this issue 3 months ago
Hi @AndreasKarasenko,
Yes, the order of the samples introduces some bias. For the regular FewShot classifier this can easily be solved by permuting the training data. It is not as straightforward in DynamicFewShot and would require some refactoring.
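The permutation idea can be sketched as follows; this is only an illustration (the arrays, names, and seed are my own, not from scikit-llm), showing how to shuffle `X` and `y` together so few-shot examples are no longer clustered by label:

```python
import numpy as np

# Toy training data, clustered by label (as groupby-based sampling produces).
X = ["great movie", "loved it", "terrible", "awful plot"]
y = ["pos", "pos", "neg", "neg"]

# Draw one permutation and apply it to X and y jointly,
# so each text keeps its label.
rng = np.random.default_rng(42)
perm = rng.permutation(len(X))
X_shuffled = [X[i] for i in perm]
y_shuffled = [y[i] for i in perm]
```

Shuffling with a single shared permutation is the key point: permuting `X` and `y` independently would scramble the label assignments.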
On the other hand, I am not sure whether it poses such a big problem. The study you provided is from 2021 and hence relatively outdated.
Also, from my personal observations, sometimes even in the ZeroShot setting the order of the candidate labels is relevant. Therefore, ordering would probably always introduce some bias, which can hardly be completely avoided.
Yes, I agree that it is a good idea to at least mention this somewhere and, in the future, to think about refactoring the code a bit to minimize this bias.
I will keep the issue open for now.
**Context**

According to this paper, ChatGPT (and likely other LLMs) suffers from a recency bias: whatever class comes last has a higher probability of being selected.

**Issue**

Currently scikit-llm constructs prompts based on the order of the training data. Since we are recommended to restrict the training data, I would usually do something like this:
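The original snippet was not preserved in this transcript; a plausible reconstruction consistent with the description below (the `df`, `label_col`, and sample-size values are assumptions) would be:

```python
import pandas as pd

# Toy dataset standing in for the real training data.
df = pd.DataFrame({
    "text": ["t1", "t2", "t3", "t4", "t5", "t6"],
    "label": ["pos", "neg", "pos", "neg", "pos", "neg"],
})
label_col = "label"

# Balanced subsample: an equal number of rows per class.
# groupby processes groups in key order, so the result
# comes back clustered by label.
sampled = df.groupby(label_col, group_keys=False).sample(n=2, random_state=0)
```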
This returns a dataframe sorted by `label_col`. Even if `sort=False` is passed to `groupby`, the instances are still clustered by label.

**Question/Solution**

Should a method be implemented that randomizes the order of samples in the prompt / training data, or should users take care of that themselves? The most straightforward way would be to simply add this to the sampling step:
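The snippet referred to here was also lost; a hedged sketch of the idea (names and seeds are mine) is to append a full shuffle via `sample(frac=1)` after the balanced subsample, so the prompt order is randomized:

```python
import pandas as pd

# Toy dataset standing in for the real training data.
df = pd.DataFrame({
    "text": [f"t{i}" for i in range(6)],
    "label": ["pos", "neg"] * 3,
})

sampled = (
    df.groupby("label", group_keys=False)
      .sample(n=2, random_state=0)  # balanced subsample per class
      .sample(frac=1, random_state=0)  # then shuffle all rows
)
```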
This leaves it up to chance to balance the order reasonably.