KennyNg-19 opened this issue 1 year ago
Hello, @KennyNg-19. The embeddings are stored inside the estimator and can, in theory, be accessed. However, reusing them for other use cases might not be easily achievable. Could you elaborate on how exactly you would like to reuse the embeddings?
Hi, @iryna-kondr. Since the whole dataset is embedded before the few-shot classifier runs, those embeddings could be cached locally and reused in downstream tasks after the classification task, such as semantic search or similarity comparison.
If we cannot reuse the embeddings generated here, the embedding functions (especially paid API services) will be called again, which increases cost.
I am having the same problem. I don't want to recreate the embeddings on every request. I want to compute them once and reuse both the embeddings and the fitted classifier for future calls in my system.
One additional point to consider: if we rerun experiments at a later date it would be nice to simply point to preexisting embeddings instead of re-embedding them. So same exact task, same exact data.
@iryna-kondr is this something you might consider implementing?
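One way to approximate the "point to preexisting embeddings" workflow today is a local cache keyed by the text itself, so only unseen texts are sent to the paid API. This is a minimal sketch of that idea, not part of scikit-llm; `embed_fn` stands in for whatever embedding call you use (hypothetical here):

```python
import hashlib
import json
import os

def cached_embed(texts, embed_fn, cache_path="emb_cache.json"):
    """Embed texts, reusing locally cached vectors when available.

    Sketch of the caching idea discussed above: embed_fn (a stand-in for
    a paid API call) is only invoked for texts not already in the cache.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    vectors = []
    for text in texts:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = embed_fn(text)  # only pay for unseen texts
        vectors.append(cache[key])
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return vectors
```

Rerunning the same experiment on the same data then hits the cache instead of re-embedding, at the cost of keeping the JSON file around between runs.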
Hi, @AndreasKarasenko. You can pickle the estimator (with embeddings) and then load it at a later date. See our discussion here: https://discord.com/channels/1112768381406425138/1125476385750782012/1125478710427009044
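For anyone landing here later, the pickling approach can be sketched as follows. The `FittedEstimator` class below is a hypothetical stand-in for a fitted `DynamicFewShotGPTClassifier` (its `embeddings_` attribute is an assumption for illustration); the point is only that pickling the fitted object preserves whatever embeddings it holds:

```python
import os
import pickle
import tempfile

class FittedEstimator:
    """Hypothetical stand-in for a fitted classifier holding embeddings."""

    def __init__(self):
        # Pretend these vectors were computed by a paid embedding API.
        self.embeddings_ = [[0.1, 0.2], [0.3, 0.4]]

    def predict(self, X):
        return ["label"] * len(X)

clf = FittedEstimator()

# Save the estimator (embeddings included) once...
path = os.path.join(tempfile.mkdtemp(), "clf.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)

# ...and load it later without re-embedding anything.
with open(path, "rb") as f:
    restored = pickle.load(f)

assert restored.embeddings_ == clf.embeddings_
```

The same pattern works with `joblib.dump`/`joblib.load` if you prefer joblib for larger arrays.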
Thanks for the info! Based on that, I figured out a way to get the data and embedding lists so I can store them locally. I think this issue can be closed now?
I wonder: will DynamicFewShotGPTClassifier cache the OpenAI embeddings locally the first time it is called?