Open rayendito opened 1 year ago
Perfect idea. After we implement the autopilot dataset retriever, we can add your idea!
Actually, I think our existing dataset retriever is probably fine, so I don't think this issue is blocked by anything. @rayendito, if you'd like to take a stab at it, I could assign the issue to you!
sure! i'll be happy to play around with it
@neubig Sir, is this still being worked on by @rayendito? I would love to collaborate on this.
Hi @bilal-aamer, definitely go ahead and play around with this unless @rayendito has already finished!
hi @bilal-aamer, I did a rough MVP implementation of this some months ago in https://github.com/rayendito/prompt2model/tree/sample-from-dataset, but I haven't opened a PR yet since I figured we first need to find out whether this method actually yields better results. I've been following the discussion on the #multilingual channel on Discord and chimed in a few times. I thought maybe a core team member was already working on this, so I mostly spectated because I didn't want to step on anyone's toes 😅 but if @neubig says go ahead then go for it! I'm actually particularly interested in MT and am currently exploring a more specific version of this issue (MT only).
Oh, go ahead! I think this is distinct methodologically from what @VanyaBK is looking at, so please go ahead and run whatever you want to.
And @bilal-aamer, maybe you could talk with @rayendito on Discord and see if there's anything he could use help with. I'm happy to pitch in to the discussion.
Instead of only using examples given by the users in the user prompt, what if we try to use retrieved datasets (if applicable) as high-quality shots for the dataset generator?
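To make the proposal concrete, here is a minimal sketch of what "retrieved rows as shots" could look like. It is not taken from the prompt2model codebase or from the linked branch: `sample_shots`, `build_generation_prompt`, the column names, and the prompt layout are all hypothetical, and the retrieved dataset is assumed to be available as a list of dicts.

```python
import random


def sample_shots(retrieved_rows, input_col, output_col, k=3, seed=0):
    """Pick up to k (input, output) pairs from a retrieved dataset.

    Hypothetical helper: rows are assumed to be dicts, and rows with a
    missing or empty input/output column are skipped.
    """
    rng = random.Random(seed)
    usable = [r for r in retrieved_rows if r.get(input_col) and r.get(output_col)]
    chosen = rng.sample(usable, min(k, len(usable)))
    return [(r[input_col], r[output_col]) for r in chosen]


def build_generation_prompt(instruction, user_examples, retrieved_shots):
    """Combine user-provided examples with retrieved shots into one prompt.

    The prompt layout is an assumption for illustration, not the format
    the dataset generator actually uses.
    """
    shots = list(user_examples) + list(retrieved_shots)
    lines = [f"Instruction: {instruction}", ""]
    for i, (inp, out) in enumerate(shots, 1):
        lines += [f"Example {i}", f"input: {inp}", f"output: {out}", ""]
    lines.append("Generate one new example in the same format.")
    return "\n".join(lines)


# Usage with made-up rows standing in for a retrieved MT dataset.
rows = [
    {"src": "hello", "tgt": "halo"},
    {"src": "thanks", "tgt": "terima kasih"},
]
shots = sample_shots(rows, "src", "tgt", k=2)
print(build_generation_prompt(
    "Translate English to Indonesian.",
    [("good morning", "selamat pagi")],
    shots,
))
```

Whether mixing retrieved rows with user examples actually improves the generated data is the open question in this thread, so a sketch like this would mainly be useful for running that comparison.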