Open bclavie opened 11 months ago
Congratulations on this amazing work @bclavie 🤩,
Thank you also for the documentation with the DataLoader.
I'll run your branch in the following days to make sure everything runs smoothly and then merge and release a new version.
Thank you! Please do let me know if you run into any issues -- things are training fine right now but I'm using a pretty weird setup so there might still be some issues.
> Thank you also for the documentation with the DataLoader.
To be fair there's no code there at the moment, but I'm happy to update with mock data in a bit if you think it'd be useful!
I don't have multiple GPUs (not even one) at home, so I cannot mimic your environment.
I propose adding an `accelerate` attribute to all the models. If set to false, it will call the `tokenizer.encode_batch` method; otherwise, it will call your encoding procedure. I did this because using `position_ids` raises an error with DistilBERT but works fine with sentence-transformers: neural_cherche/models/base.py
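To make the proposal concrete, here is a minimal sketch of how such a flag could dispatch between the two encoding paths. The class and method names are illustrative assumptions, not neural-cherche's actual API:

```python
# Hypothetical sketch of the proposed `accelerate` attribute: when False,
# fall back to the tokenizer's plain batch encoding; when True, use a custom
# path that also builds explicit position_ids (which some backbones, like
# DistilBERT, do not accept). Names here are assumptions for illustration.
class Model:
    def __init__(self, tokenizer, accelerate=False):
        self.tokenizer = tokenizer
        self.accelerate = accelerate

    def encode(self, texts):
        if not self.accelerate:
            # Default single-device path: let the tokenizer handle batching.
            return self.tokenizer.encode_batch(texts)
        # Accelerated path: add explicit position_ids per sequence.
        encoded = self.tokenizer.encode_batch(texts)
        encoded["position_ids"] = [
            list(range(len(ids))) for ids in encoded["input_ids"]
        ]
        return encoded
```

The point of keeping it as a plain boolean attribute is that existing single-GPU users see no behaviour change unless they opt in.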
I also updated the documentation a bit in order to show how to create a dataset.
All tests pass locally with the code from your branch and my updates, feel free to copy paste the code I commented.
Also, what versions of transformers and accelerate are you using?
> All tests pass locally with the code from your branch and my updates, feel free to copy paste the code I commented.
Hey, did you submit the comments? I can't see the suggested code anywhere, though it might be me being holiday-tired...
Thank you for taking the time to look at this and improving it! I'm running `transformers==4.36.2` and `accelerate==0.25.0`.
I've run some more experiments, and for full disclosure so far:
My feeling is that it might actually be unsafe to merge as a "mature" feature at this stage, but merging it labelled as experimental support could be useful?
(As for neural-cherche itself, I really like the lightweightness of the lib, but currently I'm running into some issues where my models end up stuck in some kind of "compressed similarity" land and hard negatives are always extremely close to positives in similarity, which doesn't happen with the main ColBERT codebase -- I'm training a ColBERT from scratch and will try to diagnose once I have more time!)
> Hey, did you submit the comments? I can't see the suggested code anywhere, though it might be me being holiday-tired...
Ahah missed this, sorry.
> (As for neural-cherche itself, I really like the lightweightness of the lib, but currently I'm running into some issues where my models end up stuck in some kind of "compressed similarity" land and hard negatives are always extremely close to positives in similarity, which doesn't happen with the main ColBERT codebase -- I'm training a ColBERT from scratch and will try to diagnose once I have more time!)
It could come from the loss function, which is quite simple. Would love to get your feedback on this if you find anything.
Overall, I think it's fine to push your work to master if we use the `self.accelerate` flag. It will be a first step toward accelerating the lib across multiple GPUs! :)
> Ahah missed this, sorry.
No worries, I've applied the changes 1:1, except for the tutorial page (added that support is partial/in-progress, so people don't get the impression it's fully supported yet!)
> It could come from the loss function, which is quite simple. Would love to get your feedback on this if you find anything.
I think that's probably it... I'll definitely try to figure out exactly which component has the biggest impact once I've got some more time.
Hey! Great work on the library. I've been playing with it and ran into a few issues with in-place operations when trying to train on multiple GPUs:
Setting the device this way also really doesn't play nicely with the default tokenizer export, so there's a workaround to export the files individually rather than relying on risky JSON decoding.
I've also added a doc page to show how simple it is to parallelise training with just those few changes and some very slight code modifications in a training script.