Open Namigeon opened 1 year ago
Hi @Namigeon, Thank you for your interest. For DeBERTa v3, we use 8 A100 GPUs (40 GB memory) to run it. The v3 large model is smaller than v2 xlarge, so it should run without problems - if you encounter OOM issues, just try reducing the batch size. We don't find the results to vary much with respect to the batch size. Your new link looks reasonable and I don't think there's any problem with it. You can inspect it manually to see if it makes sense. You should look at your loss curve, training data, etc. to find the reason for the poor accuracy.
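One way to see why results vary little with batch size: for a mean-reduced loss, averaging gradients over several small micro-batches gives the same update as one large batch, so you can shrink the per-step batch (to avoid OOM) and accumulate gradients instead. This is a minimal, hypothetical sketch with a toy linear model, not code from this repo:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # full batch of 32 examples, 4 features
y = rng.normal(size=32)
w = np.zeros(4)

def grad(w, Xb, yb):
    # gradient of the mean squared error 0.5 * mean((Xb @ w - yb) ** 2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# gradient computed on the full batch in one step
g_full = grad(w, X, y)

# same gradient via 4 micro-batches of 8, averaged (gradient accumulation)
micro = [grad(w, X[i:i + 8], y[i:i + 8]) for i in range(0, 32, 8)]
g_accum = np.mean(micro, axis=0)

print(np.allclose(g_full, g_accum))  # True
```

So if OOM forces a smaller batch, accumulating over a few steps recovers the large-batch update exactly (for equal-size micro-batches); any remaining accuracy differences come from things like batch-norm-style statistics or the learning-rate schedule, not the gradient itself.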
Okay, thanks a lot for your fast reply, it gives me a better idea of how it works!
Hello!
First of all, thanks for sharing your work.
I tried to run the training with the command that gives the same results as in the paper (the last one in "task_train.sh"), but I keep getting a runtime error telling me that I don't have enough memory. So my question is, which GPU configuration did you use exactly to run the training with those hyper-parameters using this specific command?
Also, the second command (training with DeBERTa V2 xlarge) gives me very poor accuracy (around 20%) - is that normal? I ran all the preprocessing scripts and had to change the Wiktionary link in the script "download_data.sh" to download the data, so maybe the latest dump differs too much and the data no longer work properly with the model? For reference, the new link I used is https://kaikki.org/dictionary/English/all-non-inflected-senses/kaikki_dot_org-dictionary-English-all-non-infl-PIoLCx8T.json
Maybe something else needs an update ?
Looking forward to your reply!