microsoft / KEAR

Official code for achieving human parity on CommonsenseQA with External Attention

Bad results with DeBERTa V2 and runtime error (CUDA out of memory) with DeBERTa V3 #13

Open Namigeon opened 1 year ago

Namigeon commented 1 year ago

Hello!

First of all, thanks for sharing your work.

I tried to run training with the command that reproduces the paper's results (the last one in "task_train.sh"), but I keep getting a runtime error saying I don't have enough GPU memory. So my question is: which GPU configuration did you use to run training with those hyper-parameters and this specific command?

Also, the second command (training with DeBERTa V2 xlarge) gives me very poor accuracy (around 20%). Is that normal? I ran all the preprocessing scripts, but I had to change the Wiktionary link in "download_data.sh" to get the download to work, so maybe the latest dump differs too much and the data no longer work properly with the model? For reference, the new link I used is https://kaikki.org/dictionary/English/all-non-inflected-senses/kaikki_dot_org-dictionary-English-all-non-infl-PIoLCx8T.json
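
(For reference, a quick way to eyeball a dump like this: kaikki.org serves JSON Lines, so each line is a standalone entry. The filename and field names below are assumptions based on the public dumps and may differ in newer exports.)

```python
import json

# kaikki.org dumps are JSON Lines: one JSON object per line. The field names
# ("word", "senses") match the public dumps but may change in newer exports.
with open("kaikki_dot_org-dictionary-English-all-non-infl-PIoLCx8T.json", encoding="utf-8") as f:
    for _ in range(3):
        entry = json.loads(f.readline())
        print(entry.get("word"), "->", len(entry.get("senses", [])), "senses")
```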

Maybe something else needs an update?

Looking forward to your reply!

xycforgithub commented 1 year ago

Hi @Namigeon, thank you for your interest. For DeBERTa V3, we used 8 A100 GPUs (40 GB memory each). The V3 large model is smaller than V2 xlarge, so it should run without problems; if you hit OOM errors, just try reducing the batch size. We didn't find the results to vary much with the batch size.

Your new link looks reasonable, and I don't think there's any problem with it; you can inspect it manually to see if it makes sense. To find the cause of the poor accuracy, look at your loss curve, training data, and so on.
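
If you're worried that a smaller per-GPU batch changes the effective batch size, gradient accumulation can compensate: halve the batch and double the accumulation steps. Here is a minimal, generic PyTorch sketch of the pattern (not KEAR's actual training loop; the model and data below are toy placeholders):

```python
import torch
from torch import nn

# Toy stand-in for the real model and data; the accumulation pattern is the point.
model = nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

accum_steps = 2  # raise this as you lower the per-GPU batch size
optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated grads average
    loss.backward()                            # grads add up across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

With this pattern, two micro-batches of half the size produce the same gradient (up to batch-norm-style effects) as one full batch, so memory drops while the optimization trajectory stays close.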

Namigeon commented 1 year ago

Okay, thanks a lot for your fast reply; it gives me a better idea of how it works!