If you see something like "Out of memory" in your CUDA error, then it's memory related, in which case you should use smaller batches with more gradient accumulation steps.
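In case it helps, here is a rough PyTorch sketch of what gradient accumulation looks like (the model, optimizer, and step count below are just placeholders for illustration, not the toolkit's actual training loop):

```python
import torch
import torch.nn as nn

# Minimal sketch of gradient accumulation (illustrative only).
# A smaller per-step batch with more accumulation steps keeps the effective
# batch size the same while using less GPU memory.
model = nn.Linear(16, 4)                       # stand-in for the real seq2seq model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
accumulation_steps = 8                         # accumulate gradients over 8 small batches

data = [(torch.randn(4, 16), torch.randint(0, 4, (4,))) for _ in range(32)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accumulation_steps   # scale so accumulated gradients average out
    loss.backward()                                    # gradients add up across the small batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                               # update once per effective (large) batch
        optimizer.zero_grad()
```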
Unless I see your error log I can't be sure why this happens, but it's not a bug in the code.
As for the --hard_truncate_length argument, it is used to make sure that the maximum length does not exceed 1024 subwords.
--max_length=128 acts as a truncation on the raw sentence and counts the number of words (not subwords). However, a sentence may contain a random but long string of, say, 1000 characters. In that case the sentence length is 1 word, but after subword segmentation it may go beyond 1024 tokens, which the model does not handle by default. This is where --hard_truncate_length comes into play. The maximum value it should take is the same as "max_position_embeddings" in the mBART config, which is 1024 by default. I usually set it to 256 to account for long sentences and to 1024 for documents. If you wish to go beyond this, then you also have to set max_position_embeddings.
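Roughly speaking, the two options interact like the sketch below (the toy tokenizer and function names are only illustrative assumptions, not the actual code in this repo):

```python
# Rough sketch of the two truncation stages (illustrative, not the toolkit's exact code).

def toy_subword_tokenize(text):
    """Stand-in for a real subword tokenizer: split every word into 4-character pieces."""
    return [w[i:i + 4] for w in text.split() for i in range(0, len(w), 4)]

def truncate(sentence, max_length=128, hard_truncate_length=1024):
    words = sentence.split()[:max_length]             # --max_length counts words, not subwords
    subwords = toy_subword_tokenize(" ".join(words))  # segmentation can blow the length up,
                                                      # e.g. one 1000-character "word"
    return subwords[:hard_truncate_length]            # --hard_truncate_length caps the subword
                                                      # count (<= max_position_embeddings, 1024)

# A 1-word sentence of 1000 characters becomes 250 subwords with this toy tokenizer;
# with a real tokenizer it could exceed 1024, which is why the hard cap exists.
print(len(truncate("x" * 1000)))
```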
My batch construction algo is like this:
There is a possibility that a single problematic sentence, even after subword segmentation, exceeds the batch size. In that case the code will run in an infinite loop without returning any batches. However, this is probably never going to happen. I plan to push a bunch of new updates in the coming few days which take care of this unlikely issue too. Overall, I would not worry.
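For illustration, here is a generic sketch of token-count-based batching and the edge case I mean (an illustrative assumption about how such batching typically works, not this repo's exact implementation):

```python
# Generic sketch of token-count-based batching (illustrative assumption, not the toolkit's code).
def make_batches(sentences, batch_size_in_tokens):
    batch, batch_tokens = [], 0
    for subwords in sentences:                      # each item is a list of subword tokens
        if batch_tokens + len(subwords) > batch_size_in_tokens and batch:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(subwords)
        batch_tokens += len(subwords)
        # If a single sentence alone were longer than batch_size_in_tokens and the loop
        # instead skipped it and retried, it could spin forever without yielding a batch;
        # capping lengths with --hard_truncate_length avoids that situation.
    if batch:
        yield batch

# Example: batches of at most 10 subword tokens.
sents = [list("abcdef"), list("ghij"), list("klmnopq")]
for b in make_batches(sents, 10):
    print([len(s) for s in b])
```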
Thanks for the details! It was very helpful!
My error doesn't seem related to --hard_truncate_length, but I will set --hard_truncate_length to 1024 just in case from now on.
I will ignore the error for now, and if I get it again I will come back with the full log and ask for your help.
PS: It wasn't an OOM error, I know those all too well already hahaha
Hi again,
After getting the NaN loss error from the previous issue, I launched another training run during the weekend:
With which I got the following error after 11K steps:
I don't know what caused this, so I will run the next trainings with CUDA_LAUNCH_BLOCKING=1 activated. I also want to use the --hard_truncate_length argument in case the problem is caused by the length of the sequences, but I'm not sure I understand exactly what the --hard_truncate_length argument does. Let's say that I want to train a model with --max_length=128 and --batch_size=4096... if I understood correctly, I should set --hard_truncate_length to 4096 too, right?

Thanks for your time.
Regards,
Gorka