This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi guys,
I followed this guide to pre-train a GPT-2 model using Accelerate with Megatron as the backend. The current version of Megatron is core_r0.7.0, but I decided to use the same version used in the guide (core_r0.5.0) to avoid any compatibility problems. As recommended in the guide, I used this script to get the full implementation.
For a reason that I don't understand, Megatron requires passing the vocabulary files (vocab_file.json, merge_file.txt), and the only way I found to do this was by directly modifying the accelerator.py module, hard-coding the file paths before calling megatron_lm_initialize. Something like this:
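A rough sketch of that patch (the attribute and key names are my best guess at accelerate's internals, and the paths are placeholders):

```python
# Sketch of the change inside accelerate's accelerator.py (0.33.0), just before
# megatron_lm_initialize is called. The dictionary keys and attribute names are
# assumptions about accelerate's internals; the paths are placeholders.
megatron_lm_plugin = self.state.megatron_lm_plugin

# Hard-code the GPT-2 vocabulary files so Megatron's GPT2BPETokenizer can find them
megatron_lm_plugin.megatron_lm_default_args["vocab_file"] = "/path/to/vocab_file.json"
megatron_lm_plugin.megatron_lm_default_args["merge_file"] = "/path/to/merge_file.txt"

megatron_lm_initialize(self, args_defaults=megatron_lm_plugin.megatron_lm_default_args)
```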
Does this scenario make sense to you? What would be a smart way to do that from the main script?
Thanks.
Versions of relevant libraries:
- accelerate==0.33.0
- datasets==2.20.0
- megatron_core==0.5.0
- transformer_engine==1.8.0+3ec998e
- transformers==4.43.2
- flash-attn==2.6.3
- torch==2.4.0
Expected behavior
Passing vocab_file and merge_file as arguments of the main script, or taking them directly from the pretrained tokenizer.