Hey, I am trying to run this repo on a big Linux machine that has 8 H100s and is just sitting idle, so timely help would be very appreciated.
As a test I am running things on my Ubuntu machine, mostly to see that it runs, but I got this weird error. It looks like it could be a bug, since it is an uncaught exception with no explanation.
That said, this is probably me not setting up the environment correctly: I tried pip installing everything and it didn't work, so I used Anaconda for some packages and pip for the rest.
The tokenizer worked fine, then I ran training and got:
(neox_toolkit) user@user-System-Product-Name:~/Desktop/coder reaserch/gpt-neox$ python train.py tester_config.yml tester_setup.yml
Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "train.py", line 20, in <module>
    from megatron.training import pretrain
  File "/home/user/Desktop/coder reaserch/gpt-neox/megatron/training.py", line 58, in <module>
    from eval_tasks import run_eval_harness
  File "/home/user/Desktop/coder reaserch/gpt-neox/eval_tasks/__init__.py", line 15, in <module>
    from .eval_adapter import EvalHarnessAdapter, run_eval_harness
  File "/home/user/Desktop/coder reaserch/gpt-neox/eval_tasks/eval_adapter.py", line 16, in <module>
    import best_download
  File "/home/user/anaconda3/envs/neox_toolkit/lib/python3.8/site-packages/best_download/__init__.py", line 35, in <module>
    retry_strategy = Retry(
TypeError: __init__() got an unexpected keyword argument 'method_whitelist'
(neox_toolkit) user@user-System-Product-Name:~/Desktop/coder reaserch/gpt-neox$
Note that I went ahead and tried literally every available best_download version and got the same error. The tokenizer itself was from Hugging Face (PolyCoder), and I passed it like this
The environment was put together haphazardly and is just a mockup to see whether I can run things. If this is a known environment bug, I would love an explanation of what's wrong, so that the setup sticks when I do it on the real machine.
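
A possible lead for anyone debugging this: `method_whitelist` is a keyword of urllib3's `Retry` class. It was deprecated in urllib3 1.26 in favor of `allowed_methods` and removed in urllib3 2.0. That would explain why changing the best_download version makes no difference, but it is not confirmed for this environment. Below is a minimal sketch, assuming the installed urllib3 is the culprit, that reports which side of that change the environment is on. The helper names `supports_method_whitelist` and `installed_urllib3_version` are made up for illustration.

```python
from importlib import metadata  # standard library since Python 3.8


def supports_method_whitelist(urllib3_version: str) -> bool:
    """Return True if this urllib3 version still accepts Retry(method_whitelist=...).

    The keyword was removed in urllib3 2.0, so only the major version matters.
    """
    major = int(urllib3_version.split(".")[0])
    return major < 2


def installed_urllib3_version():
    """Return the installed urllib3 version string, or None if it is not installed."""
    try:
        return metadata.version("urllib3")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_urllib3_version()
    if version is None:
        print("urllib3 is not installed in this environment")
    elif supports_method_whitelist(version):
        print(f"urllib3 {version}: method_whitelist should still be accepted")
    else:
        print(f"urllib3 {version}: method_whitelist was removed, try pinning urllib3<2")
```

If this reports a 2.x version, running `pip install "urllib3<2"` inside the conda environment may be worth trying before re-running `train.py`. Whether that pin conflicts with other packages in the environment is untested here.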