Closed: helenypzhang closed this issue 1 year ago
Hi Yupei,
For simplicity, you can change downloaded/roberta-base to roberta-base. Then it will download the model from the Hugging Face Hub automatically. For me, I just downloaded the model manually and put it in the downloaded folder.
Best, Zhihong
Hi Zhihong, I have tried the first method, but the error is still there.
For the manual download, do you mean downloading tokenizer.json into the downloaded folder? Like this:
Hi Zhihong, I downloaded it correctly and now the code works. I think the compute node does not allow downloading, so I will download all the required pre-trained models manually. Thanks again. Yupei
Hi Zhihong,
I have received the pretraining dataset from the authors. Thanks for your help again.
When I run pretrain_m3ae.sh, the program always gets stuck on the transformers download.
It is similar to one of the existing issues, so I tried changing tokenizer=downloaded/roberta-base to tokenizer=roberta-base, but unfortunately the error is still there.
I think the main problem is RobertaTokenizerFast.from_pretrained("roberta-base"); is there any alternative? When I run RobertaTokenizerFast.from_pretrained("roberta-base") on the HPC (server), a connection error occurs even with gpu==1. But when I test RobertaTokenizerFast.from_pretrained("roberta-base") on my PC, it works.
The detailed info is as follows:
WARNING - METER - No observers have been added to this run
INFO - METER - Running command 'main'
INFO - METER - Started
Global seed set to 0
ERROR - METER - Failed after 0:00:20!
Traceback (most recent calls WITHOUT Sacred internals):
  File "main.py", line 21, in main
    dm = MTDataModule(_config, dist=True)
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/multitask_datamodule.py", line 19, in __init__
    self.dm_dicts = {key: _datamodules[key] for key in datamodule_keys}
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/multitask_datamodule.py", line 19, in <dictcomp>
    self.dm_dicts = {key: _datamodules[key] for key in datamodule_keys}
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/pretraining_medicat_datamodule.py", line 7, in __init__
    super().__init__(*args, **kwargs)
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/base_datamodule.py", line 55, in __init__
    self.tokenizer = get_pretrained_tokenizer(tokenizer)
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/base_datamodule.py", line 22, in get_pretrained_tokenizer
    return RobertaTokenizerFast.from_pretrained(from_pretrained)
  File "/home/yupei/miniconda3/envs/m3ae37/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1680, in from_pretrained
    user_agent=user_agent,
  File "/home/yupei/miniconda3/envs/m3ae37/lib/python3.7/site-packages/transformers/file_utils.py", line 1279, in cached_path
    local_files_only=local_files_only,
  File "/home/yupei/miniconda3/envs/m3ae37/lib/python3.7/site-packages/transformers/file_utils.py", line 1495, in get_from_cache
    "Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
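One way around a compute node with blocked network access (a sketch, assuming the transformers offline mode): populate the cache or the downloaded/roberta-base folder from a machine with internet first, then tell the library to never touch the network. transformers honors these environment variables when they are set before the library is used:

```python
import os

# Force transformers / huggingface_hub to resolve everything from local
# files and never open a connection. Set these before transformers runs
# (e.g. at the top of main.py or exported in the sbatch script).
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_HUB_OFFLINE"] = "1"

# With the tokenizer files copied into downloaded/roberta-base, the call
# RobertaTokenizerFast.from_pretrained("downloaded/roberta-base")
# then loads purely from disk instead of raising the connection error above.
```

Equivalently, export TRANSFORMERS_OFFLINE=1 in the job script before launching pretrain_m3ae.sh.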
I am sorry to bother you again, but do you have any suggestions?
Best regards, Yupei