zhjohnchan / M3AE

[MICCAI-2022] This is the official implementation of Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training.

ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on. #7

Closed helenypzhang closed 1 year ago

helenypzhang commented 1 year ago

Hi Zhihong,

I have received the pretraining dataset from the authors. Thanks for your help again.

When I run pretrain_m3ae.sh, the program always gets stuck because of the transformers download.

It is similar to one of the earlier issues, so I have tried changing tokenizer=downloaded/roberta-base to tokenizer=roberta-base, but unfortunately the error is still there.

I think the main problem is the RobertaTokenizerFast.from_pretrained("roberta-base") call; is there any other method I could use instead? When I run RobertaTokenizerFast.from_pretrained("roberta-base") on the HPC (server), a connection error occurs even with gpu==1. But when I test RobertaTokenizerFast.from_pretrained("roberta-base") on my PC, it works.

The detailed info is as follows:

```
WARNING - METER - No observers have been added to this run
INFO - METER - Running command 'main'
INFO - METER - Started
Global seed set to 0
ERROR - METER - Failed after 0:00:20!
Traceback (most recent calls WITHOUT Sacred internals):
  File "main.py", line 21, in main
    dm = MTDataModule(_config, dist=True)
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/multitask_datamodule.py", line 19, in __init__
    self.dm_dicts = {key: _datamodules[key](_config) for key in datamodule_keys}
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/multitask_datamodule.py", line 19, in <dictcomp>
    self.dm_dicts = {key: _datamodules[key](_config) for key in datamodule_keys}
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/pretraining_medicat_datamodule.py", line 7, in __init__
    super().__init__(*args, **kwargs)
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/base_datamodule.py", line 55, in __init__
    self.tokenizer = get_pretrained_tokenizer(tokenizer)
  File "/home/yupei/workspaces/MICCAI/M3AE-master/m3ae/datamodules/base_datamodule.py", line 22, in get_pretrained_tokenizer
    return RobertaTokenizerFast.from_pretrained(from_pretrained)
  File "/home/yupei/miniconda3/envs/m3ae37/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 1680, in from_pretrained
    user_agent=user_agent,
  File "/home/yupei/miniconda3/envs/m3ae37/lib/python3.7/site-packages/transformers/file_utils.py", line 1279, in cached_path
    local_files_only=local_files_only,
  File "/home/yupei/miniconda3/envs/m3ae37/lib/python3.7/site-packages/transformers/file_utils.py", line 1495, in get_from_cache
    "Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
```

I am sorry to bother you again, but do you have any suggestions?

Best regards, Yupei
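For reference, a minimal repro of the failing call from the traceback above, independent of the M3AE code. This is only a sketch, assuming transformers is installed; running it on the compute node shows whether that node can reach huggingface.co (or already has the files cached):

```python
from transformers import RobertaTokenizerFast

# The single call that base_datamodule.get_pretrained_tokenizer performs;
# it needs outbound access to huggingface.co unless the files are cached.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
print(tokenizer("a quick smoke test"))
```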

zhjohnchan commented 1 year ago

Hi Yupei,

For simplicity, you can change downloaded/roberta-base to roberta-base; it will then download the model from the Hugging Face Hub automatically. In my case, I just downloaded the model manually and put it in the downloaded folder.

Best, Zhihong
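For reference, a minimal sketch of the manual route described above, assuming only that transformers is installed; the paths are illustrative. Fetch the tokenizer once on a machine with internet access, save it into the downloaded folder, and point the config at that local path on the offline node:

```python
from transformers import RobertaTokenizerFast

# On a machine with internet access: download once and save all tokenizer
# files (vocab.json, merges.txt, tokenizer.json, ...) to a local folder.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
tokenizer.save_pretrained("downloaded/roberta-base")

# On the offline compute node: load from the local folder instead of the Hub,
# i.e. keep tokenizer=downloaded/roberta-base in the pre-training config.
tokenizer = RobertaTokenizerFast.from_pretrained("downloaded/roberta-base")
```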

helenypzhang commented 1 year ago

Hi Zhihong, I have tried the first method, but the error is still there.

As for the manual download, do you mean downloading tokenizer.json into the downloaded folder, like in the attached screenshot?

helenypzhang commented 1 year ago

Hi Zhihong, I downloaded it correctly and now the code works. I think the compute node does not support downloading, so I will download all the required pre-trained models manually. Thanks again. Yupei
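As a side note, one possible way to pull such repos in bulk on a node that does have internet access is the huggingface_hub helper below (installed alongside transformers). This is only a sketch; the repo id and cache folder are examples:

```python
from huggingface_hub import snapshot_download

# Download every file of a Hub repo (config, weights, tokenizer files) into a
# local cache folder that the offline compute node can later read from disk.
local_path = snapshot_download(repo_id="roberta-base", cache_dir="downloaded")
print(local_path)  # pass this folder to from_pretrained(...) on the cluster
```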