lrsoenksen / HAIM

This repository contains the code to replicate the data processing, modeling and reporting of our Holistic AI in Medicine (HAIM) Publication in Nature Machine Intelligence (Soenksen LR, Ma Y, Zeng C et al. 2022).
Apache License 2.0

Issues loading biobert #8

Closed carvalhoek closed 1 year ago

carvalhoek commented 1 year ago

Hey guys, I was trying to run the same experiments you performed, but I've run into an error when running the file `1_1-Create Pickle Files.py`:

```
404 Client Error: Not Found for url: https://huggingface.co/pretrained_bert_tf/biobert_pretrain_output_all_notes_150000//resolve/main/config.json
Traceback (most recent call last):
  File "/home/saia/programfiles/anaconda3/envs/haim/lib/python3.6/site-packages/transformers/configuration_utils.py", line 520, in get_config_dict
    user_agent=user_agent,
  File "/home/saia/programfiles/anaconda3/envs/haim/lib/python3.6/site-packages/transformers/file_utils.py", line 1371, in cached_path
    local_files_only=local_files_only,
  File "/home/saia/programfiles/anaconda3/envs/haim/lib/python3.6/site-packages/transformers/file_utils.py", line 1534, in get_from_cache
    r.raise_for_status()
  File "/home/saia/programfiles/anaconda3/envs/haim/lib/python3.6/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/pretrained_bert_tf/biobert_pretrain_output_all_notes_150000//resolve/main/config.json

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "1_1-Create Pickle Files.py", line 28, in <module>
    from MIMIC_IV_HAIM_API import *
  File "/home/saia/files/HAIM/MIMIC_IV_HAIM_API.py", line 114, in <module>
    biobert_tokenizer = AutoTokenizer.from_pretrained(biobert_path)
  File "/home/saia/programfiles/anaconda3/envs/haim/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py", line 534, in from_pretrained
    config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/home/saia/programfiles/anaconda3/envs/haim/lib/python3.6/site-packages/transformers/models/auto/configuration_auto.py", line 450, in from_pretrained
    config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/saia/programfiles/anaconda3/envs/haim/lib/python3.6/site-packages/transformers/configuration_utils.py", line 532, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'pretrained_bert_tf/biobert_pretrain_output_all_notes_150000/'. Make sure that:
```

It seems that the script is trying to load a BioBERT model from Hugging Face's model hub, but the specified path is not found. Do you know why this might be happening?

lrsoenksen commented 1 year ago

Hi @carvalhoek,

I believe for this code we downloaded the model locally and ran it through the Hugging Face API. Since this model and feature extractor were made by another party, we have no control over their distribution and support. What you can do is change the code in your fork to use an analogous model (https://huggingface.co/dmis-lab/biobert-v1.1).

You could do it like this (in the relevant code section):

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")
```
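For context, here is a minimal sketch of using that substitute model as a clinical-note feature extractor. The mean-pooling step and the example note text are illustrative assumptions, not necessarily what the HAIM pipeline does with the original BioBERT checkpoint:

```python
# Sketch (illustrative): embedding a clinical note with the substitute
# dmis-lab/biobert-v1.1 model instead of the unavailable checkpoint.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-v1.1")
model.eval()

# Hypothetical example note; in HAIM this would come from MIMIC-IV.
note = "Patient admitted with chest pain; troponin elevated."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden state into one fixed-size note embedding
# (an assumption here; other pooling strategies are possible).
embedding = outputs.last_hidden_state.mean(dim=1).squeeze(0)
print(embedding.shape)  # torch.Size([768]) -- biobert-v1.1 is BERT-base
```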

Happy coding, --lrsoenksen

carvalhoek commented 1 year ago

Thanks man, that seems to solve it =)