jingtaozhan / DRhard

SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.
BSD 3-Clause "New" or "Revised" License
125 stars 14 forks

An error occurred when I loaded the trained model: AttributeError: module 'transformers' has no attribute 'TFRobertaDot' #1

Closed wangjiajia5889758 closed 3 years ago

jingtaozhan commented 3 years ago

Could you please provide more details, e.g., the file you are using and the commands?

wangjiajia5889758 commented 3 years ago

I have solved the problem. In addition, how is the passage_embedding generated?

jingtaozhan commented 3 years ago

Please see inference.py. The evaluation function computes and saves the passage embeddings.

behindhu commented 3 years ago

I have solved the problem. In addition, how is the passage_embedding generated?

Hello, I have the same problem. How did you solve it?

jingtaozhan commented 3 years ago

@behindhu Any details? TFRobertaDot seems to appear nowhere in the code. How did this error come about?

behindhu commented 3 years ago

@behindhu Any details? TFRobertaDot seems to appear nowhere in the code. How did this error come about?

First, when I ran ./star/inference.py, an error occurred: RuntimeError: [enforce fail at inline_container.cc:145]. PytorchStreamReader failed reading zip archive: failed finding central directory. At the same time, there was an exception: OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. I then installed the tensorflow package and set from_tf=True, which led to: AttributeError: module 'transformers' has no attribute 'TFRobertaDot'.

jingtaozhan commented 3 years ago

@behindhu It seems you did not unzip the file. ./data/passage/trained_models/star should be a directory containing several files, including pytorch_model.bin and config.json.
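For reference, here is a quick way to sanity-check the unzipped checkpoint before loading it. This is only an illustrative sketch; adjust the path to wherever you extracted the archive.

```python
# Illustrative sanity check: the unzipped STAR checkpoint directory should
# contain the standard Hugging Face files before any model loading is attempted.
import os

import torch

model_dir = "./data/passage/trained_models/star"  # adjust to your extraction path
print(sorted(os.listdir(model_dir)))
# Expect at least config.json and pytorch_model.bin in the listing.

# If the .bin file is intact, plain torch.load succeeds without from_tf=True.
state_dict = torch.load(os.path.join(model_dir, "pytorch_model.bin"), map_location="cpu")
print(f"Loaded {len(state_dict)} tensors")
```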

jingtaozhan commented 3 years ago

Please see inference.py. The evaluation function computes and saves the passage embeddings.

Oh, I made a mistake. star/inference.py computes the passage embeddings; adore/inference.py uses the passage embeddings that star/inference.py has already generated.
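For illustration, here is a minimal sketch of what this step looks like. The function name and output path are assumptions, not the actual star/inference.py code.

```python
# Minimal sketch (not the actual star/inference.py code): encode every passage
# with the trained encoder and persist the embeddings for later retrieval.
import numpy as np
import torch

@torch.no_grad()
def encode_passages(model, dataloader, out_path):
    model.eval()
    chunks = []
    for batch in dataloader:
        # The encoder maps token ids + attention mask to one dense vector per passage.
        emb = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            is_query=False,
        )
        chunks.append(emb.cpu().numpy())
    embeddings = np.concatenate(chunks, axis=0).astype(np.float32)
    # Save to disk so adore/inference.py (or any retriever) can reuse the embeddings.
    out = np.memmap(out_path, dtype=np.float32, mode="w+", shape=embeddings.shape)
    out[:] = embeddings
    out.flush()
    return embeddings
```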

behindhu commented 3 years ago

I have unzipped the file, but the code still reports the error. [screenshots attached]

jingtaozhan commented 3 years ago

I found the problem: pytorch_model.bin was corrupted. I have re-uploaded the correct file, so it should work now. Sorry for the inconvenience, and thanks for the timely feedback!

wangjiajia5889758 commented 3 years ago

Yes, it runs correctly now that I have replaced the .bin file.

wangjiajia5889758 commented 3 years ago

embeddings = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"], is_query=True)

Are the query_embedding and the doc_embedding both obtained from this call? What is the meaning of the parameter "is_query=True"?

jingtaozhan commented 3 years ago

embeddings = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"], is_query=True)

Are the query_embedding and the doc_embedding both obtained from this call? What is the meaning of the parameter "is_query=True"?

It indicates whether the input is a query.

wangjiajia5889758 commented 3 years ago

If the input is a passage, should the parameter be is_query=False?

jingtaozhan commented 3 years ago

Yes, though it does not make any difference in our RobertaDot implementation (see model.py).
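To make that concrete, here is an illustrative shared-encoder sketch; the class and parameter names below are assumptions, and the real implementation is the RobertaDot class in model.py. Because queries and passages share one encoder, the is_query flag is accepted but changes nothing.

```python
# Illustrative dual encoder with a shared tower; is_query is accepted for API
# symmetry but does not alter the computation, mirroring the behavior described above.
import torch
from torch import nn
from transformers import RobertaModel

class SharedDualEncoder(nn.Module):
    def __init__(self, pretrained="roberta-base", output_dim=768):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained(pretrained)
        self.proj = nn.Linear(self.encoder.config.hidden_size, output_dim)

    def forward(self, input_ids, attention_mask, is_query=True):
        # A model with separate query/passage towers would branch on is_query here.
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]
        return self.proj(hidden[:, 0])  # first-token ([CLS]-style) pooling
```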

wangjiajia5889758 commented 3 years ago

ok, thank you!

wangjiajia5889758 commented 3 years ago

tokenizer = AutoTokenizer.from_pretrained('roberta-base', do_lower_case=True, cache_dir=None)

If I use the adore (star) model, is this tokenizer correct?

wangjiajia5889758 commented 3 years ago

tokenizer = AutoTokenizer.from_pretrained('/home/tu/device/backup/wjjia/DRhard/adore-star', do_lower_case=True, cache_dir=None) raises an error: OSError: Can't load tokenizer for '/home/tu/device/backup/wjjia/DRhard/adore-star'. Make sure that:

But when I load the trained STAR model, tokenizer = AutoTokenizer.from_pretrained('/home/tu/device/backup/wjjia/DRhard/star-model', do_lower_case=True, cache_dir=None) works well.

jingtaozhan commented 3 years ago

tokenizer = AutoTokenizer.from_pretrained('roberta-base', do_lower_case=True, cache_dir=None)

If I use the adore (star) model, is this tokenizer correct?

We did not change the tokenizer; we use the default RoBERTa tokenizer. Therefore, the code above is correct. You do not need to load the tokenizer from our provided models.

We preprocessed the dataset with transformers==2.8.0, and we recommend you also use this version for preprocessing.
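For example (purely illustrative, with an arbitrary sample query), loading the stock tokenizer looks like this; pinning transformers==2.8.0 for the preprocessing step avoids tokenization mismatches.

```python
# Illustrative only: the default roberta-base tokenizer is all that is needed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base", do_lower_case=True, cache_dir=None)
ids = tokenizer.encode("what is dense retrieval", add_special_tokens=True)
print(ids)                                   # token id sequence with <s> ... </s>
print(tokenizer.convert_ids_to_tokens(ids))  # byte-level BPE tokens
```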

Closing this issue. Open a new one if you have other questions.

wangjiajia5889758 commented 3 years ago

Ok! Thank you.