Closed wangjiajia5889758 closed 3 years ago
I have solved the problem. In addition, I don't understand how the passage_embedding is generated?
Please see inference.py: the function `evaluation` computes and saves the passage embeddings.
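To illustrate what such an evaluation routine does, here is a minimal sketch of batch-encoding passages and saving the resulting embeddings to disk. The function and file names are illustrative stand-ins, not the repo's actual API; the toy encoder just replaces the real model.

```python
import numpy as np

def compute_passage_embeddings(encode_fn, passages, batch_size=2, dim=4):
    # Encode passages batch by batch and collect the vectors in one matrix.
    embeddings = np.zeros((len(passages), dim), dtype=np.float32)
    for start in range(0, len(passages), batch_size):
        batch = passages[start:start + batch_size]
        embeddings[start:start + len(batch)] = encode_fn(batch)
    return embeddings

# Toy stand-in encoder: maps each passage to a vector of its length.
toy_encoder = lambda batch: np.array(
    [[float(len(p))] * 4 for p in batch], dtype=np.float32)

embs = compute_passage_embeddings(toy_encoder, ["a", "bb", "ccc"])
np.save("passage_embeddings.npy", embs)  # persisted for later retrieval
```

The saved matrix is what a later retrieval step loads instead of re-encoding the corpus.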
> I have solved the problem. In addition, I don't understand how the passage_embedding is generated?
Hello, I have the same problem, how did you solve it?
@behindhu Any details? `TFRobertaDot` seems to appear nowhere in the code. How did this error come about?
> @behindhu Any details? `TFRobertaDot` seems to appear nowhere in the code. How did this error come about?

Firstly, an error occurred when I ran ./star/inference.py: `RuntimeError: [enforce fail at inline_container.cc:145]. PytorchStreamReader failed reading zip archive: failed finding central directory`. At the same time, there was an exception: `OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.` Then I installed the TensorFlow package and set `from_tf=True`, and got another error: `AttributeError: module 'transformers' has no attribute 'TFRobertaDot'`.
@behindhu It seems you did not unzip the file? `./data/passage/trained_models/star` should be a directory that contains several files, including `pytorch_model.bin` and `config.json`.
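As a quick sanity check before loading, one can verify that the extracted directory actually contains the files `from_pretrained` needs. This is a small helper sketch (the path is the one mentioned above; adjust it to wherever you extracted the archive):

```python
import os

def missing_checkpoint_files(ckpt_dir):
    """Return the required Hugging Face checkpoint files that are
    absent from ckpt_dir (from_pretrained needs at least these two)."""
    required = ["pytorch_model.bin", "config.json"]
    return [name for name in required
            if not os.path.isfile(os.path.join(ckpt_dir, name))]

# Example: an empty result means the directory looks complete.
print(missing_checkpoint_files("./data/passage/trained_models/star"))
```

If this prints a non-empty list, the archive was not fully extracted (or the path points at the zip file itself rather than the directory).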
> Please see inference.py: the function `evaluation` computes and saves the passage embeddings.
Oh, I made a mistake. `star/inference.py` computes the passage embeddings. `adore/inference.py` utilizes the existing passage embeddings generated by `star/inference.py`.
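The two-step relationship can be sketched as follows: one script persists the passage embeddings, and the second reuses them for inner-product retrieval instead of re-encoding the corpus. File names and sizes here are illustrative, not the repo's actual conventions.

```python
import numpy as np

# Step 1 (the role of star/inference.py): compute and persist
# passage embeddings.
rng = np.random.default_rng(0)
passage_embeddings = rng.standard_normal((10, 8)).astype(np.float32)
np.save("passage_embs.npy", passage_embeddings)

# Step 2 (the role of adore/inference.py): load the existing
# embeddings and rank passages for a query by inner product.
query_embedding = rng.standard_normal(8).astype(np.float32)
loaded = np.load("passage_embs.npy")
scores = loaded @ query_embedding       # one score per passage
top3 = np.argsort(-scores)[:3]          # indices of the 3 best passages
```

This split is why adore's inference depends on star's output files being present.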
I have unzipped the file, but the code still reports the error.
I found the problem: `pytorch_model.bin` was a corrupted file. I have re-uploaded the correct file, so it should work fine now. Sorry for the inconvenience, and thanks for the timely feedback!
Yes, it runs correctly now because I replaced the .bin file.
`embeddings = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"], is_query=True)` — are the query_embedding and the doc_embedding obtained by this function? What is the meaning of the parameter `is_query=True`?
> `embeddings = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"], is_query=True)` — are the query_embedding and the doc_embedding obtained by this function? What is the meaning of the parameter `is_query=True`?
It indicates whether the input is a query.
If the input is a passage, should the parameter be `is_query=False`?
Yes, though it does not make any difference in our `RobertaDot` implementation (see model.py).
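The point above can be shown with a toy bi-encoder sketch: when queries and passages go through one shared encoder, an `is_query` flag can be accepted but has no effect. All names here are illustrative, not the repo's actual classes.

```python
import numpy as np

class SharedDualEncoder:
    """Illustrative bi-encoder with one shared projection for queries
    and passages, so the is_query flag changes nothing. Architectures
    with separate query/passage encoders could branch on it instead."""

    def __init__(self, dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((dim, dim)).astype(np.float32)

    def __call__(self, inputs, is_query=True):
        # Same code path regardless of the flag.
        return inputs @ self.proj

model = SharedDualEncoder()
x = np.ones((2, 4), dtype=np.float32)
q_emb = model(x, is_query=True)   # query branch
p_emb = model(x, is_query=False)  # passage branch: identical result
```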
ok, thank you!
`tokenizer = AutoTokenizer.from_pretrained('roberta-base', do_lower_case=True, cache_dir=None)` — if I use the adore (star) model, is this tokenizer correct?
When I run `tokenizer = AutoTokenizer.from_pretrained('/home/tu/device/backup/wjjia/DRhard/adore-star', do_lower_case=True, cache_dir=None)`, I get the error:

OSError: Can't load tokenizer for '/home/tu/device/backup/wjjia/DRhard/adore-star'. Make sure that:
- '/home/tu/device/backup/wjjia/DRhard/adore-star' is a correct model identifier listed on 'https://huggingface.co/models'
- or '/home/tu/device/backup/wjjia/DRhard/adore-star' is the correct path to a directory containing relevant tokenizer files

But when I load the star trained model with `tokenizer = AutoTokenizer.from_pretrained('/home/tu/device/backup/wjjia/DRhard/star-model', do_lower_case=True, cache_dir=None)`, it works well.
> `tokenizer = AutoTokenizer.from_pretrained('roberta-base', do_lower_case=True, cache_dir=None)` — if I use the adore (star) model, is this tokenizer correct?
We did not change the tokenizer and use the default RoBERTa tokenizer, so the above code is correct. You do not need to load the tokenizer from our provided models. We preprocessed the dataset using `transformers==2.8.0`, and we recommend you also use this version for preprocessing.
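Since the recommended version is pinned, a small stdlib check before running the preprocessing scripts can catch a mismatched install early. This helper is a sketch (the function name is mine, not from the repo) and requires Python 3.8+ for `importlib.metadata`:

```python
import importlib.metadata  # stdlib, Python 3.8+

def has_version(package, wanted):
    """Return True only if `package` is installed at exactly `wanted`."""
    try:
        return importlib.metadata.version(package) == wanted
    except importlib.metadata.PackageNotFoundError:
        return False

# Check the pin recommended for preprocessing.
compatible = has_version("transformers", "2.8.0")
```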
Close this issue. Open a new one if you have other questions.
Ok! Thank you.
Could you please provide more details, e.g., the file you are using and the commands?