princeton-nlp / DensePhrases

[ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
Apache License 2.0
605 stars, 78 forks

phrase vector generation #9

Closed: neerajkuhike closed this issue 3 years ago

neerajkuhike commented 3 years ago

Hi

While running this command:

python generate_phrase_vecs.py --model_type bert --pretrained_name_or_path SpanBERT/spanbert-base-cased --data_dir ./ --cache_dir $CACHE_DIR --predict_file sample/articles.json --do_dump --max_seq_length 512 --doc_stride 500 --fp16 --filter_threshold -2.0 --append_title --load_dir $SAVE_DIR/densephrases-multi --output_dir $SAVE_DIR/densephrases-multi_sample

I am getting below error:

  File "generate_phrase_vecs.py", line 396, in <module>
    main()
  File "generate_phrase_vecs.py", line 392, in main
    dump_phrases(args, model, tokenizer)
  File "generate_phrase_vecs.py", line 85, in dump_phrases
    args, tokenizer, evaluate=True, output_examples=True, context_only=True
  File "/home/neerajku/neerajAvatar/DensePhrases/densephrases/utils/squad_utils.py", line 1199, in load_and_cache_examples
    context_only=context_only, args=args)
  File "/home/neerajku/neerajAvatar/DensePhrases/densephrases/utils/squad_utils.py", line 792, in get_dev_examples
    return self._create_examples(input_data, "dev", draft, context_only=context_only, args=args)
  File "/home/neerajku/neerajAvatar/DensePhrases/densephrases/utils/squad_utils.py", line 805, in _create_examples
    truecase = TrueCaser(os.path.join(os.environ['DATA_DIR'], args.truecase_path))
  File "/home/neerajku/neerajAvatar/DensePhrases/densephrases/utils/squad_utils.py", line 1301, in __init__
    with open(dist_file_path, "rb") as distributions_file:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/neerajku/DensePhrases//densephrases-data/truecase/english_with_questions.dist'

Could you help me resolve this?

luomancs commented 2 years ago

Hi @neerajkuhike

I came across the same issue. Could you please tell me how you resolved it?

Thanks in advance.

jhyuklee commented 2 years ago

Hi, this is because "english_with_questions.dist" is located in the wrong directory. Could you double-check?
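The traceback shows that `squad_utils.py` builds the truecase path as `os.path.join(os.environ['DATA_DIR'], args.truecase_path)`, so the file has to sit at exactly that location. One way to double-check before launching `generate_phrase_vecs.py` is a small pre-flight check such as this sketch. It assumes `args.truecase_path` is `truecase/english_with_questions.dist` relative to `DATA_DIR` (inferred from the error path, not confirmed against the repository); `resolve_truecase_path` and `check_truecase_file` are hypothetical helpers, not part of DensePhrases.

```python
import os

# Assumption: relative location of the truecase file inside DATA_DIR, inferred
# from the error message; the real default of args.truecase_path may differ.
DEFAULT_TRUECASE_PATH = "truecase/english_with_questions.dist"


def resolve_truecase_path(truecase_path: str = DEFAULT_TRUECASE_PATH) -> str:
    """Build the path the same way squad_utils.py (line 805 in the traceback) does."""
    data_dir = os.environ.get("DATA_DIR")
    if data_dir is None:
        # The original code uses os.environ['DATA_DIR'], which would raise KeyError.
        raise RuntimeError("DATA_DIR is not set; export it before running.")
    return os.path.join(data_dir, truecase_path)


def check_truecase_file(truecase_path: str = DEFAULT_TRUECASE_PATH) -> str:
    """Return the resolved path, or fail early with the exact path that was tried."""
    path = resolve_truecase_path(truecase_path)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Truecase file not found at {path!r}. "
            "Check that DATA_DIR points at the downloaded dataset root."
        )
    return path
```

Calling `check_truecase_file()` with the same `DATA_DIR` exported for the main command either returns the resolved path or raises with the path it looked at, which makes a misplaced file easy to spot.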

luomancs commented 2 years ago

Hi @jhyuklee ,

I could not find the "english_with_questions.dist" file. Also, if I want to generate an index for my own dataset, how should I prepare this .dist file?

Thanks for your response.

jhyuklee commented 2 years ago

It should be included in the datasets, in the truecase folder: https://github.com/princeton-nlp/DensePhrases#1-datasets
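If the datasets were downloaded but the file ended up somewhere unexpected, a recursive search can show where it landed and which directory to export as `DATA_DIR`. The following is a minimal sketch using `pathlib.Path.rglob`. It assumes the file must sit at `DATA_DIR/truecase/english_with_questions.dist`, a layout inferred from the error path rather than confirmed. `find_truecase_files` and `suggest_data_dir` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

TRUECASE_FILENAME = "english_with_questions.dist"


def find_truecase_files(root: str, filename: str = TRUECASE_FILENAME) -> List[Path]:
    """Recursively search `root` for every copy of the truecase distribution file."""
    return sorted(Path(root).rglob(filename))


def suggest_data_dir(dist_file: Path) -> Path:
    """Return the directory to export as DATA_DIR for a found file.

    Assumption (hypothetical layout): the file must live at DATA_DIR/truecase/<filename>.
    """
    if dist_file.parent.name != "truecase":
        raise ValueError(f"{dist_file} is not inside a 'truecase' folder; move it there first.")
    return dist_file.parent.parent
```

For example, looping over `find_truecase_files("/path/to/downloads")` (a placeholder path) and printing `suggest_data_dir(f)` for each hit gives candidate values for `export DATA_DIR=...`.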