thunlp / ConvDR

Code repo for SIGIR 2021 paper "Few-Shot Conversational Dense Retrieval"
MIT License

about Generate Document Embeddings #9

Closed. wjm-123 closed this 2 years ago

wjm-123 commented 2 years ago

Thanks for sharing your code; this is very valuable work. I'm trying to reproduce the results but have run into a problem. When I run "gen_passage_embeddings.py", it always fails at the "merging embeddings" step. I traced the failure to the function "barrier_array_merge" in util.py. The variable "data_list" keeps growing until the process runs out of memory, yet "data_list" is never written out. Instead, "data_array" is written to files. As far as I can tell, "data_array" is the input parameter and nothing is done to it, so why write it out? I would appreciate it if you could explain this. P.S. I'm sure I have enough memory.

Best wishes!
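
For readers following along, here is a minimal sketch of a write-then-merge pattern that would match the behaviour described above. It is an assumption about the general shape of such a function, not the repository's actual code: the names write_shard and merge_shards, the shard file names, and the single-process loop standing in for distributed workers are all hypothetical. Under this pattern the input array is written because every worker saves its own shard, and the list grows because one process then reads all shards back before concatenating them.

```python
import os
import pickle
import tempfile

import numpy as np


def write_shard(output_dir, rank, data_array):
    """Hypothetical helper: one worker dumps its own shard (its input array)."""
    path = os.path.join(output_dir, f"shard_{rank}.pkl")
    with open(path, "wb") as handle:
        pickle.dump(data_array, handle, protocol=4)
    return path


def merge_shards(output_dir, world_size, merge_axis=0):
    """Hypothetical helper: read every worker's shard, then concatenate."""
    data_list = []
    for rank in range(world_size):
        path = os.path.join(output_dir, f"shard_{rank}.pkl")
        with open(path, "rb") as handle:
            # The list grows by one full shard per worker.
            data_list.append(pickle.load(handle))
    # np.concatenate allocates a new array, so while data_list is still
    # alive, peak memory is roughly twice the total size of all shards.
    return np.concatenate(data_list, axis=merge_axis)


with tempfile.TemporaryDirectory() as tmp:
    world_size = 4  # stand-in for the number of distributed workers
    for rank in range(world_size):
        shard = np.full((3, 8), rank, dtype=np.float32)
        write_shard(tmp, rank, shard)
    merged = merge_shards(tmp, world_size)
    merged_shape = merged.shape
```

If the real function follows this shape, the merge step needs about twice the size of the full embedding set in RAM, which may explain an out-of-memory failure on a machine that can hold the embeddings once.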

KristenZHANG commented 2 years ago

Hi wjm-123, did you manage to solve this problem? I seem to have hit a similar issue at "merging embeddings": subprocess.CalledProcessError: Command died with <Signals.SIGKILL: 9>. Running the command "dmesg -T | grep -E -i -B100 'killed process'" then showed the following: [image].

I'd appreciate any help, thanks!

wjm-123 commented 2 years ago

Hi Kristen, if you don't have enough memory, try splitting the data.
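
The reply does not say how to split the data. One possible reading, sketched below under stated assumptions, is to keep the embeddings as separate shards and merge them into a disk-backed array one shard at a time, so that only a single shard needs to be in RAM at once. The function name merge_shards_on_disk and the .npy shard layout are hypothetical and are not part of the ConvDR code; np.load with mmap_mode and np.lib.format.open_memmap are standard NumPy calls.

```python
import os
import tempfile

import numpy as np


def merge_shards_on_disk(shard_paths, out_path):
    """Concatenate .npy shards along axis 0 without loading them all at once."""
    # Memory-map each shard so reading its shape and dtype costs no RAM.
    shards = [np.load(p, mmap_mode="r") for p in shard_paths]
    total_rows = sum(s.shape[0] for s in shards)
    out_shape = (total_rows,) + shards[0].shape[1:]
    # Create the output .npy file on disk and map it for writing.
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=shards[0].dtype, shape=out_shape
    )
    start = 0
    for s in shards:
        # Copy one shard at a time into its slice of the output file.
        out[start:start + s.shape[0]] = s
        start += s.shape[0]
    out.flush()
    del out
    return out_shape


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for rank in range(3):
        path = os.path.join(tmp, f"shard_{rank}.npy")
        np.save(path, np.full((2, 4), rank, dtype=np.float32))
        paths.append(path)
    out_path = os.path.join(tmp, "merged.npy")
    merged_shape = merge_shards_on_disk(paths, out_path)
    # Copy into RAM only for this small demo, before the temp dir is removed.
    merged_copy = np.array(np.load(out_path))
```

The merged file can later be opened with np.load(out_path, mmap_mode="r"), which again avoids holding the whole array in memory. Whether this fits the rest of the pipeline depends on how the downstream code consumes the embeddings.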