Closed zhouchang123 closed 4 months ago
I just ran the following command:
python DPR/generate_dense_embeddings.py \
model_file=${RETRIEVER} \
ctx_src=dpr_uprise shard_id=0 num_shards=1 \
out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
ctx_sources.dpr_uprise.prompt_setup_type=qa \
encoder.cache_dir=${CACHE_DIR} \
hydra.run.dir=$PWD/my_data/experiment/uprise
Then the problem occurred.
After changing output = results[i] to output=result.get(i), a new problem occurred.
There seems to be an issue with parallel execution. How about restricting the CUDA devices to 1? Try running the following command:
export RETRIEVER=[DOWNLOADED_CKPT_PATH] # path to the downloaded retriever checkpoint
export PROMPT_POOL=[DOWNLOADED_POOL_PATH] # path to the downloaded prompt pool
export CACHE_DIR=[CACHE_DIR] # directory path for caching the LLM checkpoints, task datasets, etc.
CUDA_VISIBLE_DEVICES='0' python DPR/generate_dense_embeddings.py \
model_file=${RETRIEVER} \
ctx_src=dpr_uprise shard_id=0 num_shards=1 \
out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
ctx_sources.dpr_uprise.prompt_setup_type=qa \
encoder.cache_dir=${CACHE_DIR} \
hydra.run.dir=$PWD/my_data/experiment/uprise
The problem seems to have been solved, but where is the result saved?
I set out_file=$PWD/my_data/experiment/uprise/dpr_enc_index. However, after running the code, neither the file nor the folder exists.
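A quick way to check for the output is to look for files that start with the out_file prefix instead of a folder with that exact name. The sketch below assumes the script follows the upstream DPR convention of appending "_" plus the shard id to out_file (so shard 0 would be dpr_enc_index_0); whether this repository's copy does the same is an assumption, and find_shard_files is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path
from typing import List


def find_shard_files(out_file: str) -> List[Path]:
    """Return existing files named '<out_file>_<something>', sorted by name.

    Assumption: the embedding script writes one file per shard by appending
    '_<shard_id>' to the out_file prefix, as upstream DPR does.
    """
    prefix = Path(out_file)
    parent = prefix.parent
    if not parent.is_dir():
        # The run directory itself was never created.
        return []
    return sorted(p for p in parent.glob(prefix.name + "_*") if p.is_file())


if __name__ == "__main__":
    # Path taken from the command in this thread, relative to $PWD.
    for path in find_shard_files("my_data/experiment/uprise/dpr_enc_index"):
        print(path, path.stat().st_size, "bytes")
```

An empty result means either the run directory is missing or no shard file was written, which points to the script stopping before its final write step.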
Could you please show me the log in your terminal when you finished the first execution?
This is the log. Nothing seems to have gone wrong. Moreover, it is strange that I set CUDA_VISIBLE_DEVICES='2' but it still runs on cuda 0.
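On the device numbering: one likely explanation, assuming "cuda 0" refers to the device index printed by the process, is that CUDA_VISIBLE_DEVICES hides all other GPUs and renumbers the visible ones starting from 0. With CUDA_VISIBLE_DEVICES='2', the process therefore reports cuda:0 while using physical GPU 2. The sketch below only illustrates that renumbering in plain Python; logical_to_physical is a hypothetical helper and does not query any GPU.

```python
def logical_to_physical(visible_devices: str) -> dict:
    """Map the index a process sees (cuda:N) to the physical GPU id.

    Mirrors how CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES:
    the first listed id becomes device 0, the second device 1, and so on.
    """
    ids = [tok.strip() for tok in visible_devices.split(",") if tok.strip()]
    return {logical: physical for logical, physical in enumerate(ids)}


print(logical_to_physical("2"))  # {0: '2'}
```

Checking nvidia-smi while the job runs shows which physical GPU is actually busy.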
Sometimes running the same command multiple times can lead to such errors. How about first removing the output folder and then rerunning the command:
Remove the existing folder:
rm -r $PWD/my_data/experiment/uprise/
Run the following command:
export RETRIEVER=[DOWNLOADED_CKPT_PATH] # Path to the downloaded retriever checkpoint
export PROMPT_POOL=[DOWNLOADED_POOL_PATH] # Path to the downloaded prompt pool
export CACHE_DIR=[CACHE_DIR] # Directory path for caching the LLM checkpoints, task datasets, etc.
CUDA_VISIBLE_DEVICES='0' python DPR/generate_dense_embeddings.py \
model_file=${RETRIEVER} \
ctx_src=dpr_uprise shard_id=0 num_shards=1 \
out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
ctx_sources.dpr_uprise.prompt_setup_type=qa \
encoder.cache_dir=${CACHE_DIR} \
hydra.run.dir=$PWD/my_data/experiment/uprise
Thank you very much. But it still doesn't work: the dpr_enc_index folder still does not exist.
Here is the override config file.
This is weird. I don't think the execution finished successfully. The log should end with the write message ("Total passages processed %d. Written to %s") rather than with the printed tensor after "Producing encodings for passages range: %d to %d (out of total %d)". See uprise/DPR/generate_dense_embeddings.py.
Please check whether the function gen_ctx_vectors() executes successfully by adding some breakpoints, for example with pdb or print.
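As a minimal sketch of that kind of check, a small decorator can print a message when a function starts and another when it returns, so a missing "returned" line shows that the call never completed. Both trace_call and gen_ctx_vectors_stub are hypothetical names for illustration; the stub is a stand-in and not the real gen_ctx_vectors().

```python
import functools


def trace_call(func):
    """Print a line on entry and on normal return of the wrapped function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"[trace] entering {func.__name__}", flush=True)
        result = func(*args, **kwargs)
        # Not reached if func raises or the process exits inside it.
        size = len(result) if hasattr(result, "__len__") else "n/a"
        print(f"[trace] {func.__name__} returned, len={size}", flush=True)
        return result

    return wrapper


@trace_call
def gen_ctx_vectors_stub(passages):
    # Hypothetical stand-in: pairs each passage id with a dummy vector.
    return [(pid, [0.0]) for pid, _text in passages]


if __name__ == "__main__":
    gen_ctx_vectors_stub([("p0", "some text"), ("p1", "more text")])
```

For interactive inspection, placing import pdb; pdb.set_trace() (or breakpoint()) just before the suspect call gives the same information step by step. Remember to remove such debugging code afterwards.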
I found the error: I had forgotten to delete the code that I added while debugging. Thank you very much.