A problem occur when execute the first procedure.

microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

https://aka.ms/GeneralAI

MIT License


A problem occur when execute the first procedure. #235

Closed: zhouchang123 closed this issue 4 months ago

zhouchang123 commented 4 months ago

I just executed this command:

python DPR/generate_dense_embeddings.py \
     model_file=${RETRIEVER} \
     ctx_src=dpr_uprise shard_id=0 num_shards=1 \
     out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
     ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
     ctx_sources.dpr_uprise.prompt_setup_type=qa \
     encoder.cache_dir=${CACHE_DIR} \
     hydra.run.dir=$PWD/my_data/experiment/uprise

Then the problem occurred.

zhouchang123 commented 4 months ago

After changing output = results[i] to output=result.get(i), a new problem occurred.

cdxeve commented 4 months ago

There seems to be an issue with parallel execution. How about restricting the CUDA devices to 1? Try running the following command:

export RETRIEVER=[DOWNLOADED_CKPT_PATH] # path to the downloaded retriever checkpoint
export PROMPT_POOL=[DOWNLOADED_POOL_PATH] # path to the downloaded prompt pool
export CACHE_DIR=[CACHE_DIR] # directory path for caching the LLM checkpoints, task datasets, etc.

CUDA_VISIBLE_DEVICES='0' python DPR/generate_dense_embeddings.py \
     model_file=${RETRIEVER} \
     ctx_src=dpr_uprise shard_id=0 num_shards=1 \
     out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
     ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
     ctx_sources.dpr_uprise.prompt_setup_type=qa \
     encoder.cache_dir=${CACHE_DIR} \
     hydra.run.dir=$PWD/my_data/experiment/uprise

zhouchang123 commented 4 months ago

The problem seems to have been solved, but where is the result saved?

zhouchang123 commented 4 months ago

I set out_file=$PWD/my_data/experiment/uprise/dpr_enc_index. However, after running the code, neither the file nor the folder exists.
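
One way to check for output is to treat out_file as a path prefix and list everything that starts with it. This is only a sketch: it assumes the script may append a suffix such as a shard id to out_file (upstream DPR does this, but that is an assumption for this repository), and find_outputs is a hypothetical helper, not part of the codebase.

```python
from pathlib import Path


def find_outputs(out_file: str) -> list[Path]:
    """Return every path whose name starts with the out_file prefix.

    Hypothetical helper: assumes the script may write out_file plus a
    suffix (for example a shard id) rather than out_file itself.
    """
    prefix = Path(out_file)
    if not prefix.parent.is_dir():
        # The parent folder was never created, so nothing was written.
        return []
    return sorted(prefix.parent.glob(prefix.name + "*"))


if __name__ == "__main__":
    matches = find_outputs("my_data/experiment/uprise/dpr_enc_index")
    print(matches if matches else "no output found for this prefix")
```

An empty result means the run never reached the step that writes the embeddings.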

cdxeve commented 4 months ago

Could you please show me the log in your terminal when you finished the first execution?

zhouchang123 commented 4 months ago

This is the log. Nothing seems to have gone wrong. Strangely, though, I set CUDA_VISIBLE_DEVICES='2' but it still runs on cuda:0.
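
A note on the device index: CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0 inside the process, so with CUDA_VISIBLE_DEVICES='2' physical GPU 2 is expected to show up as device 0. The sketch below only models that renumbering in plain Python. logical_to_physical is a hypothetical helper, and it assumes a simple comma-separated list of valid device ids.

```python
def logical_to_physical(visible: str) -> dict[int, str]:
    """Model how in-process CUDA indices map to the ids in CUDA_VISIBLE_DEVICES.

    Simplified illustration: assumes every listed id is valid.
    """
    ids = [d.strip() for d in visible.split(",") if d.strip()]
    # The first listed device becomes index 0, the second index 1, and so on.
    return dict(enumerate(ids))


if __name__ == "__main__":
    print(logical_to_physical("2"))    # in-process device 0 is physical GPU 2
    print(logical_to_physical("1,3"))  # device 0 is GPU 1, device 1 is GPU 3
```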

cdxeve commented 4 months ago

Sometimes running the same command multiple times can lead to such errors. How about first removing the output folder and then rerunning the command:

Remove the existing folder:
```
rm -r $PWD/my_data/experiment/uprise/
```

Run the following command:

export RETRIEVER=[DOWNLOADED_CKPT_PATH] # Path to the downloaded retriever checkpoint
export PROMPT_POOL=[DOWNLOADED_POOL_PATH] # Path to the downloaded prompt pool
export CACHE_DIR=[CACHE_DIR] # Directory path for caching the LLM checkpoints, task datasets, etc.

CUDA_VISIBLE_DEVICES='0' python DPR/generate_dense_embeddings.py \
 model_file=${RETRIEVER} \
 ctx_src=dpr_uprise shard_id=0 num_shards=1 \
 out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
 ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
 ctx_sources.dpr_uprise.prompt_setup_type=qa \
 encoder.cache_dir=${CACHE_DIR} \
 hydra.run.dir=$PWD/my_data/experiment/uprise

zhouchang123 commented 4 months ago

Thank you very much. But it still doesn't work; the dpr_enc_index folder still does not exist.

zhouchang123 commented 4 months ago

Here is the override config file.

cdxeve commented 4 months ago

This is weird. I don't think the execution finished successfully. The log should end with the write message (Total passages processed %d. Written to %s) rather than with the printed tensor that follows the progress message (Producing encodings for passages range: %d to %d (out of total %d)). See uprise/DPR/generate_dense_embeddings.py.

Please check if the function gen_ctx_vectors() executes successfully by adding some breakpoints like pdb or print.
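
As a rough illustration of that kind of check, the decorator below prints when a function starts, returns, or raises, so an early exit becomes visible in the log. trace_call and encode_passages are hypothetical names. encode_passages is only a stand-in, since the real signature of gen_ctx_vectors() is not shown in this thread.

```python
import functools


def trace_call(fn):
    """Print when fn starts, returns, or raises."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        print(f"[trace] entering {fn.__name__}")
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            print(f"[trace] {fn.__name__} raised {exc!r}")
            raise
        print(f"[trace] {fn.__name__} returned {type(result).__name__}")
        return result
    return wrapper


@trace_call
def encode_passages(passages):
    # Hypothetical stand-in for the real encoding function.
    # Insert breakpoint() here to step through interactively with pdb.
    return [(i, len(p)) for i, p in enumerate(passages)]


vectors = encode_passages(["a b", "c"])
```

If the "returned" line never appears, the function did not finish, which narrows the search to the code inside it.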

zhouchang123 commented 4 months ago

I found the error: I had forgotten to delete some code that I added while debugging. Thank you very much.