microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs
https://aka.ms/GeneralAI
MIT License

A problem occurs when executing the first procedure #235

Closed: zhouchang123 closed this issue 4 days ago

zhouchang123 commented 6 days ago

[screenshot]

zhouchang123 commented 6 days ago

[screenshot]

zhouchang123 commented 6 days ago

I just executed the following command, and then the problem occurred:

python DPR/generate_dense_embeddings.py \
     model_file=${RETRIEVER} \
     ctx_src=dpr_uprise shard_id=0 num_shards=1 \
     out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
     ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
     ctx_sources.dpr_uprise.prompt_setup_type=qa \
     encoder.cache_dir=${CACHE_DIR} \
     hydra.run.dir=$PWD/my_data/experiment/uprise

zhouchang123 commented 6 days ago

After changing output = results[i] to output=result.get(i), a new problem occurred: [screenshot]
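For readers trying the same change: the thread does not show what `results` is, so this is only a sketch under the assumption that it is a dict keyed by index (all names below are hypothetical). Indexing a missing key raises `KeyError`, while `dict.get` returns `None`, so the failure just moves to wherever the value is used next. If `results` is a list, it has no `.get` method at all, and if it were a `multiprocessing` `AsyncResult`, the argument to `.get()` is a timeout, not an index.

```python
# Hypothetical stand-in: assume `results` maps a batch index to its output.
results = {0: "encoded batch 0", 1: "encoded batch 1"}


def lookup_strict(results, i):
    # Indexing raises KeyError for a missing key (IndexError for a list).
    return results[i]


def lookup_lenient(results, i):
    # dict.get returns None for a missing key, so the error surfaces later.
    return results.get(i)


print(lookup_lenient(results, 2))  # None
try:
    lookup_strict(results, 2)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 2
```

In other words, switching to `.get` hides the missing entry rather than fixing it, which fits the maintainer's suspicion that the real problem lies in the parallel execution.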

cdxeve commented 6 days ago

There seems to be an issue with parallel execution. How about restricting the run to a single CUDA device? Try running the following command:

export RETRIEVER=[DOWNLOADED_CKPT_PATH] # path to the downloaded retriever checkpoint
export PROMPT_POOL=[DOWNLOADED_POOL_PATH] # path to the downloaded prompt pool
export CACHE_DIR=[CACHE_DIR] # directory path for caching the LLM checkpoints, task datasets, etc.

CUDA_VISIBLE_DEVICES='0' python DPR/generate_dense_embeddings.py \
     model_file=${RETRIEVER} \
     ctx_src=dpr_uprise shard_id=0 num_shards=1 \
     out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
     ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
     ctx_sources.dpr_uprise.prompt_setup_type=qa \
     encoder.cache_dir=${CACHE_DIR} \
     hydra.run.dir=$PWD/my_data/experiment/uprise
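The same invocation can be scripted from Python. This is a minimal sketch, assuming it is run from the UPRISE repository root with the three variables exported as above; `build_embedding_job` and `run_embedding_job` are hypothetical helpers. Copying `os.environ` and overriding `CUDA_VISIBLE_DEVICES` scopes the restriction to the child process, just as the `CUDA_VISIBLE_DEVICES='0'` prefix does in the shell.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_embedding_job(retriever, prompt_pool, cache_dir, workdir, gpu="0"):
    """Return (argv, env) mirroring the shell command above (hypothetical helper)."""
    exp_dir = Path(workdir) / "my_data" / "experiment" / "uprise"
    argv = [
        sys.executable, "DPR/generate_dense_embeddings.py",
        f"model_file={retriever}",
        "ctx_src=dpr_uprise", "shard_id=0", "num_shards=1",
        f"out_file={exp_dir / 'dpr_enc_index'}",
        f"ctx_sources.dpr_uprise.prompt_pool_path={prompt_pool}",
        "ctx_sources.dpr_uprise.prompt_setup_type=qa",
        f"encoder.cache_dir={cache_dir}",
        f"hydra.run.dir={exp_dir}",
    ]
    # Copy the parent environment and restrict only the child to one GPU.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    return argv, env


def run_embedding_job(gpu="0"):
    """Launch the job using the RETRIEVER / PROMPT_POOL / CACHE_DIR exports."""
    argv, env = build_embedding_job(
        os.environ["RETRIEVER"],
        os.environ["PROMPT_POOL"],
        os.environ["CACHE_DIR"],
        os.getcwd(),
        gpu=gpu,
    )
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(argv, env=env, check=True)
```

Using `sys.executable` instead of a bare `python` ensures the child runs in the same interpreter (and virtual environment) as the caller.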
zhouchang123 commented 6 days ago

The problem seems to have been solved, but where is the result saved?

zhouchang123 commented 6 days ago

[screenshot] The output path is out_file=$PWD/my_data/experiment/uprise/dpr_enc_index, but after running the code, neither the file nor the folder exists.
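A hedged note on where to look: in upstream facebookresearch/DPR, `generate_dense_embeddings.py` pickles the embeddings to a file named `<out_file>_<shard_id>` and then logs `Total passages processed %d. Written to %s`. Assuming the UPRISE copy keeps that convention, `out_file=.../dpr_enc_index` with `shard_id=0` would produce a file called `dpr_enc_index_0` rather than a folder called `dpr_enc_index`. The helpers below are hypothetical, and the shard contents format is an assumption.

```python
import pickle
from pathlib import Path


def expected_shard_paths(out_file, num_shards=1):
    """Paths for DPR-style sharded output: <out_file>_<shard_id>.

    Assumption: UPRISE keeps upstream DPR's naming convention.
    """
    return [Path(f"{out_file}_{shard_id}") for shard_id in range(num_shards)]


def find_shards(out_file, num_shards=1):
    """Map each expected shard path to whether it exists as a regular file."""
    return {path: path.is_file() for path in expected_shard_paths(out_file, num_shards)}


def load_shard(path):
    """Unpickle one shard; assumed to be a list of (passage_id, vector) pairs."""
    # Only unpickle files you produced yourself: pickle can execute code.
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `find_shards("my_data/experiment/uprise/dpr_enc_index")`, run from the repository root, should report `dpr_enc_index_0` as present once a run has completed, under the same assumption.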

cdxeve commented 5 days ago

Could you please show me the terminal log from when the first execution finished?

zhouchang123 commented 5 days ago

[screenshot] [screenshot] This is the log. It seems nothing went wrong. Strangely, though, I set CUDA_VISIBLE_DEVICES='2', but it still ran on cuda:0.
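On the device question, this is most likely expected behavior: CUDA renumbers the GPUs listed in `CUDA_VISIBLE_DEVICES` starting from 0, so with `CUDA_VISIBLE_DEVICES='2'` the process sees physical GPU 2 as `cuda:0`. The hypothetical helper below models that mapping for plain comma-separated integer indices only; GPU UUIDs and invalid entries are not handled.

```python
def visible_to_physical(cuda_visible_devices, logical_index):
    """Map an in-process CUDA index (cuda:N) to the physical GPU index.

    Sketch only: handles comma-separated integer indices, not GPU UUIDs,
    and does not model how CUDA treats invalid entries.
    """
    visible = [int(tok) for tok in cuda_visible_devices.split(",") if tok.strip()]
    if not 0 <= logical_index < len(visible):
        raise IndexError(f"cuda:{logical_index} is not visible (visible GPUs: {visible})")
    return visible[logical_index]


print(visible_to_physical("2", 0))    # 2: the process's cuda:0 is physical GPU 2
print(visible_to_physical("1,3", 1))  # 3
```

To confirm which physical card is actually busy, compare against `nvidia-smi`; setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes CUDA's numbering match the order `nvidia-smi` uses.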

cdxeve commented 5 days ago

Sometimes running the same command multiple times can lead to such errors. How about removing the output folder first and then rerunning the command:

  1. Remove the existing folder:

    rm -r $PWD/my_data/experiment/uprise/
  2. Run the following command:

    export RETRIEVER=[DOWNLOADED_CKPT_PATH] # Path to the downloaded retriever checkpoint
    export PROMPT_POOL=[DOWNLOADED_POOL_PATH] # Path to the downloaded prompt pool
    export CACHE_DIR=[CACHE_DIR] # Directory path for caching the LLM checkpoints, task datasets, etc.
    
    CUDA_VISIBLE_DEVICES='0' python DPR/generate_dense_embeddings.py \
     model_file=${RETRIEVER} \
     ctx_src=dpr_uprise shard_id=0 num_shards=1 \
     out_file=$PWD/my_data/experiment/uprise/dpr_enc_index \
     ctx_sources.dpr_uprise.prompt_pool_path=${PROMPT_POOL} \
     ctx_sources.dpr_uprise.prompt_setup_type=qa \
     encoder.cache_dir=${CACHE_DIR} \
     hydra.run.dir=$PWD/my_data/experiment/uprise
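The cleanup step can also be done from Python. This is a minimal sketch; `remove_experiment_dir` is a hypothetical helper. Unlike `rm -r`, it treats a missing directory as a no-op instead of an error, but it still raises on other failures such as permission problems.

```python
import shutil
from pathlib import Path


def remove_experiment_dir(exp_dir):
    """Recursively delete exp_dir if it exists; return True if something was removed."""
    path = Path(exp_dir)
    if not path.exists():
        return False  # nothing to clean up; `rm -r` would report an error here
    shutil.rmtree(path)  # raises on other problems, e.g. missing permissions
    return True


# Usage (from the repository root):
# remove_experiment_dir(Path.cwd() / "my_data" / "experiment" / "uprise")
```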
zhouchang123 commented 5 days ago

Thank you very much. But it still doesn't work: the dpr_enc_index folder still doesn't exist. [screenshot] [screenshot]

zhouchang123 commented 5 days ago

Here is the override config file. [screenshot]

cdxeve commented 5 days ago

This is weird. I don't think the execution finished successfully. The log should end with the message printed after the embeddings are written (Total passages processed %d. Written to %s) rather than with the tensor printout (Producing encodings for passages range: %d to %d (out of total %d)). See uprise/DPR/generate_dense_embeddings.py.

[screenshot]
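To check a captured log automatically, a small parser can look for the two messages quoted above. This is a sketch that assumes the output has been saved to a string or file (for example with `... 2>&1 | tee embed.log`); `check_embedding_log` is a hypothetical helper, and the regexes are written from the format strings quoted in this thread.

```python
import re

# Patterns built from the log format strings quoted in this thread.
DONE_RE = re.compile(r"Total passages processed (\d+)\. Written to (\S+)")
PROGRESS_RE = re.compile(
    r"Producing encodings for passages range: (\d+) to (\d+) \(out of total (\d+)\)"
)


def check_embedding_log(log_text):
    """Return (finished, detail) for the captured output of an embedding run."""
    lines = log_text.splitlines()
    for line in lines:
        done = DONE_RE.search(line)
        if done:
            return True, f"{done.group(1)} passages written to {done.group(2)}"
    # No completion message: report the last progress range that was logged.
    progress = [m for m in map(PROGRESS_RE.search, lines) if m]
    if progress:
        start, end, total = progress[-1].groups()
        return False, f"stopped after range {start}-{end} of {total}"
    return False, "no encoding progress found"
```

For example, `check_embedding_log(Path("embed.log").read_text())` returns `(False, ...)` with the last logged range when the run never reached the write step.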

Please check whether gen_ctx_vectors() runs to completion by adding breakpoints (e.g., with pdb) or print statements.
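One way to add that instrumentation without editing the function body is a tracing decorator. This is a sketch only: `trace` is a hypothetical helper, and wrapping `gen_ctx_vectors` inside `generate_dense_embeddings.py` (e.g. `gen_ctx_vectors = trace(gen_ctx_vectors)` before it is called) is an assumed placement. It logs when the wrapped function starts, how many items it returns, and any exception; the INFO lines appear only if logging is configured at INFO level (e.g. via `logging.basicConfig`).

```python
import functools
import logging
import time

logger = logging.getLogger("debug_trace")


def trace(func):
    """Log entry, exit (result size and elapsed time), or the raised exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("entering %s", func.__name__)
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Log the traceback, then re-raise so behavior is unchanged.
            logger.exception("%s raised", func.__name__)
            raise
        size = len(result) if hasattr(result, "__len__") else "n/a"
        logger.info("%s returned %s items in %.2fs",
                    func.__name__, size, time.perf_counter() - start)
        return result
    return wrapper
```

For interactive stops, Python's built-in `breakpoint()` drops into pdb, and running with `PYTHONBREAKPOINT=0` disables every `breakpoint()` call, which helps if a stray one is left in the code.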

zhouchang123 commented 4 days ago

I found the error: I had forgotten to remove some code I added while debugging. Thank you very much.