microsoft / BioGPT

Cannot run inference on PubMedQA-Large #23

Open · VisionaryMind opened this issue 1 year ago

VisionaryMind commented 1 year ago

With your pre-trained model, the infer_large.sh script fails as follows:

KeyError: "'_name'"
sed: can't read ../../checkpoints/QA-PubMedQA-BioGPT-Large/generate_checkpoint_avg.pt: No such file or directory
infer_large.sh: line 31: ../../checkpoints/QA-PubMedQA-BioGPT-Large/generate_checkpoint_avg.pt: No such file or directory
Traceback (most recent call last):
  File "/mnt/d/ml/biogpt/examples/QA-PubMedQA/postprocess.py", line 37, in <module>
    with open(out_file, "r", encoding="utf8") as fr:
FileNotFoundError: [Errno 2] No such file or directory: '../../checkpoints/QA-PubMedQA-BioGPT-Large/generate_checkpoint_avg.pt.detok'
Traceback (most recent call last):
  File "/mnt/d/ml/biogpt/examples/QA-PubMedQA/hard_match_evaluation.py", line 37, in <module>
    main()
  File "/mnt/d/ml/biogpt/examples/QA-PubMedQA/hard_match_evaluation.py", line 19, in main
    with open(pred_file) as reader:
FileNotFoundError: [Errno 2] No such file or directory: '../../checkpoints/QA-PubMedQA-BioGPT-Large/generate_checkpoint_avg.pt.detok.extracted.txt'

Please let me know if you have any suggestions to get it working. There seems to be a problem generating the output file.

AdirthaBorgohain commented 1 year ago

You will probably need to run the preprocess_large.sh script before running the infer_large.sh script.

bgriffen commented 1 year ago

I also have the same error, even after running preprocess_large.sh. That preprocessing script doesn't seem to create the first missing file, generate_checkpoint_avg.pt.

AdirthaBorgohain commented 1 year ago

Did you download and extract the trained checkpoint tgz file in the required directory?

If not, you need to run these steps:

mkdir checkpoints  
cd checkpoints  
wget https://msramllasc.blob.core.windows.net/modelrelease/BioGPT/checkpoints/QA-PubMedQA-BioGPT-Large.tgz  
tar -zxvf QA-PubMedQA-BioGPT-Large.tgz
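
After extracting, a quick sanity check may help (a minimal sketch, not from the repo; run it from the BioGPT root so the relative path matches what the scripts use):

import os

path = "checkpoints/QA-PubMedQA-BioGPT-Large/checkpoint_avg.pt"
print(os.path.abspath(path), "exists:", os.path.isfile(path))
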
VisionaryMind commented 1 year ago

I both downloaded and extracted the QA-PubMedQA-BioGPT-Large.tgz into the checkpoints directory and ran preprocess_large.sh. This error still occurs.

AdirthaBorgohain commented 1 year ago

Did you put the checkpoints directory inside the BioGPT directory? The paths it uses are relative, so all the necessary directories have to be inside the BioGPT folder. From your error, it seems it is not able to find the generate_checkpoint_avg.pt checkpoint file under the correct path.

VisionaryMind commented 1 year ago

> Did you put the checkpoints directory inside the BioGPT directory? The paths it uses are relative, so all the necessary directories have to be inside the BioGPT folder. From your error, it seems it is not able to find the generate_checkpoint_avg.pt checkpoint file under the correct path.

I am not sure I understand your response. Isn't generate_checkpoint_avg.pt created by the infer_large.sh script itself? It sets these variables:

MODEL_DIR=../../checkpoints/QA-PubMedQA-BioGPT-Large
MODEL=checkpoint_avg.pt
OUTPUT_FILE=generate_${MODEL}
OUTPUT_FILE=${MODEL_DIR}/${OUTPUT_FILE}
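# i.e. OUTPUT_FILE resolves to ../../checkpoints/QA-PubMedQA-BioGPT-Large/generate_checkpoint_avg.pt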

In my case, MODEL_DIR is present under checkpoints and it only contains one file (checkpoint_avg.pt). Further, in the inference section of the script, inference.py is called to create OUTPUT_FILE if it does not exist:

# inference
if [ ! -f "${OUTPUT_FILE}" ]; then
    echo "Begin inferencing ${INPUT_FILE} using ${MODEL_DIR}/${MODEL}"
    python ../../inference.py --data_dir=${DATA_DIR} --model_dir=${MODEL_DIR} --model_file=${MODEL} --src_file=${INPUT_FILE} --output_file=${OUTPUT_FILE}
fi

In fact, inference will not run at all if the output file is already there. inference.py creates the file at the bottom of main:

def main(args):
    src_inputs = []
    with open(args.src_file) as reader:
        for line in reader:
            src_inputs.append(line.strip())

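    # per the traceback, the load below raises KeyError: "'_name'",
    # so the output file written at the bottom is never created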
    m = TransformerLanguageModelPrompt.from_pretrained(
        args.model_dir, 
        args.model_file, 
        args.data_dir,
        max_len_b=args.decoding_length,
        max_tokens=12000,)

    print(m.cfg)

    if m.cfg.common.fp16:
        print('Converting to float 16')
        m.half()
    m.cuda()

    outputs = m.sample(src_inputs, beam=args.beam)

    with open(f"{args.output_file}", "w", encoding='utf8') as fw:
        for i in range(len(outputs)):
            fw.write(outputs[i] + '\n')

The code appears to be failing inside TransformerLanguageModelPrompt.from_pretrained, so it never reaches the point where the output file is created.
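
One way to dig further (a debugging sketch, not from the repo; the "cfg" key is where recent fairseq versions store the config inside a checkpoint, older ones use "args") is to inspect the checkpoint directly:

import torch

ckpt = torch.load(
    "../../checkpoints/QA-PubMedQA-BioGPT-Large/checkpoint_avg.pt",
    map_location="cpu",
)
print(ckpt.keys())
if ckpt.get("cfg") is not None:
    # the model sub-config carries the architecture name fairseq looks up
    print(ckpt["cfg"]["model"])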

VisionaryMind commented 1 year ago

The issue is not that the output file cannot be found: that error happens because the script runs debpe before the file has been created. Here is the beginning of the error message (which I omitted above):

Begin inferencing ../../data/PubMedQA/raw/biogpt-large-ansis_test.tok.bpe.x using ../../checkpoints/QA-PubMedQA-BioGPT-Large/checkpoint_avg.pt
2023-02-02 23:41:54 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-02-02 23:41:54 | INFO | fairseq.file_utils | loading archive file ../../checkpoints/QA-PubMedQA-BioGPT-Large
2023-02-02 23:42:26 | INFO | src.language_modeling_prompt | dictionary: 57717 types
Traceback (most recent call last):
  File "/mnt/d/ML/biogpt/examples/QA-PubMedQA/../../inference.py", line 47, in <module>
    main(args)
  File "/mnt/d/ML/biogpt/examples/QA-PubMedQA/../../inference.py", line 25, in main
    m = TransformerLanguageModelPrompt.from_pretrained(
  File "/home/biogptuser/anaconda3/envs/biogpt/lib/python3.10/site-packages/fairseq/models/fairseq_model.py", line 267, in from_pretrained
    x = hub_utils.from_pretrained(
  File "/home/biogptuser/anaconda3/envs/biogpt/lib/python3.10/site-packages/fairseq/hub_utils.py", line 73, in from_pretrained
    models, args, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/home/biogptuser/anaconda3/envs/biogpt/lib/python3.10/site-packages/fairseq/checkpoint_utils.py", line 469, in load_model_ensemble_and_task
    model = task.build_model(cfg.model, from_checkpoint=True)
  File "/home/biogptuser/anaconda3/envs/biogpt/lib/python3.10/site-packages/fairseq/tasks/language_modeling.py", line 191, in build_model
    model = super().build_model(args, from_checkpoint)
  File "/home/biogptuser/anaconda3/envs/biogpt/lib/python3.10/site-packages/fairseq/tasks/fairseq_task.py", line 671, in build_model
    model = models.build_model(args, self, from_checkpoint)
  File "/home/biogptuser/anaconda3/envs/biogpt/lib/python3.10/site-packages/fairseq/models/__init__.py", line 102, in build_model
    "Available models: {}".format(MODEL_DATACLASS_REGISTRY.keys())
KeyError: "'_name'"

It looks like fairseq cannot find the "_name" key in the MODEL_DATACLASS_REGISTRY. My fairseq version is 0.12.0, per your recommendation. The problem seems to lie in fairseq itself, though I don't yet see where it comes from. It fails at this assertion:

    assert model is not None, (
        f"Could not infer model type from {cfg}. "
        "Available models: {}".format(MODEL_DATACLASS_REGISTRY.keys())
        + f" Requested model type: {model_type}"
renqianluo commented 1 year ago

Hi @VisionaryMind, this is due to a rename bug, which we have now fixed. Please pull the latest code and re-download the QA-PubMedQA-BioGPT-Large.tgz checkpoint.

rpolicastro commented 1 year ago

I pulled the latest version from GitHub and re-downloaded the checkpoint file. I ended up getting the same error as before, but the temporary fix here https://github.com/microsoft/BioGPT/issues/17#issuecomment-1412000366 still resolved the issue:

import torch
from src.transformer_lm_prompt import TransformerLanguageModelPrompt

m = TransformerLanguageModelPrompt.from_pretrained(
        "checkpoints/QA-PubMedQA-BioGPT-Large",
        "checkpoint_avg.pt",
        "data/PubMedQA/biogpt-large-ansis-bin",
        tokenizer='moses',
        bpe='fastbpe',
        bpe_codes="data/biogpt_large_bpecodes",
        min_len=100,
        max_len_b=1024)
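
For reference, once loading succeeds, generation goes through the same sample call that inference.py uses (a hypothetical usage; the prompt string is only a placeholder, not the exact format produced by preprocess_large.sh):

m.cuda()  # move the model to GPU if one is available
output = m.sample(["<question and context prompt here>"], beam=1)
print(output[0])
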
VisionaryMind commented 1 year ago

@renqianluo I pulled down the latest repository, re-downloaded the QA-PubMedQA-BioGPT-Large.tgz checkpoint, and implemented the fix listed above by @rpolicastro, and I encounter the exact same error message. Neither solution works for me.

bgriffen commented 1 year ago

Ditto. I did the exact same steps (including running preprocess_large.sh and infer_large.sh, plus re-downloading the checkpoint) and still get the error:

  File "/home/bgriffen/Desktop/biogpt/BioGPT/examples/QA-PubMedQA/hard_match_evaluation.py", line 19, in main
    with open(pred_file) as reader:
FileNotFoundError: [Errno 2] No such file or directory: '../../checkpoints/QA-PubMedQA-BioGPT-Large/generate_checkpoint_avg.pt.detok.extracted.txt'
shashank140195 commented 1 year ago

> Ditto. I did the exact same steps (including running preprocess_large.sh and infer_large.sh, plus re-downloading the checkpoint) and still get the error:
>
>   File "/home/bgriffen/Desktop/biogpt/BioGPT/examples/QA-PubMedQA/hard_match_evaluation.py", line 19, in main
>     with open(pred_file) as reader:
> FileNotFoundError: [Errno 2] No such file or directory: '../../checkpoints/QA-PubMedQA-BioGPT-Large/generate_checkpoint_avg.pt.detok.extracted.txt'

Probably because your script didn't generate an averaged checkpoint. Use the best checkpoint instead (e.g. set MODEL=checkpoint_best.pt in infer_large.sh, if your run produced one).

shashank140195 commented 1 year ago

I do have a question: how did you download BioGPT-Large? Using the URL gives me an error that it is unable to load the parameters from the checkpoint. Did you use something else to download it?