"data" is not found after executing the code on Github

thiptanawatp commented 1 year ago

Hello,

Could anybody please tell me how to run the standard BioGPT model using the code below?

    import torch
    from fairseq.models.transformer_lm import TransformerLanguageModel

    m = TransformerLanguageModel.from_pretrained(
        "checkpoints/Pre-trained-BioGPT",
        "checkpoint.pt",
        "data",
        tokenizer='moses',
        bpe='fastbpe',
        bpe_codes="data/bpecodes",
        min_len=100,
        max_len_b=1024)
    m.cuda()
    src_tokens = m.encode("COVID-19 is")
    generate = m.generate([src_tokens], beam=5)[0]
    output = m.decode(generate[0]["tokens"])
    print(output)

After running this, I always get an error saying that "data" is not found. I am not sure whether I have to download the data separately from an external source.

Thanks

Dontmindmes commented 1 year ago

I am getting the same error

ahvdk commented 1 year ago

@thiptanawatp did you clone the repo itself? It contains the data.

thiptanawatp commented 1 year ago

@ahvdk I tried both downloading the .ZIP file manually and running git clone ..., but the BioGPT/data folder contains nothing except bpecodes and dict.txt. Any suggestions?

Thanks so much
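
One way to narrow this down is to check which of the locations referenced by the snippet actually exist on disk. The sketch below is only an illustration: `EXPECTED` and `missing_paths` are hypothetical names, the relative paths are copied from the snippet and from the files mentioned in this thread, and treating them as the complete set of files the loader needs is an assumption.

```python
from pathlib import Path

# Relative paths taken from the snippet and from the files named in this
# thread. Assumption: these are the files the loader looks for.
EXPECTED = (
    "checkpoints/Pre-trained-BioGPT/checkpoint.pt",
    "data/bpecodes",
    "data/dict.txt",
)


def missing_paths(root="."):
    """Return the expected relative paths that are not files under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


if __name__ == "__main__":
    # Checks relative to the current working directory, which is also where
    # the relative paths in the snippet are looked up.
    for rel in missing_paths():
        print("missing:", rel)
```

Running this from the directory where the script is launched shows whether the problem is a missing file or simply a different working directory.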

thisismygitrepo commented 1 year ago

Did you cd into the repo before running your script, or at least add the path of the repo to Python's PATH?
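
The paths in the snippet are relative, so they depend on the directory the script is started from. A minimal sketch of one workaround is to build absolute paths from the repo root, so the working directory no longer matters. `biogpt_paths` and `load_biogpt` are hypothetical helper names, the directory layout is taken from the snippet above, and the assumption is that `from_pretrained` accepts absolute directory paths in the same positions.

```python
from pathlib import Path


def biogpt_paths(repo_root):
    """Build absolute paths for the locations used in the snippet."""
    root = Path(repo_root).expanduser().resolve()
    return {
        "model_dir": str(root / "checkpoints" / "Pre-trained-BioGPT"),
        "data_dir": str(root / "data"),
        "bpe_codes": str(root / "data" / "bpecodes"),
    }


def load_biogpt(repo_root):
    """Load the model with absolute paths (same arguments as the snippet)."""
    # Imported lazily so biogpt_paths works without fairseq installed.
    from fairseq.models.transformer_lm import TransformerLanguageModel

    p = biogpt_paths(repo_root)
    return TransformerLanguageModel.from_pretrained(
        p["model_dir"],
        "checkpoint.pt",
        p["data_dir"],
        tokenizer="moses",
        bpe="fastbpe",
        bpe_codes=p["bpe_codes"],
        min_len=100,
        max_len_b=1024,
    )
```

For example, `m = load_biogpt("path/to/BioGPT")` could replace the `from_pretrained` call in the snippet, where `path/to/BioGPT` is a placeholder for wherever the repo was cloned.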

microsoft/BioGPT, issue #65