Running the `evodiff/generate.py` script

microsoft / evodiff

Generation of protein sequences and evolutionary alignments via discrete diffusion models

MIT License

(evodiff3) ubuntu@209-20-159-77:~/evodiff_repo$ python evodiff/generate.py --model-type oa_dm_38M --num-seqs 100
Traceback (most recent call last):
  File "evodiff/generate.py", line 323, in <module>
    main()
  File "evodiff/generate.py", line 40, in main
    data = UniRefDataset('data/uniref50/', 'train', structure=False, max_len=2048)
  File "/home/ubuntu/miniconda3/envs/evodiff3/lib/python3.8/site-packages/sequence_models/datasets.py", line 330, in __init__
    with open(data_dir + 'splits.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/uniref50/splits.json'
(evodiff3) ubuntu@209-20-159-77:~/evodiff_repo$

@Amelie-Schreiber I ran into the same error and found that you have to download the uniref50 data (see https://github.com/microsoft/evodiff/issues/10#issuecomment-1747536718) to run the code.
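
If you go the download route, a quick pre-flight check can confirm the file named in the traceback is in place before launching the script. This is only a sketch: `missing_dataset_files` is a hypothetical helper, and `splits.json` is the only file the traceback confirms is needed (the dataset class may read others).

```python
from pathlib import Path


def missing_dataset_files(data_dir="data/uniref50/", required=("splits.json",)):
    """Return the names in `required` that are not present under `data_dir`.

    Assumption: only 'splits.json' is known to be required (it is the file
    named in the FileNotFoundError); extend `required` if more turn up.
    """
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_file()]


# Example usage from the repository root:
# missing = missing_dataset_files()
# if missing:
#     print("Missing under data/uniref50/:", ", ".join(missing))
```

An empty list means the check found every listed file; it does not validate their contents.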

I believe the download isn't necessary; you can modify the code to bypass the dataset load.

Comment out the code at:
- https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L40-L41
- https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L128-L129
- https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L148

Add one line of code at https://github.com/microsoft/evodiff/blob/32e3bd8d1ada1d786795e3ba1b84c855b22b4702/evodiff/generate.py#L130:

    # the sequence length you want to sample from, for example (30, 200)
    seq_len = np.random.choice(np.arange(30, 200))
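
For reference, here is a standalone sketch of what that sampling line does. Note that `np.arange(30, 200)` stops at 199, so the upper bound is exclusive. `sample_seq_len` is a hypothetical wrapper, and it uses NumPy's `default_rng` generator instead of the global `np.random.choice` so a seed can be passed; that is my substitution, not what the script does.

```python
import numpy as np


def sample_seq_len(min_len=30, max_len=200, rng=None):
    """Draw one length uniformly from min_len .. max_len - 1 (max_len excluded)."""
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.choice(np.arange(min_len, max_len)))


# Example usage with a fixed seed for repeatable draws:
# rng = np.random.default_rng(0)
# lengths = [sample_seq_len(rng=rng) for _ in range(5)]
```

If you want 200 itself to be a possible length, pass `max_len=201`.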

Then run the bash command:

export AMLT_OUTPUT_DIR=YOUR_OUTPUT_DIR; python evodiff/generate.py --model-type oa_dm_38M --num-seqs 10 --amlt
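
If you would rather launch it from Python than from the shell, the same invocation can be sketched with `subprocess`. The helper names below are hypothetical; the flags and the `AMLT_OUTPUT_DIR` variable are taken from the command above, and it assumes you run from the repository root.

```python
import os
import subprocess
import sys


def build_generate_command(model_type="oa_dm_38M", num_seqs=10):
    """Build the argument list for evodiff/generate.py using the flags shown above."""
    return [
        sys.executable, "evodiff/generate.py",
        "--model-type", model_type,
        "--num-seqs", str(num_seqs),
        "--amlt",
    ]


def build_env(output_dir):
    """Copy the current environment and set AMLT_OUTPUT_DIR for the child process."""
    env = dict(os.environ)
    env["AMLT_OUTPUT_DIR"] = output_dir
    return env


# Example usage (YOUR_OUTPUT_DIR is a placeholder, as in the shell command):
# subprocess.run(build_generate_command(), env=build_env("YOUR_OUTPUT_DIR"), check=True)
```

Passing `env=` keeps the variable scoped to the child process rather than exporting it for the whole session.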

Running the `evodiff/generate.py` script #30