microsoft / evodiff

Generation of protein sequences and evolutionary alignments via discrete diffusion models
MIT License
528 stars, 74 forks

Running the `evodiff/generate.py` script #30

Closed: Amelie-Schreiber closed this issue 3 months ago

Amelie-Schreiber commented 11 months ago

I've been having trouble getting the conda environment to work properly, so that may be making the issue below worse.

```
(evodiff3) ubuntu@209-20-159-77:~/evodiff_repo$ python evodiff/generate.py --model-type oa_dm_38M --num-seqs 100
Traceback (most recent call last):
  File "evodiff/generate.py", line 323, in <module>
    main()
  File "evodiff/generate.py", line 40, in main
    data = UniRefDataset('data/uniref50/', 'train', structure=False, max_len=2048)
  File "/home/ubuntu/miniconda3/envs/evodiff3/lib/python3.8/site-packages/sequence_models/datasets.py", line 330, in __init__
    with open(data_dir + 'splits.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/uniref50/splits.json'
```

chAwater commented 8 months ago

@Amelie-Schreiber I ran into the same error and found that you have to download the UniRef50 data (see https://github.com/microsoft/evodiff/issues/10#issuecomment-1747536718) to run the code.

I believe the download isn't strictly necessary, though: you can hack the code to bypass it.
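
In the traceback, `data_dir` is the relative path `data/uniref50/`, so `open()` looks for `splits.json` relative to whatever directory the script is launched from (here `~/evodiff_repo`). The sketch below is a pre-flight check that confirms the file named in the traceback is in place before running `generate.py`. The helper `missing_uniref_files` is hypothetical and is not part of evodiff or `sequence_models`. It also checks only `splits.json`, which is an assumption, because the loader may need other files too:

```python
from pathlib import Path
from typing import List, Optional

# Assumption: only the file named in the traceback is checked; other files
# the UniRef50 loader may read are NOT covered by this sketch.
REQUIRED_FILES = ["splits.json"]


def missing_uniref_files(data_dir: str = "data/uniref50/",
                         cwd: Optional[str] = None) -> List[str]:
    """Hypothetical helper: list required files absent under data_dir.

    A relative data_dir is resolved against cwd (default: the current
    working directory), the same way open() resolved it in the traceback.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    root = base / data_dir
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_uniref_files()
    if missing:
        print("Missing under data/uniref50/:", ", ".join(missing))
    else:
        print("data/uniref50/splits.json found")
```

Run it from the same directory you launch `python evodiff/generate.py` from. If it reports `splits.json` as missing, the download hasn't landed where the script expects it.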