labstructbioinf / pLM-BLAST

Detection of remote homology by comparison of protein language model representations
https://toolkit.tuebingen.mpg.de/tools/plmblast
MIT License

Error when making custom database "IndexError: list index out of range" #11

Closed. DaRinker closed this issue 1 year ago

DaRinker commented 1 year ago

When I run the command:

python $PLMBLAST_PATH/embeddings.py merged.proteomes.csv merged.proteomes -embedder pt -cname sequence --gpu -bs -1 --asdir

I get back:

File "/bin/pLM-BLAST/embedders/parser.py", line 173, in make_iterator
    if startbatch[-1] != seqnum:
IndexError: list index out of range
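For readers hitting the same traceback: the message means that `startbatch` was an empty list at the moment `startbatch[-1]` was evaluated. The sketch below only reproduces that failure mode in isolation. `last_batch_start` is a hypothetical helper written for illustration; it is not pLM-BLAST code and says nothing about why the list ended up empty.

```python
def last_batch_start(startbatch):
    """Return the last batch start index, or None when no batches exist.

    Hypothetical helper for illustration only; not part of pLM-BLAST.
    """
    if not startbatch:
        # Indexing [-1] on an empty list would raise IndexError here.
        return None
    return startbatch[-1]


def demo_empty_index_error():
    """Trigger the same exception as in the traceback and return its message."""
    try:
        [][-1]
    except IndexError as err:
        return str(err)
```

Calling `demo_empty_index_error()` returns the same "list index out of range" text that appears in the traceback.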

The "merged.proteomes.csv" was generated from a "merged.proteomes.fasta" using the "makeindex.py" script.

For context, "merged.proteomes.fasta" contains 6821190 sequences, while "merged.proteomes.csv" contains 7360055 lines, which is 538865 more lines than sequences. Is this expected?
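One way to check the mismatch is to compare FASTA records with CSV records rather than raw lines, since a raw line count overstates the number of CSV records whenever a quoted field contains a line break. The following is a minimal sketch under assumptions: the CSV has a header row and a column named `sequence` (taken from the `-cname sequence` argument above), and the function names are made up for this example. It does not explain what `makeindex.py` writes; it only gives comparable counts.

```python
import csv


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_csv_rows(path, column="sequence"):
    """Return (total_rows, rows_with_empty_sequence) for a CSV with a header.

    The column name is an assumption based on the -cname argument.
    Rows are parsed with the csv module, so a quoted field that spans
    several physical lines still counts as a single record.
    """
    total = 0
    empty = 0
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            total += 1
            if not (row.get(column) or "").strip():
                empty += 1
    return total, empty
```

If `count_csv_rows("merged.proteomes.csv")` reports the same total as `count_fasta_records("merged.proteomes.fasta")`, the extra lines come from multi-line fields; if not, the CSV holds more records than the FASTA file has sequences.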

DaRinker commented 1 year ago

Found this discussed further down here. Changing -bs -1 to -bs 0 seems to have fixed this error.