Closed by nick-youngblut 5 years ago
@nick-youngblut yes, this is indeed not great default behavior. We will change it. Your computation time is not wasted: you can restart the job by just calling the same command again. But before you do, you need to create the header (`_h`) file by calling `mmseqs createsubdb repSeqDb seqDb_h repSeqDb_h`. Sorry for the inconvenience.
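For anyone hitting the same error, the header step could be wrapped roughly like this so it is safe to run more than once. This is only a sketch: `ensure_header` is a made-up helper name (not part of MMseqs2), and `repSeqDb` / `seqDb` are the placeholder database names from the command in this comment.

```shell
#!/bin/sh
# Sketch: create the missing header database only if it is not there yet.
# ensure_header is a hypothetical helper, not an MMseqs2 command.
ensure_header() {
    rep_db=$1   # representative-sequence database (from result2repseq)
    seq_db=$2   # original sequence database it was derived from

    if [ -e "${rep_db}_h" ]; then
        echo "header already present: ${rep_db}_h"
        return 0
    fi

    # Same createsubdb call as quoted in this thread, with the names parameterized.
    mmseqs createsubdb "$rep_db" "${seq_db}_h" "${rep_db}_h"
}

# Example with the placeholder names:
#   ensure_header repSeqDb seqDb
# Afterwards, call the original search/map command again unchanged to restart it.
```

The existence check only looks at the `_h` data file, so a half-written header from an interrupted run would need to be removed by hand first.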
The restart behavior is quite nice. Thank you for explaining how to create the `*_h` file using `mmseqs createsubdb`. I didn't understand what `createsubdb` does, but I must have missed it in the wiki docs.
I'm trying to get the abundance of gene clusters generated by linclust. My method involves mapping the post-QC Illumina reads to the post-linclust cluster representatives via `mmseqs map`. To get the representative sequence db, I'm using `mmseqs result2repseq`. I ran `mmseqs map` (actually `mmseqs search --alignment-mode 4` due to Issue #144), but after many hours of processing, I got an error that no `*_h` file exists for the database, and the map job died.

Do I have to convert the rep-seq database to a fasta and then re-create the database with `mmseqs createdb` just so that I can generate the `*_h` file? Is there a more efficient way?

Why doesn't `mmseqs search` check for the necessary files at the start of the job instead of in the middle of the run (possibly after many hours of processing)?
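In the meantime, a pre-flight test in the calling script can catch a missing header file before the long job starts. A minimal sketch, assuming a database named `db` consists of at least `db`, `db.index`, `db_h`, and `db_h.index` (the exact file set may differ between MMseqs2 versions); `check_db` is a hypothetical helper name, not an MMseqs2 command.

```shell
#!/bin/sh
# Sketch: verify that the files a database is assumed to need exist
# before launching a long-running job. The file list is an assumption.
check_db() {
    db=$1
    missing=0
    for f in "$db" "$db.index" "${db}_h" "${db}_h.index"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    # Returns 0 if every file was found, 1 otherwise.
    return "$missing"
}

# Example with the placeholder name from this thread:
#   check_db repSeqDb || exit 1
# placed directly before the search/map call in the job script.
```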