Open jhoff13 opened 23 hours ago
Could you share your conversion script?
Can you run the same command as here to check if there are some broken FASTA entries: https://github.com/soedinglab/MMseqs2/issues/911#issuecomment-2516404541
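For readers who cannot open the linked comment, here is a minimal Python sketch of a comparable sanity check. It is not the command from that comment; it is an independent illustration that flags empty headers, empty sequences, and unexpected characters. The accepted alphabet in `VALID_RESIDUES` and the function name `find_broken_entries` are assumptions made for this example.

```python
from typing import Iterable, List, Tuple

# Assumption: protein FASTA using IUPAC letters plus X, B, Z, J, U, O and '*'.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZJUO*")


def find_broken_entries(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for suspicious FASTA records."""
    problems: List[Tuple[int, str]] = []
    header_line = None  # line number of the record currently being read
    seq_len = 0         # residues seen so far for that record

    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record; check it had a sequence.
            if header_line is not None and seq_len == 0:
                problems.append((header_line, "empty sequence"))
            header_line = lineno
            seq_len = 0
            if not line[1:].strip():
                problems.append((lineno, "empty header"))
        elif line.strip():
            if header_line is None:
                problems.append((lineno, "sequence before first header"))
                continue
            residues = line.strip().upper()
            bad = set(residues) - VALID_RESIDUES
            if bad:
                problems.append(
                    (lineno, "invalid characters: " + "".join(sorted(bad)))
                )
            seq_len += len(residues)

    # The last record is closed by end of input rather than by a header.
    if header_line is not None and seq_len == 0:
        problems.append((header_line, "empty sequence"))
    return problems
```

An open file handle can be passed directly, e.g. `find_broken_entries(open("shard.fasta"))`, where `shard.fasta` is a placeholder path; the function reads line by line, so it does not load the whole file into memory.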
Fixed: I converted each parquet file into its own database; this way it requires only ~20 GB of RAM and runs fine.
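The per-file workaround could be scripted roughly as follows. This is only a sketch: the thread does not show the actual commands used, so the file names, the `work` directory, and the helper `plan_shard_commands` are hypothetical. It assumes each parquet file has already been converted to its own FASTA file, and it only builds the `mmseqs createdb` and `mmseqs search` command lines (reusing the `-s 1 --max-seqs 100` settings mentioned in the issue) without executing them.

```python
from pathlib import Path
from typing import Iterable, List


def plan_shard_commands(
    shard_fastas: Iterable[str],
    query_db: str,
    work_dir: str,
    sensitivity: int = 1,
    max_seqs: int = 100,
) -> List[List[str]]:
    """Build one createdb + one search command per shard FASTA file."""
    work = Path(work_dir)
    commands: List[List[str]] = []
    for fasta in shard_fastas:
        stem = Path(fasta).stem
        # Hypothetical naming scheme for the per-shard outputs.
        target_db = work / f"{stem}_db"
        result_db = work / f"{stem}_res"
        tmp_dir = work / f"{stem}_tmp"
        commands.append(["mmseqs", "createdb", str(fasta), str(target_db)])
        commands.append([
            "mmseqs", "search",
            str(query_db), str(target_db), str(result_db), str(tmp_dir),
            "-s", str(sensitivity),
            "--max-seqs", str(max_seqs),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` one after another, so a failure on a single shard stops the run and points at the shard that caused it.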
Expected Behavior
I'm trying to run mmseqs search against the OMG_prot50 database. Once converted to an mmseqs db, the database is 42 GB; it was built by converting the .parquet files to FASTA files with a custom script.
Current Behavior
The command ends with a prefilter error. I am submitting to nodes with 192 GB of RAM, and when I watch the command run it peaks at only 25 GB before failing, so I'm not sure this is a RAM issue.
Steps to Reproduce (for bugs)
MMseqs Output (for bugs)
Context
Not sure why this is failing. I've tried splitting the database into thirds and into tenths, and I get the same error each time. I have also tried running with the parameters -s 1 --max-seqs 100 to increase efficiency. It would be helpful to have this database preloaded in mmseqs.
Your Environment
Include as many relevant details about the environment you experienced the bug in.