linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

Memory issues and restart procedure #134

Closed by aghozlane 7 months ago

aghozlane commented 7 months ago

Hi,

I am trying to annotate several protein catalogues with >1M proteins, but run_dbcan is killed every time by Slurm due to memory consumption. I thought 50G of RAM would be enough, but the job still does not get past the hmmer step.

First, is there any way to avoid repeating the diamond step? It takes a day. Second, have you implemented parameters to slice the data and avoid these memory issues? Is there a Nextflow or Snakemake workflow somewhere to make better use of a cluster?

Thank you,

linnabrown commented 7 months ago

I suggest splitting your input proteins into batches and then running our code on each batch by submitting multiple jobs.
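
For readers who want to try the batching suggestion, here is a minimal sketch of splitting a protein FASTA file into fixed-size batches. It is not part of run_dbcan: the function names (`read_fasta`, `split_fasta`), the default batch size of 100,000 records, and the `batch_NNNN.faa` file naming are all assumptions made for illustration. It streams the input one record at a time, so the splitting step itself should not need much memory.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line, []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path: str, out_dir: str, batch_size: int = 100_000) -> List[Path]:
    """Write consecutive batches of at most batch_size records.

    Hypothetical helper: the batch size and the batch_NNNN.faa naming
    are illustrative choices, not run_dbcan conventions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    for i, (header, seq) in enumerate(read_fasta(path)):
        # Start a new output file every batch_size records.
        if i % batch_size == 0:
            if handle is not None:
                handle.close()
            batch_path = out / f"batch_{len(paths):04d}.faa"
            paths.append(batch_path)
            handle = open(batch_path, "w")
        handle.write(f"{header}\n{seq}\n")
    if handle is not None:
        handle.close()
    return paths
```

Each resulting batch file can then be submitted as its own job (for example, as one task of a Slurm job array), and the per-batch outputs concatenated afterwards. A smaller batch size lowers peak memory per job at the cost of more jobs.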

aghozlane commented 7 months ago

Hi, I wrote this https://gitlab.pasteur.fr/aghozlan/nf-dbcan to overcome this memory issue.