qjiangzhao / TEtrimmer

TEtrimmer: a novel tool to automate manual curation of transposable elements
GNU General Public License v3.0

Memory requirements #28

camille-cornet opened 3 days ago

camille-cornet commented 3 days ago

Hi Jiangzhao,

First of all, many thanks for this tool; it is going to be very useful for my project! I am trying to build a curated TE library for a genus of butterflies.

The problem is that when running TEtrimmer, the job always stops because it hits the memory limit (I gave it up to 1 TB, the maximum I can request on my HPC). I have tried different numbers of threads (from 48 down to 1), and it always crashes.

The TE library was built using EarlGrey with a sequential approach (adding species one by one to arrive at a representative library). The final uncurated library contains ~11,000 FASTA sequences. To run TEtrimmer, I built a pangenome as a representative reference for all my species; it is 6.8 GB.

What approach do you think I should take to reduce the memory requirements of TEtrimmer?

Many thanks in advance for your help,

Best,

Camille

qjiangzhao commented 3 days ago

Hi Camille,

No worries! Thank you for your interest in TEtrimmer. I still remember your introduction to your lovely butterfly project ;) To be honest, I have never run TEtrimmer on a pan-genome. Theoretically, it is possible. You can try to solve this by decreasing "--max_mas_lines" (default 100) to a smaller number, such as 50 (the minimum is 15). If the problem persists, please send me the error message and log file, and I will have another look.

Yours sincerely,
Jiangzhao
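As a minimal sketch of the suggested change, the Python snippet below only assembles a TEtrimmer command line with a reduced "--max_mas_lines" value and rejects values below the minimum of 15 mentioned in the reply. It does not run anything. The other option names (--input_file, --genome_file, --output_dir, --num_threads) and the file names are assumptions for illustration and are not confirmed by this thread; check them against TEtrimmer's own help output before use.

```python
from typing import List

# Values quoted in the reply: default 100, minimum 15.
MAX_MAS_LINES_DEFAULT = 100
MAX_MAS_LINES_MIN = 15


def build_tetrimmer_cmd(input_fasta: str, genome_fasta: str, output_dir: str,
                        threads: int,
                        max_mas_lines: int = MAX_MAS_LINES_DEFAULT) -> List[str]:
    """Return the argv list for one TEtrimmer run without executing it.

    ASSUMPTION: --input_file, --genome_file, --output_dir and --num_threads
    are placeholder option names; confirm them with `TEtrimmer --help`.
    """
    if max_mas_lines < MAX_MAS_LINES_MIN:
        raise ValueError(
            f"--max_mas_lines must be >= {MAX_MAS_LINES_MIN}, got {max_mas_lines}")
    if threads < 1:
        raise ValueError(f"threads must be >= 1, got {threads}")
    return [
        "TEtrimmer",
        "--input_file", input_fasta,
        "--genome_file", genome_fasta,
        "--output_dir", output_dir,
        "--num_threads", str(threads),
        # Lower than the default of 100, as suggested in the reply.
        "--max_mas_lines", str(max_mas_lines),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_tetrimmer_cmd("earlgrey_library.fa", "pangenome.fa",
                              "tetrimmer_out", threads=8, max_mas_lines=50)
    print(" ".join(cmd))
    # To launch the run instead of printing it: subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to paste into an HPC job script and compare runs with different "--max_mas_lines" values.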