tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

Intermediate files in size of Tb? #130

Open HeQSun opened 2 years ago

HeQSun commented 2 years ago

Hi @tangerzhang,

I am running ALLHiC on a genome of around 3 Gb, with a read coverage of 100x.

I observed extremely large intermediate files during the pruning step, as listed below:

total 5.3T
-rwxrwx--- 1 sun 4.1T Jun 20 03:48 log.txt
-rwxrwx--- 1 sun 7.7G Jun 20 03:48 removedb_Allele.txt
-rwxrwx--- 1 sun 1.3T Jun 20 03:48 removedb_nonBest.txt

The log file alone is around 4.1 TB, and the program has not finished yet. Is this normal? And is there a way to handle such large files?

Thanks, Hequan

HeQSun commented 2 years ago

One more detail: Allele.ctg.table is 27 MB, and I have ~20,000 contigs.

wangyibin commented 2 years ago

Hi, the final output of ALLHiC_prune is prunning.bam. You can use the development version of ALLHiC_prune (https://github.com/sc-zhang/ALLHiC_components/tree/main/Prune). This version does not generate intermediate files and is faster.

HeQSun commented 2 years ago

Thanks @wangyibin. I am running the version you mentioned.

hamidashrafi commented 2 years ago

Hi, the link is broken. I have the same issue: the huge log file filled my file system and the system ran out of space.
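
For anyone who has to keep running the original version in the meantime, here is a minimal sketch of a check that could be run periodically (for example from cron) to see how large the pruning files have grown and how much space is left, so the job can be stopped before the disk fills. The file names are the ones from the listing earlier in this thread; the helper names (`human`, `report`, `low_on_space`) and the free-space threshold are my own assumptions and are not part of ALLHiC.

```python
import os
import shutil

# File names taken from the listing earlier in this thread.
PRUNE_FILES = ("log.txt", "removedb_Allele.txt", "removedb_nonBest.txt")


def human(nbytes):
    """Format a byte count with binary units, similar to `ls -lh`."""
    size = float(nbytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024


def report(workdir, names=PRUNE_FILES):
    """Return ({file name: size in bytes}, free bytes on workdir's file system)."""
    sizes = {}
    for name in names:
        path = os.path.join(workdir, name)
        if os.path.exists(path):
            sizes[name] = os.path.getsize(path)
    free = shutil.disk_usage(workdir).free
    return sizes, free


def low_on_space(free, min_free_bytes):
    """True when free space has dropped below the chosen (hypothetical) threshold."""
    return free < min_free_bytes


if __name__ == "__main__":
    sizes, free = report(".")
    for name, nbytes in sizes.items():
        print(f"{name}\t{human(nbytes)}")
    print(f"free\t{human(free)}")
    # 500 GiB is an arbitrary example threshold; adjust it to your system.
    if low_on_space(free, 500 * 1024**3):
        print("WARNING: low disk space, consider stopping the pruning job")
```

This only reports sizes; it does not stop or modify the ALLHiC run.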

tangerzhang commented 2 years ago

This link works: https://github.com/sc-zhang/ALLHiC_components/tree/main/Prune

spaddys commented 1 year ago

Hello! Do you know how long this step usually takes? This step has been running for about 3 days so far for me and I just want to make sure that's not unusual.