yuansliu / minicom

Index suffix-prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression

Minicom reports segmentation fault when compressing ERP001775_1 #4

Open i-xiaohu opened 3 years ago

i-xiaohu commented 3 years ago

Hello, Yuansheng Liu. Our group is developing a sequence aligner that runs faster on reordered reads. We have tested three reordering-based compressors: SPRING, Minicom, and PgRC. However, Minicom always reports a segmentation fault on large human datasets, for example ERP001775_1, which is 217Gbp in size. It is available from the links below. Subsets built from these files range in coverage from 7.2X to 34.6X.

ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174324/ERR174324_1.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174325/ERR174325_1.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174326/ERR174326_1.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174327/ERR174327_1.fastq.gz
ftp.sra.ebi.ac.uk/vol1/fastq/ERR174/ERR174328/ERR174328_1.fastq.gz

The test machine has 96 Intel(R) Xeon(R) CPU E7-4830 v3 @ 2.10GHz processors and 1T of RAM. Minicom was compiled with GCC 6.4.0 on CentOS release 6.6. Minicom works well on low-coverage data, but it fails on the high-coverage dataset (ERP001775_1.fq, built from the 5 files above).
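For anyone trying to reproduce the input, here is a minimal sketch of how ERP001775_1.fq could be assembled. It assumes the five downloaded `.fastq.gz` files are simply decompressed and concatenated in the order listed; the thread does not state the exact procedure. `concat_fastq` is a hypothetical helper name, not part of Minicom.

```shell
#!/bin/sh
# Hypothetical helper (not part of Minicom): decompress gzipped FASTQ
# parts and concatenate them, in the order given, into one plain file.
# Assumption: the combined dataset is a plain concatenation of the parts.
# Usage: concat_fastq OUTPUT.fq PART1.fastq.gz [PART2.fastq.gz ...]
concat_fastq() {
    out=$1
    shift
    # gzip -dc writes the decompressed content of each argument to
    # stdout, one file after another, so the redirect collects them all.
    gzip -dc "$@" > "$out"
}

# Example call, assuming the five files were already downloaded
# into the current directory:
# concat_fastq ERP001775_1.fq \
#     ERR174324_1.fastq.gz ERR174325_1.fastq.gz ERR174326_1.fastq.gz \
#     ERR174327_1.fastq.gz ERR174328_1.fastq.gz
```

Passing fewer files to the same call gives a smaller input, which may help narrow down the size at which the crash first appears.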

ulimit -s unlimited
minicom -r ERP001775_1.fq -t 16
compress single-end reads...
23638 Segmentation fault      (core dumped) ./minicomsg $filename $compfiles

We would appreciate it if you could fix the problem and help us complete our experiments.

Thank you. Best regards. i-xiaohu (HIT-CS)

yuansliu commented 3 years ago

Dear Xiaohu,

Thank you for testing Minicom, and sorry for the inconvenience.

I will try to fix it. However, I am not sure it can be solved, as I have no machine with enough RAM for such a large dataset.

Best regards, Yuansheng