lemene / PECAT

PECAT, a phased error correct and assembly tool
BSD 2-Clause "Simplified" License
35 stars 0 forks source link

floating point exception #22

Open RemiAllio opened 4 months ago

RemiAllio commented 4 months ago

Hi!

I am trying to run PECAT. Unfortunately I have the following error during the assembly step:

2024-04-10 09:58:13 [INFO] Load 2896539 reads from file: PATH/corrected_reads.fasta 2024-04-10 09:58:24 [INFO] Load overlap file 2024-04-10 09:58:28 [INFO] Overlap size: 28367477/30188710 2024-04-10 09:58:42 [INFO] Group overlaps and remove duplicated 2024-04-10 09:59:19 [INFO] Overlap size: 22551641/28367477 2024-04-10 09:59:42 [INFO] Check Coverage 2024-04-10 09:59:45 [INFO] min_coverage = 3(55), max_coverage = 4284, max_diff_coverage = 2616 PATH/scripts/asm1_assemble.sh : line 16 : 1066309 floating point exception (core dumped) [...]

I know that this kind of error happens when there is a division by zero but I don't know why it is happening in this case. Do you have any idea?

Best, Rémi

N.B: I removed the PATH for clarity but everything is fine with that.

lemene commented 4 months ago

Hi, @RemiAllio I think the function that caused the error may be AsmDataset::EstimateGenomeSize() in asm_dataset.cpp. The code is as follows:

    std::sort(covs.begin(), covs.end());
    int ave_cov = covs[covs.size()/2].
    long long int gsize = size / ave_cov;

ave_cov may be equal to zero. If that's the case, it means many reads don't overlap with others.

RemiAllio commented 4 months ago

Thank you for the quick answer. Is there anything that I can do to avoid this? This sounds surprising to me since we have an expected coverage of about 40x for this species.

lemene commented 4 months ago

I have fixed this problem on the main branch; you can download the latest version and give it a try.

RemiAllio commented 4 months ago

Thank you! The new version is currently running and went through this step.

Just to try to understand why I got this issue in the first place. Could it be because I didn't change the parameter prep_output_coverage (originaly set to 80)? Since I am expecting a coverage of about 40x should I change this parameter to 30x, for example?

I am not sure to understand what this parameter means. Thank you for your help!

Best, Rémi

RemiAllio commented 4 months ago

Hi,

In the new version I got the following message: 2024-04-11 09:58:06 [INFO] min_coverage = 3(59), max_coverage = 3619, max_diff_coverage = 2570 2024-04-11 09:58:09 [WARNING] The estimated average coverage of the reads is 0 2024-04-11 09:58:09 [INFO] Esitmate genome size(0): 0 = 16827212525 / 0

I don't understand how I can have an average coverage equal to 0 with min_coverage = 3 and max_coverage = 3619. Is this normal?

Thank you for your help, Best, Rémi

lemene commented 4 months ago

@RemiAllio What is the genome_size you have set. genome_size * prep_output_coverage indicate how much data is used for assembly. [INFO] min_coverage = 3(59), max_coverage = 3619, max_diff_coverage = 2570 are filtering parameters automatically detected by the assembler, not the actual coverage of reads. I suggest using more data for assembly, or adding --min_coverage 0 to the asm1_assemble_options. Besides, what is the size of the corrected data and raw reads?