smithlabcode / falco

A C++ drop-in replacement of FastQC to assess the quality of sequence read data
https://falco.readthedocs.io
GNU General Public License v3.0

corrupted size vs. prev_size #57

Open nick-youngblut opened 5 months ago

nick-youngblut commented 5 months ago

I'm using quay.io/biocontainers/falco:1.2.2--hdcf5f25_0. The run output:

[limits]    using file /usr/local/opt/falco/Configuration/limits.txt
[adapters]  using file /usr/local/opt/falco/Configuration/adapter_list.txt
[contaminants]  using file /usr/local/opt/falco/Configuration/contaminant_list.txt
[Mon Apr 29 18:30:02 2024] Started reading file 20241218_Parse_CRISPR_K562_cas12a_Sub1_R1_001.fastq.gz
[Mon Apr 29 18:30:02 2024] reading file as gzipped FASTQ format
[running falco|                                                   |  0%]corrupted size vs. prev_size
/home/nickyoungblut/tmp/auto-demux/work/20240426_SspArc0132/33/42662ba1c885c4ddfbc2724221e894/.command.sh: line 9:    36 Aborted                 (core dumped) falco 20241218_Parse_CRISPR_K562_cas12a_Sub1_R1_001.fastq.gz -D 20241218_Parse_CRISPR_K562_cas12a_Sub1_R1_001/fastqc_data.txt -R 20241218_Parse_CRISPR_K562_cas12a_Sub1_R1_001/fastqc_report.html -S 20241218_Parse_CRISPR_K562_cas12a_Sub1_R1_001/summary.txt
(nextflow)

seqkit stats -a -T 20241218_Parse_CRISPR_K562_cas12a_Sub1_R1_001.fastq.gz produces the following output:

file    format  type    num_seqs    sum_len min_len avg_len max_len Q1  Q2  Q3  sum_gap N50 N50_num Q20(%)  Q30(%)  AvgQual GC(%)
20241218_Parse_CRISPR_K562_cas12a_Sub1_R1_001.fastq.gz  FASTQ   DNA 24322546    12501788644 514 514.0   514 514.0   514.0   514.0   0   514 1   44.39   29.94   11.99   42.50

...so it appears that there is nothing wrong with the FASTQ file. Note the long read lengths: every read is 514 bp. The RunInfo.xml for this Illumina run was skewed toward a long Read 1:

    <Reads>
      <Read NumCycles="514" Number="1" IsIndexedRead="N" />
      <Read NumCycles="86" Number="2" IsIndexedRead="N" />
    </Reads>

andrewdavidsmith commented 5 months ago

@nick-youngblut any chance you can reproduce with a smaller file that can be linked? If not, can you try it with the file unzipped?
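For anyone trying to build such a reproducer, here is a minimal sketch of one way to cut a small test case out of the large file. It assumes standard 4-line FASTQ records; the function name `subset_fastq` and its parameters are hypothetical and not part of falco or seqkit. It writes the same reads both uncompressed and gzipped, so both variants asked about above can be tried.

```python
import gzip
from itertools import islice


def subset_fastq(src_gz, dst_plain, dst_gz, n_reads=1000):
    """Copy the first n_reads records of a gzipped FASTQ into two small files.

    Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
    Returns the number of records actually written.
    """
    n_lines = 4 * n_reads
    # Read only the leading lines; the rest of the large file is never decompressed.
    with gzip.open(src_gz, "rt") as src:
        lines = list(islice(src, n_lines))
    # Uncompressed copy, for testing without the gzip code path.
    with open(dst_plain, "w") as out:
        out.writelines(lines)
    # Gzipped copy of the same reads, for testing with the gzip code path.
    with gzip.open(dst_gz, "wt") as out:
        out.writelines(lines)
    return len(lines) // 4
```

Running falco on each of the two output files with the same command line as in the report should show whether the crash depends on the gzip reader or on the 514 bp read length itself. If neither subset crashes, a larger `n_reads` may be needed.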