refresh-bio / FaStore

FaStore - high-performance FASTQ files compressor
GNU General Public License v3.0

Problems with variable-length FASTQ #6

Open jvolkening opened 5 years ago

jvolkening commented 5 years ago

I am getting errors on certain input files. An example is here: https://gist.github.com/jvolkening/0561163aeda8b884f9284eff4a66653a

When I compile the latest FaStore master with debugging enabled and run

fastore_compress.sh --in test_1000_new.fq --out FOO --reduced --threads 2

I get

fastore_rebin: NodesPacker.cpp:1156: void IFastqNodesPackerDyn::UnpackFromBin(const BinaryBinBlock&, std::vector<FastqRecord>&, GraphEncodingContext&, FastqRecordBinStats&, IFastqChunkCollection&, bool): Assertion `dnaReader.Position() - initialDnaPos == (uint64)desc.dnaSize' failed.
/home/jeremy/.local/bin/fastore_compress.sh: line 227: 11725 Aborted                 $FASTORE_REBIN e "-i$TMP_BIN" "-o$TMP_REBIN-2" "-t$TH_REBIN" $PAR_REBIN_C1 $PAR_PE -p2

I noticed that if I trim the reads to a fixed length and discard any reads shorter than that length (i.e. make all reads the same length), compression succeeds. If I then remove a single base and its quality score from the end of one randomly chosen read, the error returns.
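
For anyone trying to reproduce this, here is a rough sketch in Python of the trimming and clipping steps described above. It is not part of FaStore and not the exact tooling I used. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names, the length of 100, and the output file name in the usage comment are all made up for illustration.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# (header, sequence, separator, quality); assumes 4-line FASTQ records
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, sep, qual


def trim_to_fixed_length(records: Iterable[Record], length: int) -> Iterator[Record]:
    """Trim every read to `length` and drop reads that are shorter."""
    for header, seq, sep, qual in records:
        if len(seq) < length:
            continue  # shorter reads are discarded
        yield header, seq[:length], sep, qual[:length]


def clip_last_base(records: Iterable[Record], index: int) -> Iterator[Record]:
    """Remove the last base and quality score from the read at `index`."""
    for i, (header, seq, sep, qual) in enumerate(records):
        if i == index:
            seq, qual = seq[:-1], qual[:-1]
        yield header, seq, sep, qual


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in 4-line FASTQ form."""
    for record in records:
        handle.write("\n".join(record) + "\n")


# Hypothetical usage (length and output name are placeholders):
#
# with open("test_1000_new.fq") as src, open("fixed_length.fq", "w") as dst:
#     write_fastq(trim_to_fixed_length(read_fastq(src), 100), dst)
```

Running the output of `trim_to_fixed_length` through `clip_last_base` with any valid read index should give a file that differs from the fixed-length one by a single base, which matches the case where the error comes back for me.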