haowenz / chromap

Fast alignment and preprocessing of chromatin profiles
https://haowenz.github.io/chromap/
MIT License
192 stars 21 forks

Didn't reach the end of sequence file, which might be corrupted! #134

Closed MonsterLaplace closed 1 year ago

MonsterLaplace commented 1 year ago

```
Build index for the reference.
Kmer length: 17, window size: 7
Reference file: /mnt/z/02.contig/contig.fa
Output file: contigs.index
Loaded all sequences successfully in 3.63s, number of sequences: 9161, number of bases: 3047640885.
Collecting minimizers.
Collected 764268770 minimizers.
Sorting minimizers.
Sorted all minimizers.
Kmer size: 17, window size: 7.
Lookup table size: 358627340, # buckets: 536870912, occurrence table size: 488674334, # singletons: 275594436.
Built index successfully in 188.85s.
[M::Statistics] kmer size: 17; skip: 7; #seq: 9161
[M::Statistics::2.763] distinct minimizers: 358627340 (76.85% are singletons); average occurrences: 2.131; average spacing: 3.988
Saved in 10.23s.
Preset parameters for Hi-C are used.
Start to map reads.
Parameters: error threshold: 4, min-num-seeds: 2, max-seed-frequency: 500,1000, max-num-best-mappings: 1, max-insert-size: 1000, MAPQ-threshold: 1, min-read-length: 30, bc-error-threshold: 1, bc-probability-threshold: 0.90
Number of threads: 200
Analyze bulk data.
Won't try to remove adapters on 3'.
Will remove PCR duplicates after mapping.
Will remove PCR duplicates at bulk level.
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping.
Only output mappings of which barcodes are in whitelist.
Allow split alignment.
Output mappings in SAM format.
Reference file: /mnt/z/02.contig/contig.fa
Index file: contigs.index
1th read 1 file: /mnt/z/00.rawdata/HiCR1.fq.gz
1th read 2 file: /mnt/z/00.rawdata/HiCR2.fq.gz
Output file: aligned.sam
Loaded all sequences successfully in 3.51s, number of sequences: 9161, number of bases: 3047640885.
Kmer size: 17, window size: 7.
Lookup table size: 358627340, occurrence table size: 488674334.
Loaded index successfully in 5.35s.
Mapped 500000 read pairs in 6.38s.
Mapped 500000 read pairs in 4.46s.
Mapped 500000 read pairs in 2.06s.
Mapped 500000 read pairs in 8.04s.
Mapped 500000 read pairs in 1.85s.
Mapped 500000 read pairs in 1.80s.
Mapped 500000 read pairs in 9.43s.
Mapped 500000 read pairs in 1.78s.
Mapped 500000 read pairs in 1.99s.
Mapped 500000 read pairs in 6.69s.
Mapped 500000 read pairs in 2.01s.
Mapped 500000 read pairs in 1.66s.
Mapped 500000 read pairs in 6.41s.
Mapped 500000 read pairs in 1.65s.
Mapped 500000 read pairs in 1.69s.
Mapped 500000 read pairs in 6.58s.
Mapped 500000 read pairs in 2.14s.
Mapped 500000 read pairs in 1.75s.
Mapped 500000 read pairs in 7.18s.
Mapped 500000 read pairs in 1.67s.
Mapped 500000 read pairs in 1.70s.
Mapped 500000 read pairs in 7.73s.
Mapped 500000 read pairs in 1.64s.
Mapped 500000 read pairs in 1.65s.
Mapped 500000 read pairs in 6.47s.
Didn't reach the end of sequence file, which might be corrupted!
```
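For context, a run along these lines would produce the log above. This is a sketch reconstructed from the logged parameters (reference, read files, Hi-C preset, SAM output, 200 threads), not necessarily the exact command that was used:

```shell
# Build the index (k-mer length 17, window size 7 are the logged values).
chromap -i -r /mnt/z/02.contig/contig.fa -o contigs.index

# Map the Hi-C read pairs with the Hi-C preset, writing SAM output.
chromap --preset hic --SAM -t 200 \
        -x contigs.index -r /mnt/z/02.contig/contig.fa \
        -1 /mnt/z/00.rawdata/HiCR1.fq.gz \
        -2 /mnt/z/00.rawdata/HiCR2.fq.gz \
        -o aligned.sam
```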

mourisl commented 1 year ago

Sorry for the delayed reply. Could you please use a command like `zcat -t XXX` to check the integrity of the input FASTQ files?
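To illustrate the suggested check with a self-contained sketch (file names here are made up for demonstration; on GNU systems `zcat -t` performs the same test as `gzip -t`):

```shell
# Simulate a truncated download: compress a tiny FASTQ, then cut it short.
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > reads.fq.gz
head -c 20 reads.fq.gz > truncated.fq.gz

# gzip -t exits non-zero if the archive is damaged or cut off.
gzip -t reads.fq.gz && echo "reads.fq.gz: OK"
gzip -t truncated.fq.gz 2>/dev/null || echo "truncated.fq.gz: corrupted"
```

A file that fails this test is exactly the situation that produces chromap's "Didn't reach the end of sequence file" error: the decompressor hits end-of-file before the expected trailer.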

MonsterLaplace commented 1 year ago

> Sorry for the delayed reply. Could you please use a command like `zcat -t XXX` to check the integrity of the input FASTQ files?

I redownloaded the Hi-C data and checked the md5 values, and the problem was solved. Thank you for your reply.
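For anyone hitting the same error, the md5 verification can be done like this (a minimal sketch with placeholder file names; real checksum lists normally come from the data provider):

```shell
# Stand-in for a downloaded data file.
printf 'example read data\n' > download.dat

# The provider usually ships a checksum list; here we generate one ourselves.
md5sum download.dat > download.md5

# Re-check after downloading: prints "download.dat: OK" on a match,
# and exits non-zero if the file was corrupted in transit.
md5sum -c download.md5
```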

mourisl commented 1 year ago

Great! I will close this issue for now. If you find other problems, please feel free to reopen this one or create a new issue.