haowenz / chromap

Fast alignment and preprocessing of chromatin profiles
https://haowenz.github.io/chromap/
MIT License

[BUG] Segmentation fault (core dumped) HiC #135

ericmalekos closed this issue 1 year ago

ericmalekos commented 1 year ago

Hi, I'm encountering a segmentation fault when I run chromap in Hi-C mode.

Command line input:

chromap --preset hic -r GRCh38.primary_assembly.genome.fa -x hg38.primary_assembly.index -1 SRR5519255_1.fastq.gz -2 SRR5519255_2.fastq.gz -t 16 -o SRR5519255.pairs

Output:

Preset parameters for Hi-C are used.
Start to map reads.
Parameters: error threshold: 4, min-num-seeds: 2, max-seed-frequency: 500,1000, max-num-best-mappings: 1, max-insert-size: 1000, MAPQ-threshold: 1, min-read-length: 30, bc-error-threshold: 1, bc-probability-threshold: 0.90
Number of threads: 16
Analyze bulk data.
Won't try to remove adapters on 3'.
Won't remove PCR duplicates after mapping. 
Will remove PCR duplicates at bulk level.  
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping. 
Only output mappings of which barcodes are in whitelist.
Allow split alignment.
Output mappings in pairs format.
Reference file: GRCh38.primary_assembly.genome.fa
Index file: hg38.primary_assembly.index
1th read 1 file: SRR5519255_1.fastq.gz
1th read 2 file: SRR5519255_2.fastq.gz
Output file: dedupe_SRR5519255.pairs
Loaded all sequences successfully in 7.62s, number of sequences: 194, number of bases: 3099750718.
Kmer size: 17, window size: 7.
Lookup table size: 393376326, occurrence table size: 478581398.
Loaded index successfully in 13.97s.
Segmentation fault (core dumped)

swiftgenomics commented 1 year ago

Is this dataset publicly available? If not, can you provide a sample? How long are the reads?

ericmalekos commented 1 year ago

Yes, the dataset is available at https://0-www-ncbi-nlm-nih-gov.brum.beds.ac.uk/Traces/study/?acc=PRJNA385337&o=acc_s%3Aa

swiftgenomics commented 1 year ago

I tried this dataset and didn't get any error, so my guess is that your read files are corrupted. Can you check them?
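
One way to run that check, as a minimal sketch rather than anything chromap itself provides: decompress each FASTQ file end to end and see whether the gzip stream finishes cleanly. The helper name `gzip_is_intact` is hypothetical, and the sketch only detects truncated or corrupted gzip data, not malformed FASTQ records inside an otherwise valid archive.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly.

    Hypothetical helper for illustration; it is not part of chromap.
    A missing or unreadable file also yields False.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so the trailing CRC and length are verified.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile; EOFError signals truncation.
        return False
    return True


if __name__ == "__main__":
    for name in ("SRR5519255_1.fastq.gz", "SRR5519255_2.fastq.gz"):
        print(name, "ok" if gzip_is_intact(name) else "corrupted or missing")
```

If both files pass, corruption of the reads is an unlikely cause and the reference and index are the next things to look at.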

ericmalekos commented 1 year ago

I realized the issue stemmed from pairing the wrong genome with the index. After I rebuilt the index, the issue was resolved. Thank you for your help!
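
For anyone hitting the same crash, a quick sanity check is to compare the FASTA passed with `-r` against the FASTA the index was built from. The sketch below is an assumption-laden illustration, not a chromap feature: `fasta_stats` is a hypothetical helper that counts sequences and bases, which can be set beside the "number of sequences" and "number of bases" values in the log above. Matching counts do not prove the two files are identical, but differing counts are a strong hint that the index belongs to another genome and should be rebuilt.

```python
import gzip


def fasta_stats(path):
    """Count sequences and bases in a FASTA file (plain or gzip-compressed).

    Hypothetical helper for illustration; it is not part of chromap.
    """
    opener = gzip.open if path.endswith(".gz") else open
    num_sequences = 0
    num_bases = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                # Each header line starts a new sequence.
                num_sequences += 1
            else:
                # Sequence line: count residues, ignoring the newline.
                num_bases += len(line.strip())
    return num_sequences, num_bases


if __name__ == "__main__":
    # Run this on the -r FASTA and on the FASTA used to build the index.
    print(fasta_stats("GRCh38.primary_assembly.genome.fa"))
```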