tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

ALLHiC_corrector crashes #91

Open HMPNK opened 3 years ago

HMPNK commented 3 years ago

Hi, I am trying to run ALLHiC_corrector but get the following error. Do you have any recommendations?

[13:29:17] Contig: ptg000402l Getting mapping list
[13:29:17] Contig: ptg001136l Getting mapping list
Traceback (most recent call last):
  File "/data2/DUNJA/ALLHIC/ALLHiC/bin/ALLHiC_corrector", line 310, in <module>
    ALLHiC_correct(in_bam, in_fa, out_fa, mapq, dep_size, bin_size, narrow_bin_size, percent, sensitive, thread)
  File "/data2/DUNJA/ALLHIC/ALLHiC/bin/ALLHiC_corrector", line 260, in ALLHiC_correct
    sub_mismatch = r.get()
  File "/home/kuhl/miniconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
TypeError: fetch() got an unexpected keyword argument 'contig'

HMPNK commented 3 years ago

Found a solution: create an environment with Python 3.7, then sort and index the BAM file:

conda create -y -n allhic python=3.7 samtools bedtools matplotlib pysam
conda activate allhic
samtools sort -@ 32 -o Q60.sort.bam Q60.bam
samtools index Q60.sort.bam
ALLHiC_corrector -m Q60.sort.bam -r TEST.fa -o TEST.fa-corrected -t 8
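For anyone hitting the same traceback: the TypeError says the installed fetch() does not accept a `contig` keyword, which would be consistent with an older pysam in the Python 2.7 environment (an assumption; the pysam version is not given in this thread). Below is a minimal sketch of how one might check whether a callable accepts a given keyword before running the full job. `old_fetch` and `new_fetch` are hypothetical stand-ins, not pysam's actual signatures.

```python
import inspect


def accepts_keyword(func, name):
    """Return True/False if `func` can/cannot take keyword `name`,
    or None when no signature can be inspected."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some compiled functions expose no signature; report "unknown".
        return None
    if name in params and params[name].kind is not inspect.Parameter.POSITIONAL_ONLY:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer fetch() signature.
def old_fetch(reference=None, start=None, end=None):
    return (reference, start, end)


def new_fetch(contig=None, start=None, stop=None):
    return (contig, start, stop)


print(accepts_keyword(old_fetch, "contig"))  # False
print(accepts_keyword(new_fetch, "contig"))  # True
```

With pysam installed, one could try `accepts_keyword(pysam.AlignmentFile.fetch, "contig")`; a result of None only means the signature could not be inspected, in which case checking the installed pysam version is the simpler route.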

HMPNK commented 3 years ago

I realized that the bwa mapping is really slow for large genomes. Could you provide some hints on how to integrate tools like minimap2 into the ALLHiC pipeline? Would you run it in paired-end mode, or map each read of a pair independently and then merge the results?

tangerzhang commented 3 years ago

Hi @HMPNK, I am not aware that minimap2 is suitable for mapping Illumina reads. bwa mapping is indeed a bit slow; however, you can split the big fastq files into a number of smaller files using the seqkit split2 command and then map each paired-end fastq chunk in parallel. The resulting bam files can subsequently be merged using samtools merge or sambamba merge.
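The split, map and merge idea above could be planned roughly as follows. This is only a sketch that builds the command lines without running them. The file names, chunk count and thread count are made up for illustration, and `bwa mem` piped into `samtools sort` is a stand-in for whatever mapping command your own workflow uses. The chunk file names must be taken from the actual seqkit output directory rather than guessed.

```python
import shlex


def split_command(r1, r2, parts, out_dir):
    """Command that splits a paired FASTQ into `parts` chunks with seqkit split2."""
    return ["seqkit", "split2", "-1", r1, "-2", r2, "-p", str(parts), "-O", out_dir]


def mapping_commands(ref, chunk_pairs, threads=4):
    """Build one shell pipeline per (R1, R2) chunk.

    Returns (commands, bam_names); nothing is executed here.
    """
    commands, bams = [], []
    for i, (r1, r2) in enumerate(chunk_pairs, start=1):
        bam = "part_{:03d}.sorted.bam".format(i)  # hypothetical output name
        cmd = "bwa mem -t {t} {ref} {r1} {r2} | samtools sort -@ {t} -o {bam} -".format(
            t=threads, ref=shlex.quote(ref),
            r1=shlex.quote(r1), r2=shlex.quote(r2), bam=bam)
        commands.append(cmd)
        bams.append(bam)
    return commands, bams


def merge_command(bams, merged_bam, threads=4):
    """Command that merges the per-chunk sorted BAM files with samtools merge."""
    return ["samtools", "merge", "-@", str(threads), merged_bam] + list(bams)


# Illustrative chunk names only; list the real files after splitting.
chunks = [("chunk1_R1.fq.gz", "chunk1_R2.fq.gz"),
          ("chunk2_R1.fq.gz", "chunk2_R2.fq.gz")]
cmds, bams = mapping_commands("TEST.fa", chunks)
print(" ".join(split_command("reads_R1.fq.gz", "reads_R2.fq.gz", 2, "split_out")))
for c in cmds:
    print(c)
print(" ".join(merge_command(bams, "merged.bam")))
```

The per-chunk pipelines are independent, so they can be submitted as separate cluster jobs or run side by side on one machine, with the merge step run once all of them have finished.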

HMPNK commented 3 years ago

Hi, thanks for your recommendations. Regarding minimap2, it is a pretty fast short-read mapper when run with the parameters "-a -x sr".

I am currently stuck on another issue (I have already tested different samtools versions: 0.1.19-44428cd, 1.9 and 1.12):

ALLHiC_partition -b prunning.bam -r TEST.fa -e GATC -m 0 -k 150
Extract function: calculate an empirical distribution of Hi-C link size based on intra-contig links
CMD: allhic extract prunning.bam TEST.fa --RE GATC
16:39:12 writeRE | NOTICE RE counts in 21172 contigs (total: 13306257, avg 1 per 381 bp) written to prunning.counts_GATC.txt
16:39:12 extractContigLinks | NOTICE Parse bamfile prunning.bam
16:39:12 extractContigLinks | ERROR Cannot open bamfile prunning.bam (sam: reference already used)
Partition contigs based on prunning bam file
CMD: allhic partition prunning.counts_GATC.txt prunning.pairs.txt 150 --minREs 0
16:39:12 ReadCSVLines | NOTICE Parse csvfile prunning.counts_GATC.txt
16:39:12 readRE | NOTICE Loaded 21172 contig RE lengths for normalization from prunning.counts_GATC.txt
16:39:12 skipContigsWithFewREs | NOTICE skipContigsWithFewREs with MinREs = 0 (RE = GATC)
16:39:12 skipContigsWithFewREs | NOTICE Marked 0 contigs (avg 0.0 RE sites, len 0) since they contain too few REs (MinREs = 0)
16:39:12 ReadCSVLines | NOTICE Parse csvfile prunning.pairs.txt
16:39:12 mustOpen | CRITIC open prunning.pairs.txt: no such file or directory

HMPNK commented 3 years ago

Solved it: this was due to the reads in my BAM file using the "/1" and "/2" naming convention.
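In case it helps others, here is a minimal sketch of stripping the "/1" and "/2" suffixes from read names in SAM text. The thread does not say how the names were actually fixed, so this is just one possible approach; the function names and the example record are made up for illustration.

```python
import re

# Matches a trailing "/1" or "/2" mate suffix on a read name.
_MATE_SUFFIX = re.compile(r"/[12]$")


def strip_mate_suffix(read_name):
    """Remove a trailing "/1" or "/2" from a read name, if present."""
    return _MATE_SUFFIX.sub("", read_name)


def normalize_sam_line(line):
    """Strip the mate suffix from the QNAME (first column) of a SAM record.

    Header lines (starting with "@") are returned unchanged.
    """
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    fields[0] = strip_mate_suffix(fields[0])
    return "\t".join(fields)


# Made-up example record, for illustration only.
record = "read7/1\t99\tctg1\t100\t60\t4M\t=\t300\t204\tACGT\tFFFF"
print(normalize_sam_line(record))
```

The same `strip_mate_suffix` helper could be applied to FASTQ header lines before mapping instead, which avoids rewriting the BAM afterwards.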

wyl1219 commented 1 year ago

Hi Dr. Zhang, I ran ALLHiC_corrector and got a file.out containing:

[09:18:48] Contig: utg000008l Getting hic list with bin size: 25000
[09:18:49] Contig: utg000008l Getting wide mismatch
[09:18:49] Contig: utg000008l Getting narrow score with bin size: 1000
[09:18:49] Contig: utg000008l Getting narrow mismatch
[09:18:49] Contig: utg000063l Getting mapping list
[09:18:49] Contig: utg000063l Getting hic list with bin size: 25000
[09:18:49] Contig: utg000063l Getting wide mismatch
[09:18:49] Contig: utg000063l Could not found mismatch_

I would like to know whether the "Could not found mismatch" message will affect the subsequent analysis. Looking forward to your reply, thanks very much.