maxplanck-ie / HiCAssembler

Software to assemble contigs/scaffolds into chromosomes using Hi-C data

misassembly correction #7

Open yongyiyu opened 5 years ago

yongyiyu commented 5 years ago

Hi Fidel, I created a corrected Hi-C matrix in h5 format with HiCExplorer, but an assertion error killed the assembly when I ran HiCAssembler:

Traceback (most recent call last):
  File "/annoroad/data1/bioinfo/PMO/yangweifei/Hicassemble/py2/bin/assemble", line 312, in <module>
    main(args)
  File "/annoroad/data1/bioinfo/PMO/yangweifei/Hicassemble/py2/bin/assemble", line 308, in main
    chain_file=args.outFolder + "/liftover.chain")
  File "/annoroad/data1/bioinfo/PMO/yangweifei/Hicassemble/py2/bin/assemble", line 218, in save_fasta
    assert(next_contig['start'] - end >= 0)
AssertionError

I'm a little confused after checking the code of assemble. The misassembly correction of my data may have produced overlapping regions. According to the principle you described, "this means that a contig was split by the misassembly correction but was later joined together", the case of overlapping data isn't considered, and I think pieces split by this kind of misassembly correction shouldn't be joined.
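To make the failing condition concrete, here is a minimal sketch of the kind of check the assertion seems to express. It is not code from HiCAssembler: the helper `find_overlaps`, the dictionary layout, and the example coordinates are all hypothetical. It also assumes that `end` in the traceback is the end coordinate of the previous piece of the same contig.

```python
# Hypothetical helper, NOT part of HiCAssembler. It assumes each piece of a
# split contig is a dict with 'name', 'start' and 'end', listed in the order
# in which the pieces are joined.

def find_overlaps(pieces):
    """Return (previous, next) pairs where next starts before previous ends."""
    overlaps = []
    for prev, nxt in zip(pieces, pieces[1:]):
        # Same condition as the assertion, negated: a negative gap is an overlap.
        if nxt["start"] - prev["end"] < 0:
            overlaps.append((prev, nxt))
    return overlaps


# Made-up coordinates for illustration only.
pieces = [
    {"name": "contig_1", "start": 0, "end": 50000},
    {"name": "contig_1", "start": 40000, "end": 90000},   # starts before 50000
    {"name": "contig_1", "start": 90000, "end": 120000},  # adjacent, no overlap
]

for prev, nxt in find_overlaps(pieces):
    print(prev["name"], "piece ending at", prev["end"],
          "overlaps piece starting at", nxt["start"])
```

If the pieces produced by the misassembly correction look like the second entry, a check of this form would fail in the same way as the assertion in save_fasta.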

Best regards, yongyi

fidelram commented 5 years ago

To be clear, did you run the misassembly correction before?

yongyiyu commented 5 years ago

Hi Fidel, I ran the misassembly correction before using HiCAssembler. When I tested HiCAssembler with the Hi-C matrix that wasn't corrected, it worked successfully. So I think the overlap case may not be handled. The commands I used are as follows:

hicBuildMatrix --samFiles L3-8_Lib1_Lane1_genome.reads1.bam \
    L3-8_Lib1_Lane1_genome.reads2.bam --binSize 10000 --restrictionSequence GATC --threads 4 \
    --inputBufferSize 100000 --outBam hic.bam -o hic_matrix.h5 --QCfolder ./hicQC

hicCorrectMatrix correct -m hic_matrix.h5 -t -1.2 5 -o hic_corrected_matrix.h5

assemble -f genome.fa -m hic_corrected_matrix.h5 -o ./assembly_output \
    --min_scaffold_length 100000 --bin_size 5000 --misassembly_zscore_threshold -1.0 \
    --num_iterations 3 --num_processors 16

yongyi

xuxiaoman0212 commented 4 years ago

Hi yongyi,

I got the same error. How did you solve it? @yongyiyu

Thanks a lot!