tangerzhang / ALLHiC

ALLHiC: phasing and scaffolding polyploid genomes based on Hi-C data

How does ALLHiC work for simple genomes? #87

Open pjx1990 opened 3 years ago

pjx1990 commented 3 years ago

Can I use ALLHiC to anchor a simple genome, such as rice? Thanks.

tangerzhang commented 3 years ago

Sure! ALLHiC is definitely applicable to simple genomes. I've uploaded a script, ALLHiC_pip.sh (https://github.com/tangerzhang/ALLHiC/blob/master/bin/ALLHiC_pip.sh), which wraps several steps: read mapping, correction, partition, optimize, build and plot. This script is designed for Hi-C scaffolding of simple genomes. To run ALLHiC_corrector, the numpy and scipy packages are required. Please let me know if you have any questions about this script.

pjx1990 commented 3 years ago

Thank you very much! Can ALLHiC anchor contigs, rather than scaffolds, to chromosome scale? In addition, do I need to manually correct the final result in another tool, such as Juicebox? In fact, I had already run the pipeline you provided earlier, but the resulting heatmap was not very good. I'll run it again with your new pipeline.

tangerzhang commented 3 years ago

Hi @pjx1990, ALLHiC can anchor contigs onto chromosomes if the number of chromosomes is given. The new pipeline I just uploaded includes contig correction and may therefore perform better than before. However, if the heatmap is still not good enough, you may also need Juicebox to adjust the results.

pjx1990 commented 3 years ago

Thanks. I've run the new pipeline (ALLHiC_pip.sh) once, but an error occurred at line 104: "line 104: ParaFly: command not found". I looked through the source code but could not find this script. How can I solve it? In addition, the script relies on the pysam package; please add a note about this to the documentation.

tangerzhang commented 3 years ago

Thanks for letting us know. ParaFly comes from the Trinity package (https://github.com/trinityrnaseq/trinityrnaseq) and is used to run dozens of command lines in parallel. I will add an updated README shortly.
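The missing-dependency errors reported above can be caught before a long run starts. The following is a minimal sketch, not part of ALLHiC: the function names are hypothetical, the tool list (ParaFly, samtools, bwa) and the module list (numpy, scipy, pysam) are assumptions pieced together from this thread rather than an official dependency list, and the interpreter defaults to `python3` unless `PYTHON` is set.

```shell
#!/usr/bin/env bash
# Pre-flight check before launching ALLHiC_pip.sh.
# Hypothetical helper: the names checked below are assumptions taken from
# this thread, not an authoritative list of what the pipeline needs.

missing_tools() {
    # Print every argument that cannot be resolved as a command on PATH.
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing_py_modules() {
    # Print every module that the chosen interpreter fails to import.
    # PYTHON is an assumed override; it defaults to python3.
    local mod
    for mod in "$@"; do
        "${PYTHON:-python3}" -c "import $mod" >/dev/null 2>&1 || printf '%s\n' "$mod"
    done
}

tools=$(missing_tools ParaFly samtools bwa)
mods=$(missing_py_modules numpy scipy pysam)

if [ -n "$tools" ]; then
    printf 'Missing command-line tools:\n%s\n' "$tools" >&2
fi
if [ -n "$mods" ]; then
    printf 'Missing Python modules:\n%s\n' "$mods" >&2
fi
```

If either list is non-empty, installing the reported items first (for example, adding the Trinity directory that contains ParaFly to PATH) avoids a failure partway through the pipeline.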

pjx1990 commented 3 years ago

Hi, it ran successfully, and the result is much better than before, but it still has some errors. Now I want to adjust it with Juicebox, but I don't know how to generate the appropriate input files (such as .hic and .assembly) for Juicebox, or how to get the final FASTA file. By the way, the matplotlib call in ALLHiC_plot should be updated, because it showed a warning: MatplotlibDeprecationWarning: savefig() got unexpected keyword argument "filetype" which is no longer supported as of 3.3 and will become an error two minor releases later

tangerzhang commented 3 years ago

Hi @pjx1990, we discussed how to generate the .hic and .assembly files in another thread (https://github.com/tangerzhang/ALLHiC/issues/68). And thanks for letting us know about the matplotlib warning.

xinghua1001 commented 3 years ago

Hi Dr. Zhang,

I think there is an error in the ALLHiC_pip.sh script:

### filter bam

samtools view -bq $threads sample.bwa_mem.bam | samtools view -bt seq.HiCcorrected.fasta.fai > sample.unique.bam

Should $threads really be the argument after -q? That option sets the minimum mapping quality, not the number of threads.

tangerzhang commented 3 years ago

Yes, you are right! Thanks for pointing that out. I will fix it tomorrow once I can log in to my GitHub account.

(Sent by email in reply to xinghua1001's comment of Thursday, April 1, 2021, 11:30 PM.)

StevenBai97 commented 3 years ago

Hi Dr. Zhang, I replaced "$threads" with "20" in the ALLHiC_pip.sh script. Is that right? In addition, I used HiC-Pro for quality control of the Hi-C data, but I don't know which result file can be used as input to the ALLHiC_pip.sh script ("XX.bwt2pairs_interaction.bam", "XX.bwt2pairs.bam", or another file?). I look forward to your reply. Thanks.

tangerzhang commented 3 years ago

Hi, I would prefer to use 40 as the quality cutoff, i.e., samtools view -bq 40. Note that ALLHiC_pip.sh takes FASTQ files as input, because the script performs two rounds of read mapping. In the first round, misjoined contigs are corrected based on the initial read mapping. In the second round, the corrected contigs are linked based on Hi-C signals.
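One way to keep the thread count from leaking into the `-q` option again is to give the mapping-quality cutoff its own variable and validate it. This is a sketch only: the `mapq` variable and both function names are hypothetical and do not come from ALLHiC_pip.sh, while the file names in the usage note are the ones quoted earlier in this thread.

```shell
#!/usr/bin/env bash
# Sketch: a dedicated MAPQ variable for the BAM filter step.
# Names are hypothetical; only the samtools options come from the thread.

mapq=40   # minimum mapping quality suggested above

is_valid_mapq() {
    # MAPQ is an unsigned 8-bit field in BAM, so accept integers 0..255.
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -le 255 ]
}

filter_bam() {
    # Usage: filter_bam <in.bam> <ref.fasta.fai> <out.bam>
    is_valid_mapq "$mapq" || { echo "bad MAPQ cutoff: $mapq" >&2; return 1; }
    # First pass drops alignments below the cutoff; the second pass reads
    # that stream from stdin ("-") and writes BAM using the .fai listing.
    samtools view -bq "$mapq" "$1" | samtools view -bt "$2" - > "$3"
}
```

With samtools installed, the step from the thread would then be invoked as `filter_bam sample.bwa_mem.bam seq.HiCcorrected.fasta.fai sample.unique.bam`.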

(Sent by email in reply to StevenBai97's comment of Friday, April 2, 2021, 6:20 PM.)

StevenBai97 commented 3 years ago

Thanks for your reply. I will have a try according to your suggestions.

WeiSong-bio commented 3 years ago

Hi Dr. Zhang, why does the script ALLHiC_pip.sh (https://github.com/tangerzhang/ALLHiC/blob/master/bin/ALLHiC_pip.sh) not include the ALLHiC_Rescue and extract steps, as in the method described in the wiki (https://github.com/tangerzhang/ALLHiC/wiki)? Thanks.