Closed rapaJiahe closed 5 years ago
Yes, ALLHiC can be applied to diploid plants as well. I will post the details of our best practice for simple diploid genomes once I get a chance. Briefly, please follow the command lines below to anchor a diploid genome:
- Map reads using bwa aln (same as we did for polyploids)
- Partition contigs into user pre-defined groups
ALLHiC_partition -b sample.clean.bam -r draft.asm.fasta -e AAGCTT -k 16
Note: the restriction site (-e) and number of clusters (-k) should be modified accordingly.
- Extract the CLM file and counts of restriction sites
allhic extract sample.clean.bam draft.asm.fasta --RE AAGCTT
- Run optimize for ordering and orientation (can be run in parallel)
allhic optimize sample.clean.counts_AAGCTT.16g1.txt sample.clean.clm
allhic optimize sample.clean.counts_AAGCTT.16g2.txt sample.clean.clm
...
allhic optimize sample.clean.counts_AAGCTT.16g16.txt sample.clean.clm
- Get the chromosome-level assembly
ALLHiC_build draft.asm.fasta
- Heatmap plot for assembly assessment
(a) Get group lengths
perl getFaLen.pl -i groups.asm.fasta -o len.txt
Note: the script can be found here (https://github.com/tangerzhang/my_script/blob/master/getFaLen.pl)
grep 'merge.clean.counts_GATC' len.txt > chrn.list
Note: only keep the chromosome-level assembly for plotting.
(b) Plotting
ALLHiC_plot sample.clean.bam groups.agp chrn.list 500k pdf
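Since the optimize step is the same command repeated once per group, it can be generated rather than typed out. A minimal sketch, assuming the file naming shown in the commands above (prefix.counts_ENZYME.Kg<i>.txt); the function name optimize_cmds is hypothetical and the script only prints the commands, it does not run allhic:

```shell
#!/usr/bin/env bash
# Dry run: print one "allhic optimize" command per group.
# prefix, enzyme and k are assumed to match the values used in the
# partition and extract steps (sample.clean, AAGCTT, 16 in this thread).
optimize_cmds() {
    local prefix="$1" enzyme="$2" k="$3" i
    for ((i = 1; i <= k; i++)); do
        echo "allhic optimize ${prefix}.counts_${enzyme}.${k}g${i}.txt ${prefix}.clm"
    done
}

optimize_cmds sample.clean AAGCTT 16
```

Piping the output to sh would run the commands one after another; a job runner of your choice could take the same list to run them in parallel.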
many thanks.
He
Hi, @tanghaibao and @tangerzhang,
My question is about this step:
"Heatmap Plot for assembly assessment
(a) get group length
perl getFaLen.pl -i groups.asm.fasta -o len.txt
Note: script can be found here (https://github.com/tangerzhang/my_script/blob/master/getFaLen.pl)
grep 'merge.clean.counts_GATC' len.txt > chrn.list
Note: only keep chromosomal level assembly for plotting."
I want to generate chrn.list for ALLHiC_plot, but the resulting chrn.list is empty. How can I grep "merge.clean.counts_GATC" from a file that consists only of FASTA names and sequence lengths? Could you give me some advice on this step? Thanks.
Best, Sincerely!
Could you please share a couple of lines in the len.txt file?
I used getFaLen.pl -i groups.asm.fasta -o len.txt to generate the file. Its format is shown below. I don't know how to solve this problem. Thanks.
group7 126768508
group8 118096512
group9 117541027
004194F|arrow|pilon 3231
002396F|arrow|pilon 301128
007445F|arrow|pilon 1462
006502F|arrow|pilon 20216
007368F|arrow|pilon 3852
007215F|arrow|pilon 7274
007091F|arrow|pilon 9688
006603F|arrow|pilon 18135
Best, Sincerely!
You can use the following command line: grep 'group' len.txt > chrn.list
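To make the filter a little stricter, the match can be restricted to the name column. A minimal sketch, assuming len.txt has the sequence name in the first whitespace-separated column, as in the lines quoted above; the temporary directory and the shortened sample file are only there to make the example self-contained:

```shell
#!/usr/bin/env bash
# Work in a scratch directory so no real files are overwritten.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Sample len.txt, using a few of the lines quoted earlier in this thread.
cat > len.txt <<'EOF'
group7 126768508
group8 118096512
group9 117541027
004194F|arrow|pilon 3231
002396F|arrow|pilon 301128
EOF

# Keep only rows whose name starts with "group"; unanchored contigs are dropped.
awk '$1 ~ /^group/' len.txt > chrn.list

cat chrn.list
```

Anchoring the pattern at the start of the name avoids picking up a contig that merely contains the word "group" somewhere in its ID.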
Thank you very much, @tangerzhang! I will try it and let you know the final results.
Best, Sincerely!
Hi @tangerzhang, the code works well, thank you very much! I finally got the 500k PDF, but it does not look very clear; there is too much background noise. If possible, could you give me some advice on this?
500K_Whole_genome.pdf
The result looks good! It is probably not very clear because of low sequencing depth or a low rate of valid reads. Increasing the coverage should solve this problem.
Thank you, @tangerzhang. Recently I found something strange: I counted the sequence lengths in the final assembly FASTA file using scripts, and as the chromosome number increases, the length becomes smaller. I am not sure whether this is correct. Can you give me some advice? Thanks.
groups-length2.txt
I am not quite sure I understand. Do you mean that when you increase the k value in the partition step, you get more groups and the length of each group decreases?
Hi @tangerzhang, I mean that when I finished all the ALLHiC steps, I got the final assembly file groups.asm.fasta and groups.agp, which describes the scaffolds. The chromosomal background of the material is 2n = 48 (AABB), so I ended up with 24 "chromosome" sequences. I counted the lengths of those sequences and found that as the chromosome ID number increases, the length becomes smaller. The 24th sequence is only 182 kb. I am not sure whether this is correct. Can you give me some advice? Thanks.
groups-length2.txt
Group23 and group24 are too small, which is not normal. Hi-C is not good at partitioning contigs. If you have genetic maps, you can cluster contigs by linkage group and then order the contigs within each group. Alternatively, you may try correcting the mis-joined contigs using 3D-DNA and then scaffolding the corrected contigs with ALLHiC.
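Undersized groups like these can be spotted quickly from the length file. A minimal sketch, assuming a len.txt-style file (name, length); the function name flag_small_groups, the 5% cutoff and the group24 value are illustrative assumptions, not ALLHiC defaults or exact numbers from the attachment:

```shell
#!/usr/bin/env bash
# List groups shorter than a given fraction of the largest group.
# $1: len.txt-style file (name, length); $2: fraction, e.g. 0.05
flag_small_groups() {
    awk -v frac="$2" '
        $1 ~ /^group/ {
            name[++n] = $1          # remember each group name
            len[n] = $2 + 0         # force numeric length
            if (len[n] > max) max = len[n]
        }
        END {
            for (i = 1; i <= n; i++)
                if (len[i] < frac * max) print name[i], len[i]
        }
    ' "$1"
}

# Illustrative input: three lengths quoted in this thread plus a
# roughly 182 kb group (exact value assumed for the example).
lens="$(mktemp)"
cat > "$lens" <<'EOF'
group7 126768508
group8 118096512
group9 117541027
group24 182000
EOF

flag_small_groups "$lens" 0.05
```

A relative cutoff is used here only because chromosome sizes differ between species; any group it reports still needs to be judged against what is known about the karyotype.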
Hi, @tangerzhang
If I use 3D-DNA or SALSA2 to correct the mis-joined contigs, should I align the Hi-C reads using bwa mem? Both tools suggest bwa mem for PE 150 reads. Could the ALLHiC mapping pipeline be changed to bwa mem?
Thanks.
Hi baozg, only the filterBAM_forHiC.pl script requires bwa aln. You can skip that script if you would like to use bwa mem.
Hi, @tangerzhang Seems in this case we don't need Allele.ctg.table, right? Thanks!
You are right. For scaffolding a simple diploid genome, we do not need Allele.ctg.table. Please check the pipeline for scaffolding a diploid genome (https://github.com/tangerzhang/ALLHiC/wiki/ALLHiC:-scaffolding-of-a-simple-diploid-genome).
Hi, @tanghaibao and @tangerzhang,
From the descriptions, this pipeline seems to work well, and I am wondering whether it can be applied to diploid plants. If so, could you give me some advice on running it?
Best, He