Open pxxiao-hz opened 2 years ago
Hello,
There is an interactive software, juicebox
, can help us to correct assembly mistakes. Before assembly correction, we should generate the .hic
and .assembly
file. Here are some brief steps on how to generate these files:
1、Re-mapping HiC reads to contig assembly.
Dependency: chromap
# Build index chromap -i -r contig.fa -o index # Re-mapping chromap --preset hic -x index -r contig.fa -1 HiC_read1.fa -2 HiC_read2.fa -o aln.pair
2、Convert
.agp
file to.assembly
file. Dependency: juicebox_scriptspython agp2assembly.py groups.agp out.assembly
2、Convert
.pair
file to.hic
file Dependency: 3d-dnagrep -v '#' aln.pair |awk '{if($6!="+") $6=16; else $6=0; if($7!="+") $7=16; else $7=0} \ $2<=$4{print $6, $2, $3, 0, $7, $4, $5, 1, "1 - - 1 - - -" } \ $4<$2{print $7, $4, $5, 0, $6, $2, $3, 1, "1 - - 1 - - -" }' > out.links.txt
bash 3d-dna/visualize/run-assembly-visualizer.sh -p false out.assembly out.links.txt
4、How to use juicebox?
There are some tutorials below which may be useful.
- [The tutorial video for Juicebox Assembly Tools in youtube](https://www.youtube.com/watch?v=Nj7RhQZHM18)
- [Genome assembly cookbook of aidenlab](http://aidenlab.org/assembly/manual_180322.pdf)
Best wishes!
Hi,
Re-mapping HiC reads to contig assembly
mean? Is ‘contig assembly’ ‘groups.asm.fasta’ or '.assembly' generated by reads? 2.1 A
bam
file has been generated before running allhic. I want to extract the reads mapped to a chromosome to generate anew bam
file. 2.2 Generate.assebmly
grep 'read_...g1' groups.agp > chr1.agp
python agp2assembly.py chr1.agp chr1.assembly
2.3 Then,matlock bam2 juicer .new.bam out.links.txt
sort -k2,2 -k6,6 out.links.txt > out.sorted.links.txt
bash run-assembly-visualizer.sh -p false chr1.assembly out.sorted.links.txt
Thanks.
Hello,
It is the ‘contig assembly’ and also be called 'draft.asm.fasta'.
Firstly, the runtime of this pipeline is not very long. A 3GB genome can generate .hic
file in about 24h. Secondly, matlock
only work with proper-pair hic reads mapping bam. So, you must filter new bam
with pairtools
, Before work in one chromosomes. Here is the filter steps:
samtools view -h new.bam | \
pairtools parse -c draft.asm.sizes -o parsed.pairsam.gz
pairtools sort --nproc 8 -o sorted.pairsam.gz parsed.pairsam.gz
pairtools dedup --mark-dups -o deduped.pairsam.gz sorted.pairsam.gz
pairtools select \
'(pair_type == "UU") or (pair_type == "UR") or (pair_type == "RU")' \
-o filtered.pairsam.gz deduped.pairsam.gz
pairtools split --output-sam new.filtered.bam filtered.pairsam.gz
Best wishes!
Hello,
- It is the ‘contig assembly’ and also be called 'draft.asm.fasta'.
- Firstly, the runtime of this pipeline is not very long. A 3GB genome can generate
.hic
file in about 24h. Secondly,matlock
only work with proper-pair hic reads mapping bam. So, you must filternew bam
withpairtools
, Before work in one chromosomes. Here is the filter steps:samtools view -h new.bam | \ pairtools parse -c draft.asm.sizes -o parsed.pairsam.gz pairtools sort --nproc 8 -o sorted.pairsam.gz parsed.pairsam.gz pairtools dedup --mark-dups -o deduped.pairsam.gz sorted.pairsam.gz pairtools select \ '(pair_type == "UU") or (pair_type == "UR") or (pair_type == "RU")' \ -o filtered.pairsam.gz deduped.pairsam.gz pairtools split --output-sam new.filtered.bam filtered.pairsam.gz
Best wishes!
in addition, you can filter bam file with python script filter_bam_unpaired.py.
samtools sort -n new.bam > new.nameSorted.bam
python filter_bam_unpaired.py new.nameSorted.bam new.filtered.bam
Dear Professor. I found a error when evaluating with the heatmap
ALLHiC_plot
, now I want to try flipping them, but I don't know their location and ctg id. How can I know their information and change them. Thanks. Best wishes!