Closed jinxin112233 closed 2 years ago
Hello @jinxin112233,
The trio-based evaluation expects haplotype specific kmers to be provided, and treats them as the 'truth'. These haplotype specific kmers should be obtained from parental kmers, using this script. Use mat.hapmers.meryl
and pat.hapmers.meryl
for your evaluation.
Then, I'd recommend to run the trio-mode Merqury freshly under a new directory.
No need to concatenate hap1 and hap2 fasta files!
Hi Arang Rhie, I used following steps as you recommended. My out put plot looks as below. It seems there are few hapmers (circles). Does this output look fine??
1) I have no child meryl dbs so made child meryl dbs as follow: meryl union-sum hap1.meryl hap2.meryl child.meryl Note: hap1.meryl and hap2.meryl were generated from assembly contigs.
2) Build hap-mer dbs for trios: $MERQURY/trio/hapmers.sh hap1.meryl hap2.meryl child.meryl Output:
mat.only.meryl
and pat.only.meryl
mat.inherited.meryl
and pat.inherited.meryl
mat.hapmers.meryl
and pat.hapmers.meryl
Used * inherited hap-mer dbs (which will be used for evaluation): mat.hapmers.meryl
and pat.hapmers.meryl
for hap mer dbs for trios:
3) Hap mer dbs for trios: $MERQURY/trio/hap_blob.sh hap1.hapmer.meryl hap2.hapmer.meryl hap1.fasta hap2.fasta HAPMERS_ONLY
Thanks
Hello @B10inform ,
The kmers should never be generated from the assembly. This will only generate a self-circular evaluation.
It's always a best practice to generate the child's reads.meryl from the sequenced reads of the same individual, preferably from Illumina WGS. If no Illumina reads are available, then HiFi. The parental read k-mers are the same and should be obtained from WGS reads of each parent.
Thanks, Arang
Closing this issue, feel free to re-open if you need further assistance.
Hi Arang,
I assemble genome with Hifiasm using Hifi and HiC reads. It gives me a phased assembly.
1) I have no child meryl dbs so made child meryl dbs as follow: meryl union-sum hap1.meryl hap2.meryl child.meryl Note: hap1.meryl and hap2.meryl were generated from HiFi reads with information from the hap1 and hap2 gfa files. Using the read heads from the .gfa (phased assembly) hap1 and hap2 extracted fastq reads from the HiFi files.
2) Build hap-mer dbs for trios: $MERQURY/trio/hapmers.sh hap1.meryl hap2.meryl child.meryl Output:
parental specific dbs: mat.only.meryl and pat.only.meryl
inherited dbs: mat.inherited.meryl and pat.inherited.meryl
inherited hap-mer dbs (which will be used for evaluation): mat.hapmers.meryl and pat.hapmers.meryl
inherited_hapmers.png: k-mer distribution of the inherited dbs and cutoffs used to generate hap-mer dbs
Used * inherited hap-mer dbs (which will be used for evaluation): mat.hapmers.meryl and pat.hapmers.meryl for hap mer dbs for trios:
3) Hap mer dbs for trios: $MERQURY/trio/hap_blob.sh hap1.hapmer.meryl hap2.hapmer.meryl hap1.fasta hap2.fasta HAPMERS_ONLY
I get this now, it looks strange to me. What do you think about this, is it normal??
Thanks
Hi
We have completed the assembly of two haplotype assemblies with HiFi reads, the QV and completeness of them is good. We think switch error rate is a very important indicator for the evaluation of genome quality, so we calculated the switch error rate, but the result showed that the switch error rate is 46.3593%. We don't know what happen.
Run the command as follows :
meryl count k=19 HiFi_reads.fa output pacbio.meryl
merqury.sh pacbio.meryl/ hap1.fasta hap2.fasta out_prefix
cat hap1.fasta hap2.fasta > asm.fasta
phase_block.sh asm hap1.meryl hap2.meryl out
The output figure as follows:
Thank you for your help
Best jinxin