marbl / merqury

k-mer based assembly evaluation
Other
272 stars 19 forks source link

The switch error rate is high #61

Closed jinxin112233 closed 2 years ago

jinxin112233 commented 2 years ago

Hi

We have completed the assembly of two haplotype assemblies with HiFi reads, the QV and completeness of them is good. We think switch error rate is a very important indicator for the evaluation of genome quality, so we calculated the switch error rate, but the result showed that the switch error rate is 46.3593%. We don't know what happen.

Run the command as follows : meryl count k=19 HiFi_reads.fa output pacbio.meryl merqury.sh pacbio.meryl/ hap1.fasta hap2.fasta out_prefix cat hap1.fasta hap2.fasta > asm.fasta phase_block.sh asm hap1.meryl hap2.meryl out

The output figure as follows: fig

Thank you for your help

Best jinxin

arangrhie commented 2 years ago

Hello @jinxin112233,

The trio-based evaluation expects haplotype specific kmers to be provided, and treats them as the 'truth'. These haplotype specific kmers should be obtained from parental kmers, using this script. Use mat.hapmers.meryl and pat.hapmers.meryl for your evaluation.

Then, I'd recommend to run the trio-mode Merqury freshly under a new directory.

No need to concatenate hap1 and hap2 fasta files!

B10inform commented 2 years ago

Hi Arang Rhie, I used following steps as you recommended. My out put plot looks as below. It seems there are few hapmers (circles). Does this output look fine??

image

1) I have no child meryl dbs so made child meryl dbs as follow: meryl union-sum hap1.meryl hap2.meryl child.meryl Note: hap1.meryl and hap2.meryl were generated from assembly contigs.

2) Build hap-mer dbs for trios: $MERQURY/trio/hapmers.sh hap1.meryl hap2.meryl child.meryl Output:

Used * inherited hap-mer dbs (which will be used for evaluation): mat.hapmers.meryl and pat.hapmers.meryl for hap mer dbs for trios:

3) Hap mer dbs for trios: $MERQURY/trio/hap_blob.sh hap1.hapmer.meryl hap2.hapmer.meryl hap1.fasta hap2.fasta HAPMERS_ONLY

Thanks

arangrhie commented 2 years ago

Hello @B10inform ,

The kmers should never be generated from the assembly. This will only generate a self-circular evaluation.

It's always a best practice to generate the child's reads.meryl from the sequenced reads of the same individual, preferably from Illumina WGS. If no Illumina reads are available, then HiFi. The parental read k-mers are the same and should be obtained from WGS reads of each parent.

Thanks, Arang

arangrhie commented 2 years ago

Closing this issue, feel free to re-open if you need further assistance.

B10inform commented 2 years ago

Hi Arang,

I assemble genome with Hifiasm using Hifi and HiC reads. It gives me a phased assembly.

1) I have no child meryl dbs so made child meryl dbs as follow: meryl union-sum hap1.meryl hap2.meryl child.meryl Note: hap1.meryl and hap2.meryl were generated from HiFi reads with information from the hap1 and hap2 gfa files. Using the read heads from the .gfa (phased assembly) hap1 and hap2 extracted fastq reads from the HiFi files.

2) Build hap-mer dbs for trios: $MERQURY/trio/hapmers.sh hap1.meryl hap2.meryl child.meryl Output:

parental specific dbs: mat.only.meryl and pat.only.meryl
inherited dbs: mat.inherited.meryl and pat.inherited.meryl
inherited hap-mer dbs (which will be used for evaluation): mat.hapmers.meryl and pat.hapmers.meryl
inherited_hapmers.png: k-mer distribution of the inherited dbs and cutoffs used to generate hap-mer dbs

Used * inherited hap-mer dbs (which will be used for evaluation): mat.hapmers.meryl and pat.hapmers.meryl for hap mer dbs for trios:

3) Hap mer dbs for trios: $MERQURY/trio/hap_blob.sh hap1.hapmer.meryl hap2.hapmer.meryl hap1.fasta hap2.fasta HAPMERS_ONLY

I get this now, it looks strange to me. What do you think about this, is it normal??

image

Thanks