marbl / merqury

k-mer based assembly evaluation
Other
272 stars 19 forks source link

Odd Spectra.asm plot generated #102

Closed ap1438 closed 1 year ago

ap1438 commented 1 year ago

Hi,

I ran merquery for assembly evaluation in trio mode and got this weird plot. I cannot understand what's wrong in the assembly.

image

I ran the same command for other 2 assemblies and the plots were perfect as shown in the example.

arangrhie commented 1 year ago

Hi, Try re-generating the plots with -m -n options. More in: https://github.com/marbl/merqury/issues/27.

ap1438 commented 1 year ago

Thankyou for quick reply.

I generated the new plot. But it looks odd. image

You can see there are peaks of both the assemblies from 0 -50 in the x axis with high frequency low coverage peaks (As far as my understanding these are sequencing errors). which is again not usual what I have seen in other two assemblies. Can you please explain what this means. Does it mean contamination or sequencing error, or any other problem is there with the assemblies.

arangrhie commented 1 year ago

It looks like some junk sequence that got into your assembly. You could try fish out where the low-frequency kmers are present along your two assemblies. For example,

meryl less-than 25 read-db.meryl output lowcov.meryl
meryl intersect lowcov.meryl asm1.meryl output asm1_lowcov.meryl
# in bed format
meryl-lookup -bed -sequence asm1.fasta -mers asm1_lowcov.meryl | bedtools merge -i - > asm1_lowcov.bed
# in wig format
meryl-lookup -wig-count asm1_lowcov.wig

And apply the same on asm2. The .wig file will have values of the multiplicity seen in the low_cov.meryl. See where most of the low-coverage kmers are present. Fish out some contigs with multiple hits, and try a BLAST search or so. It will show up if it was a contaminant.

ap1438 commented 1 year ago

Thank you for your suggestion. I was trying to change the y axis for this plot but it seems that i am not able to change the y axis with -n parameter.

check_hifi_33_asm fl

plot_spectra_cn.R -m 250 -n 500000 -f hifi_S33.spectra-asm.hist -o check_hifi_33_asm -z hifi_S33.dist_only.hist Loading required package: argparse Loading required package: ggplot2 Loading required package: scales [1] "x_max: 250" [1] "y_max: 92475499.6" [1] "## Line graph" [1] "## Area under the curve filled" [1] "## Stacked" Warning messages: 1: The size argument of element_rect() is deprecated as of ggplot2 3.4.0. ℹ Please use the linewidth argument instead. 2: Using size aesthetic for lines was deprecated in ggplot2 3.4.0. ℹ Please use linewidth instead.

I get this . Am i doing someting wrong with the command.

arangrhie commented 1 year ago

That is because Merqury tries to show the error (dist_only.hist) results by default. If you aren't interested in it, try generate one without the -z hifi_S33.dist_only.hist.

Rscript  $MERQURY/plot/plot_spectra_cn.R -m 250 -n 500000 -f hifi_S33.spectra-asm.hist -o check_hifi_33_asm

Best, Arang