Closed olechnwin closed 2 years ago
Hello @olechnwin !
The read-only plot is not generated by default. The equivalent output of Merqury is the .fill.png version of the spectra-cn plot if you are interested in quickly seeing the entire kmer frequency distribution.
The plot in the paper figure can be made using $MERQURY/plot/plot_spectra_cn.R
.
I made the custom plotting a little easier by making the histograms with a desired name.
Try this:
$MERQURY/plot/to_hist_for_plotting.sh reads.meryl read-only > reads.hist
Rscript $MERQURY/plot/plot_spectra_cn.R -f reads.hist -o test -t fill
The XY axis max value to plot can be adjusted with -m
and -n
.
For more options / help messages, try
$MERQURY/plot/to_hist_for_plotting.sh
Rscript $MERQURY/plot/plot_spectra_cn.R --help
@arangrhie, Thank you so much for your super quick reply and for the code. Super helpful! I thought from reading merqury paper you need to see this plot so you can figure out the kmers that are 1-copy and 2-copy and make sure the same kmers are in spectra-cn plot. But, maybe not?
Thanks for reading the paper very carefully :)
Yes, it is always recommended at best to first check the kmer spectrum of the reads. I usually do this with GenomeScope2, once the kmer db is built.
meryl histogram reads.meryl > reads_gs.hist
and use the histogram for genomescope2.0/genomescope.R
.
I haven't put much effort to generate the reads-only plot into Merqury, as GenomeScope was generating these plots for me + other informative stats (estimated genome size, heterozygosity, and repeats etc.).
Thank you so much!! Really appreciate your detailed replies and help. You made it easy to use your tools :)
I hope you don't mind me posting this question here. I was able to get the reads-only kmers, but only see 1 peak instead of 2 peaks that I expected since this is a diploid human cancer cell line. Here is the spectra-cn generated for the two haplotypes assembled genome: I am not sure what to make of this. The haplotype specific kmers seem to be much lower than expected. Is this because I am using HiFi reads which I know you recommended to use short reads instead?
Human genomes are relatively highly homozygous. The NA12878 example (Fig. 3d) in Merqury paper looks similar to yours. It is just a bit more difficult to see the 1-copy peak in your plot because of the lower coverage, because the erroneous k-mer portion bleeds into the true 1-copy peak. You could compare your GenomeScope heterozygosity estimates to my NA12878 set, which was estimated at 0.12%.
The assembly spectra-cn plot looks good to me. The result looks actually bit surprising, I guess your cancer cell line is relatively stable with little anuploidy?
It is best to use Illumina reads for evaluation, or using a hybrid approach, as the hifi specific errors won't get captured when evaluating hifi assemblies with hifi k-mers. If I don't have them, I'd just go with what I have.
@arangrhie,
Sorry for my late reply. I somehow didn't press "comment" to submit this. Thank you so much for your insights and suggestions.
Yes. My cancer cell line is known to be mostly diploid, relatively quiet with few mutations. Why did you say the result a bit surprising, what things in the plot that indicate this?
I'll use Illumina reads for evaluation. Many thanks!
I was expecting some more peaks coming from possible re-arrangements and anuploidy. Seems like it's not the case for your cell line. Wish all the best for your research!
Hello, Thanks for creating this very useful tool. I was expecting to get the spectra-cn reads only plot like figure 1a in your paper but I can't find it in my output folder. For your convenience, I'm pasting the plot from the paper below so you know which plot I'm referring to.
I ran meryl and merqury using the following code:
How do I generate the spectra-cn reads only plot? If I do
$MERQURY/merqury.sh -h
I got 2020-01-29. Thank you in advance for your help. edit: forgot to mention I have the spectra-cn for each assembly and for the combined assembly but not the reads only plot.