Closed Caoyu819 closed 3 years ago
Hi 曹昱,
Thanks for your interest in using findGSE.
--
I think it is risky to filter k-mers purely based on k-mer coverage, because the genome itself might be highly repetitive. For instance, a centromeric repeat can occur in millions of copies, and so can the k-mers derived from it. If such repetitive k-mers are filtered out, the genome size will be underestimated.
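To illustrate the point about coverage cutoffs, here is a minimal sketch with a toy histogram. It assumes the simplest possible estimator (total k-mer count divided by the peak coverage), which is not the fitting model findGSE actually uses; the function name `naive_genome_size` and all numbers are made up for illustration.

```python
def naive_genome_size(hist, max_cov=None):
    """Naive estimate: total k-mers / peak coverage.

    hist maps k-mer coverage -> number of distinct k-mers at that coverage.
    max_cov, if given, drops every k-mer with coverage above it
    (a hypothetical upper cutoff, for illustration only).
    """
    kept = {c: n for c, n in hist.items() if max_cov is None or c <= max_cov}
    total_kmers = sum(c * n for c, n in kept.items())
    # Treat the coverage with the most distinct k-mers as the single-copy peak.
    peak = max(kept, key=kept.get)
    return total_kmers / peak

# Toy data: 1,000,000 single-copy k-mers at 30x, plus 100 repeat k-mers
# that each occur 2,000 times in the genome (coverage 30 * 2000 = 60,000).
hist = {30: 1_000_000, 60_000: 100}

full = naive_genome_size(hist)                 # repeats included
cut = naive_genome_size(hist, max_cov=10_000)  # repeats filtered out

print(full, cut)
```

In this toy case the cutoff removes 200,000 positions that really belong to the genome, so the filtered estimate is lower than the unfiltered one.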
I have some questions regarding your results:
Also, all tools would be affected by PCR amplification and organelle sequencing, because such unexpected sequence influences the shape of the k-mer frequency distribution (and thus the average k-mer coverage) and the total number of genomic k-mers, which are the two key parameters for determining genome size.
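As a rough sketch of why non-genomic reads matter, the toy example below computes the two quantities mentioned above (total k-mer count and peak coverage) with and without a set of organelle k-mers. The helper names and the numbers are hypothetical, and the simple ratio used here is only an approximation of what real tools fit.

```python
def total_kmer_count(hist):
    """Sum of coverage * distinct k-mers, i.e. all k-mer occurrences in the reads."""
    return sum(cov * n for cov, n in hist.items())

def peak_coverage(hist):
    """Coverage value shared by the largest number of distinct k-mers."""
    return max(hist, key=hist.get)

# Nuclear genome only: 1,000,000 distinct k-mers sequenced at 30x.
nuclear = {30: 1_000_000}

# Same library plus organelle DNA: 10,000 distinct k-mers present at
# 100 copies per nuclear genome copy, so they appear at 3,000x.
with_organelle = {30: 1_000_000, 3_000: 10_000}

clean_estimate = total_kmer_count(nuclear) / peak_coverage(nuclear)
inflated_estimate = total_kmer_count(with_organelle) / peak_coverage(with_organelle)

print(clean_estimate, inflated_estimate)
```

Here the peak stays at 30x, but the organelle k-mers double the total k-mer count, so the estimate doubles even though the nuclear genome is unchanged.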
You can test and try the attached filtering pipeline for filtering non-genomic reads.
Best, Hequan
Hi~ Dr. Sun, I have estimated the genome size of a tree, Platycarya strobilacea, using findGSE and another tool, GenomeScope 2.0. When I compare the results of the two tools, findGSE always gives a much larger value than GenomeScope 2.0 (please see the attached file for details). I am puzzled and hope you can help.
The input files for both tools are the same clean data, obtained after removing adapters. The histo file of k-mer depth is calculated by KMC through the following two command lines (k=21, for example):
So I wonder whether findGSE is sensitive to reads duplicated by PCR amplification and to organelle sequencing. Should I filter the duplicated reads from the input clean data before k-mer depth counting? Finally, may I refer to your detailed process for filtering artificial reads? I would be very grateful if that is possible.
Thank you for reading my question. I look forward to your reply.
Best wishes~
You can also contact me via email (caoyuchn@yeah.net). Thanks again~
Yu Cao
Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Science, Beijing Normal University, China

![compare_findGSE-genomeScope](https://user-images.githubusercontent.com/63846805/107110119-566cb700-6880-11eb-9842-2fad650ba243.png)