schneebergerlab / findGSE

findGSE is a tool for estimating size of (heterozygous diploid or homozygous) genomes by fitting k-mer frequencies iteratively with a skew normal distribution model.

How to interpret results? #11

Open xiekunwhy opened 8 months ago

xiekunwhy commented 8 months ago

Hi,

For a diploid genome, I ran findGSE(histo="Cfl.histo", sizek=21, outdir="hom_test_21mer62", exp_hom = 62). Which of the following numbers is the haploid genome size?

size_all 2831278311
size_exl 2762932750
size_cat 3063218222
size_fit 2276505453
size_cor2 4239285369
Het_rate 0.00913753 0.00913753
Est. ratio of repeats 0.88225222
Final k-mer cov 36.5624931

Best, Kun

HeQSun commented 8 months ago

Hi, can you share the pdf?

xiekunwhy commented 8 months ago

Here is the PDF file: v1.94.est.Cfl.histo.sizek21.curvefitted.pdf

HeQSun commented 8 months ago

The current result does not seem correct. Can you set exp_hom = 70 and rerun?

You can also share the histo file with me, if that is okay.

xiekunwhy commented 8 months ago

Thank you for your reply.

Here are the results with exp_hom = 70: v1.94.est.Cfl.histo.sizek21.curvefitted.pdf

The histo file is here: Cfl.zip

Best, Kun

HeQSun commented 8 months ago

The histogram looks a bit "weird".

Do you know whether the species is diploid or polyploid? I am asking because the histogram has one peak at 15x and another at 56x, and its tail also has high y-values, which could come from repeats or from higher ploidy.
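For readers who want to check for such peaks themselves, here is a minimal Python sketch that looks for local maxima in a k-mer histogram given as frequency -> number of k-mers. The function name find_peaks_in_histo, the min_freq and window settings, and the toy histogram are all assumptions made for illustration; this is not how findGSE fits the data, and the toy numbers are not the Cfl.histo values.

```python
def find_peaks_in_histo(counts, min_freq=5, window=2):
    """Return the frequencies that are local maxima of a k-mer histogram.

    counts:   dict mapping k-mer frequency -> number of k-mers at that frequency
    min_freq: frequencies below this are skipped (assumed to be error k-mers)
    window:   a frequency is a peak if its count is strictly larger than
              every count within +/- window of it
    """
    peaks = []
    for freq in sorted(counts):
        if freq < min_freq:
            continue
        neighbours = [
            counts.get(other, 0)
            for other in range(freq - window, freq + window + 1)
            if other != freq
        ]
        if counts[freq] > max(neighbours):
            peaks.append(freq)
    return peaks


def toy_histo():
    """Synthetic histogram (illustration only): error spike at 1x,
    triangular bumps centred on 15x and 56x."""
    histo = {}
    for freq in range(1, 101):
        low_bump = max(0, 1000 - 100 * abs(freq - 15))
        high_bump = max(0, 3000 - 100 * abs(freq - 56))
        error_spike = 5000 if freq == 1 else 0
        histo[freq] = low_bump + high_bump + error_spike
    return histo


peaks = find_peaks_in_histo(toy_histo())
print(peaks)
```

On real data the counts are noisy, so a wider window or some smoothing would probably be needed before the peaks are trustworthy.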

To me, it does not look like a diploid, but more likely a tetraploid.

I can only tell the full genome size is around 10 Gb. The haploid genome size would be 10 Gb / n, where n is the ploidy which you need to figure out.
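The division above can be sketched in a few lines of Python. The 10 Gb total is the rough figure quoted in this comment; the function name haploid_size and the candidate ploidy values are assumptions for illustration only.

```python
def haploid_size(total_genome_bp, ploidy):
    """Haploid genome size = full genome size / ploidy (n)."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return total_genome_bp / ploidy


total_bp = 10e9  # "around 10 Gb", as estimated in the comment above

# Candidate ploidy levels (assumed): diploid and tetraploid
for n in (2, 4):
    print(f"ploidy {n}: haploid size ~ {haploid_size(total_bp, n) / 1e9:.1f} Gb")
```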

cfl_raw.pdf

Another explanation could be that this is a mixture of different DNA material; maybe the DNA was contaminated before or during sequencing.

xiekunwhy commented 8 months ago

Thank you for your help.

This species exists in two states, diploid and tetraploid.

Smudgeplot and karyotype analysis told me that the sample we are analyzing is diploid. Maybe there is contamination in the sequenced DNA.

Here are the Smudgeplot results: smudgeplot_smudgeplot smudgeplot_smudgeplot_log10 smudgeplot_verbose_summary.txt

Best, Kun

simleopold commented 8 months ago

Hi,

I also wanted to know how to interpret the results and which number is the "real" genome size. Here is the pdf file. findGSE-PSR.pdf

Thank you for your help.

HeQSun commented 8 months ago

I would not trust the k-mer-based ploidy estimate in this particular case, because the peak at 15x has been treated as errors, and I do not know what method underlies that determination.

You can try

  1. use a wet-lab method to check the genome size again
  2. BLAST some of the k-mers at the 15x peak, to see whether you can figure out which species they come from.
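As a rough illustration of the second suggestion, the Python sketch below picks k-mers whose count lies near the 15x peak and writes them as FASTA records that could be pasted into a BLAST search. It assumes you already have (k-mer, count) pairs, for example from a k-mer counter's dump output; the function name kmers_near_peak_to_fasta, the tolerance, the limit, and the toy k-mers are hypothetical.

```python
def kmers_near_peak_to_fasta(kmer_counts, peak=15, tolerance=2, limit=100):
    """Format k-mers with |count - peak| <= tolerance as FASTA text.

    kmer_counts: iterable of (kmer_sequence, count) pairs (assumed input)
    limit:       stop after this many records to keep the BLAST query small
    """
    records = []
    for kmer, count in kmer_counts:
        if abs(count - peak) <= tolerance:
            header = f">kmer_{len(records) + 1}_count_{count}"
            records.append(f"{header}\n{kmer}")
            if len(records) >= limit:
                break
    return "\n".join(records) + ("\n" if records else "")


# Toy 21-mers (illustration only, not taken from the Cfl data)
toy_pairs = [
    ("ACGTACGTACGTACGTACGTA", 15),
    ("TTTTTTTTTTTTTTTTTTTTT", 56),
    ("GGGCCCGGGCCCGGGCCCGGG", 14),
    ("ACACACACACACACACACACA", 1),
]
fasta = kmers_near_peak_to_fasta(toy_pairs)
print(fasta, end="")
```

Single 21-mers are short BLAST queries, so hits may be ambiguous; this only shows the filtering step.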

HeQSun commented 8 months ago

This is a homozygous species, so you do not need to set exp_hom. The last row gives the haploid genome size.

simleopold commented 6 months ago

Hi,

I ran this command on a genome whose size and ploidy level I don't know: findGSE(histo = "/Users/icesim/Downloads/21mer_no_cut-2.histo", sizek=21, outdir="/Users/icesim/Desktop/findGSE-teleau", exp_hom = 100). The expected result was around 8 Mb, so does findGSE give an estimate of the whole genome size or of the haploid genome size?

Thanks for your help, findGSE.pdf

HeQSun commented 6 months ago

Hi,

It gives an estimate of the haploid genome size.

According to the k-mer coverage pattern, you may want to run it under homozygous mode.

Best, Hequan

simleopold commented 6 months ago

Thank you for your quick answer,

I tried to run it under homozygous mode, but I get the following error in R: "Error in singlestart:singleend : NA/NaN argument"

Does it mean I have no choice but to run it under heterozygous mode?

HeQSun commented 6 months ago

Can you show me the cmd?

simleopold commented 6 months ago

I ran this command: findGSE(histo = "/Users/icesim/Downloads/21mer_no_cut-2.histo", sizek=21, outdir="/Users/icesim/Desktop/findGSE-teleau")