ybdong919 / npGeno

bioinformatics pipeline that could generate genome-wide SNP genotype data for a conventional genetic diversity analysis of a non-model plant species

what k-mer length to choose? #1

Open angelaparodymerino opened 5 years ago

angelaparodymerino commented 5 years ago

Hi,

I am a molecular biologist. Briefly, I am working on a non-model species, with no genomic data. I have some Genotyping-By-Sequencing data from different populations and I want to use npGeno to obtain variants (SNPs) between those populations.

Anyway, I am wondering why the "Getting started with npGeno.pdf" guide asks you to set the k-mer length to 100 (for Minia). According to what I have read (e.g. https://github.com/rrwick/Bandage/wiki/Effect-of-kmer-size), the "optimal" k-mer length depends on the read length, the depth of coverage, and the genome size. I don't have exact genome size information (it is estimated to be 1.4 Gb), but I also found on the Biostars forums that a good k-mer size would be about 2/3 of the read length. My reads are mostly 90-95 bp, so I think I should choose a k-mer of ~61. Am I right?

Thanks in advance,

Angela Parody Merino

ybdong919 commented 5 years ago

Yes, that's right. 100 is only a default setting for the general case. You can change it to suit your own data.

Yibo
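For reference, the 2/3 rule of thumb from the question can be worked through in a few lines. This is only a sketch: the function name `suggest_kmer` is made up for illustration, and stepping down to an odd value is an assumption based on the common de Bruijn graph convention of using odd k (to avoid palindromic k-mers), not something npGeno or this thread specifies.

```python
def suggest_kmer(read_length: int, fraction: float = 2 / 3) -> int:
    """Return an odd k-mer length close to fraction * read_length.

    Hypothetical helper: the 2/3 fraction is the rule of thumb quoted in
    the question; forcing k to be odd is an assumed convention.
    """
    k = round(read_length * fraction)
    if k % 2 == 0:
        k -= 1  # step down to the nearest odd value
    return k


if __name__ == "__main__":
    # Read lengths reported in the question (mostly 90-95 bp).
    for read_length in (90, 95):
        print(read_length, suggest_kmer(read_length))
```

For 90-95 bp reads this gives values around 59 to 63, which brackets the ~61 proposed above.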

angelaparodymerino commented 5 years ago

Hi Yibo, thanks for your quick answer. Maybe it is a stupid question, but how could I change it?

Thanks in advance,

Angela Parody Merino

ybdong919 commented 5 years ago

You have to read the Perl script and revise its arguments. If you cannot, you will have to use the default settings. Yibo
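One way to locate the setting before editing is to scan the script for lines that mention a k-mer option. The sketch below assumes the pipeline passes the value to Minia through its `-kmer-size` option; how npGeno actually wires this up is not shown in this thread, and the helper name and the sample Perl lines are hypothetical.

```python
import re


def find_kmer_lines(script_text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for lines mentioning 'kmer' or 'k-mer'."""
    pattern = re.compile(r"k-?mer", re.IGNORECASE)
    return [
        (number, line.strip())
        for number, line in enumerate(script_text.splitlines(), start=1)
        if pattern.search(line)
    ]


if __name__ == "__main__":
    # Made-up Perl snippet for illustration; it is not taken from npGeno.
    sample = "\n".join([
        "#!/usr/bin/perl",
        "use strict;",
        'system("minia -in reads.fa -kmer-size 100 -out contigs");',
    ])
    for number, line in find_kmer_lines(sample):
        print(f"{number}: {line}")
```

In practice you would read the real script with `open(path).read()` and change the number on the reported line, keeping a backup copy of the original file.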

angelaparodymerino commented 5 years ago

Ok, I see. One last question, as I am not completely sure: what do you mean by "set" (in the .csv file)? Is each sample one set?

Thanks in advance,

Angela

ybdong919 commented 5 years ago

Yes