stschiff / msmc-tools

Tools and Utilities for msmc and msmc2

Can we use multi sample vcf and then split it as per chromosome and individual #18

Closed anubhabkhan closed 5 years ago

anubhabkhan commented 6 years ago

Hi,

I already have a multi-sample VCF. I filtered all the SNPs at a threshold of 30 for genotype quality and base quality. Can I use this as input for MSMC? I am splitting the VCF to generate separate files per chromosome and per individual. Can I use these directly for the generate_multihetsep step?

Thanks Anubhab

stschiff commented 6 years ago

In that case I would suggest writing your own little script that converts the VCF to a multihetsep file. The only issue is the masks. You need to assume something for the regions between the segregating sites. Assuming them all to be called homozygous reference may not be appropriate, depending on your coverage.

Stephan
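For anyone landing here later, a rough sketch of such a conversion script might look like the following. It is not part of msmc-tools, and the function names are made up for illustration. It assumes a single-chromosome, sorted VCF with phased biallelic SNPs, and it makes exactly the simplification warned about above: every position between two segregating sites is counted as called. Sites with a missing or unphased heterozygous genotype are dropped and subtracted from the called-site count.

```python
import re


def genotype_to_alleles(gt, ref, alt):
    """Return a two-letter allele string such as 'AG', or None if unusable."""
    parts = re.split(r"[|/]", gt)
    if len(parts) != 2 or any(p not in ("0", "1") for p in parts):
        return None  # missing, haploid, or multi-allelic genotype
    if parts[0] != parts[1] and "|" not in gt:
        return None  # unphased heterozygote: phase is unknown
    return "".join(ref if p == "0" else alt for p in parts)


def vcf_to_multihetsep(vcf_lines):
    """Yield (chrom, pos, n_called, alleles) rows for one chromosome.

    ASSUMPTION: all positions between segregating sites are called.
    With a SNP-only VCF there is no way to verify this.
    """
    last_pos = 0  # position of the previous emitted row
    dropped = 0   # unusable sites seen since the previous emitted row
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep only biallelic SNPs
        gt_index = fields[8].split(":").index("GT")
        alleles = [
            genotype_to_alleles(sample.split(":")[gt_index], ref, alt)
            for sample in fields[9:]
        ]
        if any(a is None for a in alleles):
            dropped += 1
            continue
        pattern = "".join(alleles)
        if len(set(pattern)) < 2:
            continue  # not segregating among these samples
        yield chrom, pos, pos - last_pos - dropped, pattern
        last_pos, dropped = pos, 0
```

Each yielded row can be written out tab-separated, one line per segregating site. With proper masks, the third column would instead count only the positions that fall inside the callable regions of all samples.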


anubhabkhan commented 6 years ago

My samples mostly have average sequencing depths of 24-30X. Is there a way to create just the mask files so that the process is quick?

With regards and yours sincerely

Anubhab Research Scholar, National Centre for Biological Sciences, Tata Institute of Fundamental Research, India


stschiff commented 6 years ago

Yes, by using my bamCaller script on the BAM files. But you could also generate masks using a different rule set, e.g. by including all sites at which 90% of individuals have a high-quality genotype called, or something like that. Either way, you would definitely need to go back to the BAM level. If you have a multi-sample VCF that only contains segregating sites, you have lost the information on non-segregating sites.

Stephan
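As an illustration of the "90% of individuals" rule mentioned above, here is a minimal sketch. It assumes you already have per-site genotype qualities for every position, variant or not (for example from an all-sites call set made from the BAM files). The input layout, the function name `callable_mask`, and the thresholds are hypothetical. The output is a list of BED-style intervals (0-based start, exclusive end), which is assumed here to be the mask format you want downstream.

```python
def callable_mask(sites, min_gq=30, min_fraction=0.9):
    """Merge passing sites into BED-style intervals.

    sites: iterable of (chrom, pos, gqs), sorted by chromosome and
    1-based position, where gqs holds one genotype quality per
    individual (None if no genotype was called).
    A site passes if at least min_fraction of individuals have a
    genotype quality of min_gq or more.
    """
    current = None  # [chrom, start0, end] of the interval being built
    for chrom, pos, gqs in sites:
        n_good = sum(1 for gq in gqs if gq is not None and gq >= min_gq)
        passes = bool(gqs) and n_good >= min_fraction * len(gqs)
        adjacent = (
            current is not None
            and current[0] == chrom
            and current[2] == pos - 1
        )
        if passes and adjacent:
            current[2] = pos  # extend the open interval by one base
            continue
        if current is not None:
            yield tuple(current)  # close the previous interval
            current = None
        if passes:
            current = [chrom, pos - 1, pos]  # open a new interval
    if current is not None:
        yield tuple(current)
```

Positions that are absent from the input are treated as uncalled, since they break adjacency. This is why a SNP-only VCF cannot be used as input here: almost every position would be missing.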
