jamescasbon / PyVCF

A Variant Call Format reader for Python.
http://pyvcf.readthedocs.org/en/latest/index.html

Options for reporting genotypes. #2

Closed arq5x closed 12 years ago

arq5x commented 12 years ago

Currently, we can do the following:

>>> for sample in record.samples:
...     print sample['GT']
'1|2'
'2|1'
'2/2'

It would be nice to have a built-in method that looks at the ref and alt alleles and converts the encoded genotypes into DNA alleles (GTS == genotypes using sequence).

>>> for sample in record.samples:
...     print sample['GTS']
'A|C'
'C|A'
'C/C'
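A minimal sketch of how such a conversion could work, assuming the record exposes the reference allele and a list of alternate alleles. The name `gt_to_bases` and its parameters are hypothetical and not part of PyVCF; the sketch uses Python 3 syntax.

```python
import re


def gt_to_bases(gt, ref, alts):
    """Translate an encoded genotype such as '1|2' into bases such as 'A|C'.

    Hypothetical helper (not a PyVCF function). Index 0 refers to the
    reference allele; indexes 1..n refer to the alternate alleles in order.
    """
    alleles = [ref] + list(alts)
    # Split on the separator but keep it, so phased ('|') and
    # unphased ('/') genotypes retain their original separator.
    parts = re.split(r"([/|])", gt)
    return "".join(
        part if part in {"/", "|", "."} else alleles[int(part)]
        for part in parts
    )


# Example site with REF = G and ALT = A,C (values chosen for illustration).
for gt in ["1|2", "2|1", "2/2"]:
    print(gt_to_bases(gt, "G", ["A", "C"]))
```

Missing calls such as `./.` are passed through unchanged in this sketch.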

Also, an option that returns the standard numeric encoding for genotypes: hom_ref == 0, het == 1, hom_alt == 2, unknown (./.) == -1. This would make it easy to compute useful popgen statistics such as HWE and pi_hat, and to conduct multi-dimensional scaling comparisons.

>>> for sample in record.samples:
...     print sample['GTN']
1
2
0 
-1
etc.
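The encoding described above could be sketched as follows. The function name `gt_to_numeric` is hypothetical and not part of PyVCF. Two choices here are assumptions: a genotype with two different alternate alleles (for example `1|2`) is treated as het, and a partially missing call (for example `./1`) is treated as unknown.

```python
import re


def gt_to_numeric(gt):
    """Map a GT string to 0 (hom_ref), 1 (het), 2 (hom_alt) or -1 (unknown).

    Hypothetical helper (not a PyVCF function).
    """
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return -1  # any missing allele makes the call unknown
    if len(set(alleles)) > 1:
        return 1  # alleles differ: heterozygous
    return 0 if alleles[0] == "0" else 2  # all alleles identical


for gt in ["0/0", "0/1", "1|2", "2/2", "./."]:
    print(gt, gt_to_numeric(gt))
```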
jamescasbon commented 12 years ago

Agreed, perhaps replace the sample dictionary with a proper object to allow method calls on it? Or do we prefer just a dictionary?

On Thu, Jan 12, 2012 at 3:53 PM, Aaron Quinlan wrote:

Currently, we can do the following: [...]

James Casbon

Population Genetics - http://www.populationgenetics.com/ james.casbon@populationgenetics.com +44 (0)1223 497353

arq5x commented 12 years ago

Hi James,

Yeah, the idea of a samples object makes the most sense to me. The default behavior could just mimic the current functionality, but specific methods could be created to return a dict or list of tuples for the scenarios above.
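One way this idea could look is sketched below: a per-sample object that still supports `sample['GT']` but also offers methods for the derived values. The class name `Sample`, its constructor arguments, and the method names `gt_bases` and `gt_type` are all hypothetical. This is not the design that was implemented in the `issue-2-sample-objects` branch.

```python
import re


class Sample(dict):
    """Hypothetical per-sample object; behaves like the existing dict."""

    def __init__(self, name, data, ref, alts):
        super().__init__(data)
        self.name = name
        # Index 0 is the reference allele, 1..n the alternate alleles.
        self._alleles = [ref] + list(alts)

    def gt_bases(self):
        """Return the genotype as bases, e.g. 'A|C' for '1|2'."""
        parts = re.split(r"([/|])", self["GT"])
        return "".join(
            p if p in {"/", "|", "."} else self._alleles[int(p)]
            for p in parts
        )

    def gt_type(self):
        """Return 0 (hom_ref), 1 (het), 2 (hom_alt) or -1 (unknown)."""
        alleles = re.split(r"[/|]", self["GT"])
        if "." in alleles:
            return -1
        if len(set(alleles)) > 1:
            return 1
        return 0 if alleles[0] == "0" else 2


sample = Sample("NA00001", {"GT": "1|2", "DP": 14}, ref="G", alts=["A", "C"])
print(sample["GT"], sample.gt_bases(), sample.gt_type())
```

Subclassing `dict` keeps existing code that indexes by format key working; a plain class with `__getitem__` would be an alternative if a narrower interface is preferred.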

So are you the "official" maintainer of this library now?

jamescasbon commented 12 years ago

On Mon, Jan 16, 2012 at 1:36 AM, Aaron Quinlan wrote:

So are you the "official" maintainer of this library now?

I'm officially the only person to have responded when the original author posted a license and said he would rather someone forked it, which I did. If someone else would prefer to take it on, that would be nice. Brad had suggested putting it into Biopython.

jamescasbon commented 12 years ago

Oops, wrong issue number in commit. Didn't mean to close, but it appears this cannot be reopened!

jamescasbon commented 12 years ago

I created a branch in which I added a sample object, see issue-2-sample-objects

Perhaps you can add your method there?

arq5x commented 12 years ago

Thanks @jamescasbon, this looks good. I am swamped for the next few days, but I have some existing functions for this in a project I am working on and can make a first pass at this early next week.

jamescasbon commented 12 years ago

On Wed, Feb 8, 2012 at 4:12 PM, Brent Pedersen bpederse@gmail.com wrote:

Not sure if you just forgot to hit reply-all...

Yes, I did! I have the default reply-all on my home gmail but not work email. Second time today I've done this.

Yeah, that sounded more critical than I intended. Maybe you could get some traction on google by answering this question: http://stackoverflow.com/questions/433331/python-library-to-generate-vcf-files

I'm not unhappy with the API, but it couldn't hurt to use pull requests and have a discussion for major new features. I'd do my best to participate.

On Wed, Feb 8, 2012 at 8:03 AM, James Casbon james.casbon@populationgenetics.com wrote:

On Wed, Feb 8, 2012 at 2:56 PM, Brent Pedersen bpederse@gmail.com wrote:

My 0.01 (not related to publication): I think it could use a bit more group review on commits going in. I haven't followed everything, but it seems like it may be heading fast toward becoming a complex library. I just updated and used it yesterday and figured the new stuff out by tabbing in ipython.

Rightly or wrongly, I've tried to be responsive to changes. This means they go in without any review.

The question is: what is the best way to get these reviews? Maybe it's (another bloody) mailing list.

They seem like good features, but maybe the naming of properties could use a bit more insight.

Suggestions accepted. I was considering adding something to the classes to encapsulate derived properties to avoid overcrowding the namespace of the Record/Call.

For publication, I think you'd need some killer filtering scripts to justify..

What publications do this kind of app note, and how much do they charge? I'd currently trade the publication for some Google juice, as we currently come up on page 2.
