arq5x closed 12 years ago
Agreed, perhaps replace the sample dictionary with a proper object to allow method calls on it? Or do we prefer just a dictionary?
On Thu, Jan 12, 2012 at 3:53 PM, Aaron Quinlan <reply@reply.github.com> wrote:
Currently, we can do the following:
>>> for sample in record.samples:
...     print sample['GT']
'1|2'
'2|1'
'2/2'
It would be nice to have a built-in method that looks at the ref and alt alleles and converts the encoded genotypes into DNA alleles (GTS == genotypes using Sequence).
>>> for sample in record.samples:
...     print sample['GTS']
'A|C'
'C|A'
'C/C'
Also, an option that returns the standard numeric encoding for genotypes: hom_ref == 0, het == 1, hom_alt == 2, unknown (./.) == -1
>>> for sample in record.samples:
...     print sample['GTN']
1
2
0
-1
etc.
James Casbon
Population Genetics - http://www.populationgenetics.com/ james.casbon@populationgenetics.com +44 (0)1223 497353
Hi James,
Yeah, the idea of a samples object makes the most sense to me. The default behavior could just mimic the current functionality, but specific methods could be created to return a dict or list of tuples for the scenarios above.
So are you the "official" maintainer of this library now?
I'm officially the only person to have responded to the original author's posting of a license, in which he said he would rather someone forked it, which I did. If someone else would prefer to take it on, that would be nice. Brad had suggested putting it into biopython.
Oops, wrong issue number in commit. Didn't mean to close, but it appears this cannot be reopened!
I created a branch in which I added a sample object, see issue-2-sample-objects
Perhaps you can add your method there?
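To make the proposal concrete, here is a minimal sketch of what a sample object with the two suggested conversions might look like. It is not the code in the issue-2-sample-objects branch: the class name `Sample`, the constructor arguments, and the method names `gts` and `gtn` are all assumptions made for illustration. It subclasses dict so that `sample['GT']` keeps working as before, and it assumes the GT field uses `|` or `/` as the allele separator.

```python
import re


class Sample(dict):
    """Hypothetical sample object: a dict of FORMAT fields plus helpers."""

    def __init__(self, fields, ref, alts):
        super(Sample, self).__init__(fields)
        # alleles[0] is REF and alleles[1:] are the ALT alleles,
        # which is how the GT field indexes them.
        self.alleles = [ref] + list(alts)

    def _gt_indices(self):
        # Split '1|2' or '0/1' into allele indices; '.' marks a missing call.
        return re.split(r'[|/]', self['GT'])

    def gts(self):
        """Genotype as sequence, e.g. '1|2' -> 'A|C'; missing stays '.'."""
        sep = '|' if '|' in self['GT'] else '/'
        return sep.join(
            '.' if i == '.' else self.alleles[int(i)]
            for i in self._gt_indices())

    def gtn(self):
        """Numeric code: 0 hom_ref, 1 het, 2 hom_alt, -1 unknown."""
        idx = self._gt_indices()
        if '.' in idx:
            return -1
        if len(set(idx)) > 1:
            return 1
        return 0 if idx[0] == '0' else 2
```

With REF `G` and ALT alleles `A` and `C`, `Sample({'GT': '1|2'}, 'G', ['A', 'C']).gts()` gives `'A|C'` and `.gtn()` gives `1`, matching the examples in the original request. Note that this sketch codes a call with two different ALT alleles as het, and any partially missing call as unknown.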
Thanks @jamescasbon , this looks good. I am swamped for the next few days, but I have some existing functions for this in a project I am working on and can make a first pass at this early next week.
On Wed, Feb 8, 2012 at 4:12 PM, Brent Pedersen bpederse@gmail.com wrote:
Not sure if you just forgot to hit reply-all...
Yes, I did! I have the default reply-all on my home gmail but not work email. Second time today I've done this.
Yeah, that sounded more critical than I intended. Maybe you could get some traction on google by answering this question: http://stackoverflow.com/questions/433331/python-library-to-generate-vcf-files
I'm not unhappy with the API, but it couldn't hurt to use pull requests and have some discussion for major new features. I'd do my best to participate.
On Wed, Feb 8, 2012 at 8:03 AM, James Casbon james.casbon@populationgenetics.com wrote:
On Wed, Feb 8, 2012 at 2:56 PM, Brent Pedersen bpederse@gmail.com wrote:
My 0.01 (not related to publication) I think it could use a bit more group review on commits going in. I haven't followed everything, but it seems like it may be fast heading for a complex library. I just updated and used it yesterday and figured the new stuff out by tabbing in ipython.
Rightly or wrongly, I've tried to be responsive to changes. This means they go in without any review.
The question is: what is the best way to get these reviews? Maybe it's (another bloody) mailing list.
They seem like good features, but maybe the naming of properties could use a bit more insight.
Suggestions accepted. I was considering adding something to the classes to encapsulate derived properties to avoid overcrowding the namespace of the Record/Call.
For publication, I think you'd need some killer filtering scripts to justify it.
What publications do this kind of app note and how much do they charge? I'd currently trade the publication for some google juice as we currently come up on page 2.
Currently, we can do the following:
It would be nice to have a built-in method that looks at the ref and alt alleles and converts the encoded genotypes into DNA alleles (GTS == genotypes using Sequence).
Also, an option that returns the standard numeric encoding for genotypes: hom_ref == 0, het == 1, hom_alt == 2, unknown (./.) == -1. This would allow one to easily compute useful popgen statistics such as HWE and pi_hat, and to conduct multi-dimensional scaling comparisons.
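As a rough illustration of why the numeric encoding is convenient, the sketch below computes a chi-square statistic for Hardy-Weinberg equilibrium from a list of such codes. The function name `hwe_chi_square` is hypothetical and not part of PyVCF, and the calculation assumes a biallelic site with diploid calls, with unknown genotypes (-1) simply dropped.

```python
from collections import Counter


def hwe_chi_square(gtn_codes):
    """Chi-square statistic for HWE at one biallelic site.

    gtn_codes: iterable of 0 (hom_ref), 1 (het), 2 (hom_alt), -1 (unknown).
    Returns None when there are no called genotypes.
    """
    counts = Counter(c for c in gtn_codes if c != -1)
    n = sum(counts.values())
    if n == 0:
        return None
    # Each code equals the number of ALT alleles carried by that sample.
    q = (counts[1] + 2.0 * counts[2]) / (2.0 * n)
    p = 1.0 - q
    expected = {0: p * p * n, 1: 2 * p * q * n, 2: q * q * n}
    # Skip classes with zero expectation (monomorphic sites).
    return sum((counts[g] - e) ** 2 / e
               for g, e in expected.items() if e > 0)
```

For example, 25 hom_ref, 50 het and 25 hom_alt calls match the HWE expectation exactly and give a statistic of 0, while 50 hom_ref and 50 hom_alt calls with no hets give 100. Turning the statistic into a p-value (one degree of freedom) is left out to keep the sketch dependency-free.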