monarch-initiative / pyphetools

Python Phenopacket Tools
https://monarch-initiative.github.io/pyphetools/
MIT License

Genotype extension #19

Closed lnrekerle closed 1 year ago

lnrekerle commented 1 year ago

An additional column can now be added to VariantColumnMapper that will be used to identify each variant as Homo/Hetero/Hemizygous.

pnrobinson commented 1 year ago

@lnrekerle -- looks great! But could we also do the following? In CohortEncoder and CaseEncoder, the constructor gets a pandas DataFrame. At some point we have this:

self._df = df

Can we change this to:

self._df = df.astype(str)

This turns all columns into strings and simplifies the downstream code. It does mean that we will need to convert strings back into ints for columns that provide age as integers, so we will probably need to change that code. The advantage is that we know what datatypes to expect; otherwise, pandas sometimes changes things in unexpected ways and we get strange errors.
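A minimal sketch of the idea, using made-up data rather than anything from pyphetools (the column names `patient_id`, `age`, and `sex` are assumptions for illustration): after `astype(str)` every cell is a string, so an integer age column has to be converted back explicitly where a number is needed.

```python
import pandas as pd

# Hypothetical input table; the column names are illustrative only.
df = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "age": [4, 17, 32],
    "sex": ["M", "F", "M"],
})

# pandas inferred integer dtypes for "patient_id" and "age".
# Casting the whole frame makes every cell a string.
str_df = df.astype(str)

# Downstream code can now rely on string values everywhere...
first_age_as_text = str_df.loc[0, "age"]

# ...but a column that is needed as a number must be converted back explicitly.
ages = str_df["age"].astype(int)

print(repr(first_age_as_text))
print(ages.tolist())
```

The trade-off is the one described above: the types are predictable, at the cost of an explicit conversion wherever a numeric value such as age is required.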

After this I think we can merge both PRs!

lnrekerle commented 1 year ago

I couldn't find "self._df = df" in the case encoder, just the cohort encoder. I did run a test on the same data that was having issues with the sex column and it worked with just the cohort encoder change.

Let me know if I am missing it in the case encoder, otherwise this should be good to go!