Add a median VAF filter function and a filter_fn watermark

hammerlab / cohorts

Utilities for analyzing mutations and neoepitopes in patient cohorts

Apache License 2.0

Add a median VAF filter function and a filter_fn watermark #189

Closed tavinathanson closed 7 years ago

tavinathanson commented 7 years ago

Moving some things from my data.py to Cohorts.

Add a function for median VAF.
Add a filter_fn watermark: fixes https://github.com/hammerlab/cohorts/issues/183.
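A minimal sketch of one plausible reading of "median VAF", assuming VAF (variant allele frequency) is computed per variant as alt reads divided by total reads. The names `vaf`, `median_vaf`, and the `(alt_reads, ref_reads)` tuple layout are hypothetical and are not taken from the Cohorts code base; the optional `filter_fn` argument only illustrates how a filter function could restrict which variants count toward the median.

```python
from statistics import median


def vaf(alt_reads, ref_reads):
    """Variant allele frequency for one variant: alt / (alt + ref)."""
    depth = alt_reads + ref_reads
    # Guard against zero depth rather than raising ZeroDivisionError.
    return alt_reads / depth if depth else 0.0


def median_vaf(read_counts, filter_fn=None):
    """Median VAF over (alt_reads, ref_reads) pairs that pass filter_fn.

    Hypothetical helper: returns None when no variant passes the filter.
    """
    kept = [rc for rc in read_counts if filter_fn is None or filter_fn(rc)]
    if not kept:
        return None
    return median(vaf(alt, ref) for alt, ref in kept)


counts = [(10, 90), (30, 70), (50, 50)]
overall = median_vaf(counts)
# Only keep variants with at least 30 alt reads (illustrative threshold).
well_supported = median_vaf(counts, filter_fn=lambda rc: rc[0] >= 30)
```

Returning `None` for an empty selection is a design choice made for this sketch; a real implementation might prefer NaN or an exception.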

Also in this PR:

Automatically populate Patient object attributes based on additional_data: fixes https://github.com/hammerlab/cohorts/issues/190.
Do the same with Sample.
Add some more functions: fixes https://github.com/hammerlab/cohorts/issues/191.

tavinathanson commented 7 years ago

@jburos I added some more stuff; feel free to review at this point.

jburos commented 7 years ago

Nice, thanks @tavinathanson! LGTM. I like the use of strip_column_name. If the conflict happens a lot, we may want to catch the error and provide a more descriptive message in the init function for either Patient or Sample. But for now it's probably good as-is.
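For illustration, the attribute population and the more descriptive conflict message discussed here could look roughly like the sketch below. This is not the Cohorts implementation: the `Patient` class is a reduced stand-in with only `id` and `os`, and this `strip_column_name` is a hypothetical normalizer written for the example, so its behavior may differ from the real helper of the same name.

```python
import re


def strip_column_name(name):
    """Hypothetical normalizer: 'Tumor Stage' -> 'tumor_stage'."""
    return re.sub(r"\W+", "_", name.strip()).strip("_").lower()


class Patient:
    """Reduced stand-in for illustration; the real class has more fields."""

    def __init__(self, id, os, additional_data=None):
        self.id = id
        self.os = os
        self.additional_data = additional_data or {}
        for column, value in self.additional_data.items():
            attr = strip_column_name(column)
            if hasattr(self, attr):
                # Fail early with a message naming the offending column.
                raise ValueError(
                    "additional_data column %r maps to attribute %r, "
                    "which already exists on Patient" % (column, attr)
                )
            setattr(self, attr, value)


patient = Patient(id="p1", os=100, additional_data={"Tumor Stage": "IV"})
stage = patient.tumor_stage
```

Checking with `hasattr` before `setattr` means a clashing column is reported by name instead of silently overwriting an existing attribute.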

tavinathanson commented 7 years ago

@jburos I had to fix a few failing tests caused by columns such as os existing both in additional_data and as an attribute already on Patient. I'm fixing this by removing that value from the additional_data dictionary, which actually seems correct given the meaning of "additional data". But we'll definitely hit this error in our other cohorts. Simply replacing e.g. id=row["id"] with id=row.pop("id") should fix things for all problematic columns in RCC/bladder/etc.
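The `row.pop(...)` replacement works because `dict.pop` returns the value and removes the key in one step, so whatever remains in the row can be passed along as additional data without clashing with the core fields. A small sketch, assuming each row is a plain dict; the helper name `split_row` and the field names are hypothetical:

```python
def split_row(row, core_fields=("id", "os")):
    """Separate core fields from the remaining 'additional data' columns.

    Works on a copy so the caller's row is left untouched.
    """
    remaining = dict(row)
    core = {field: remaining.pop(field) for field in core_fields}
    return core, remaining


row = {"id": "p1", "os": 100, "smoker": True}
core, additional_data = split_row(row)
```

Here `core` holds the values destined for explicit constructor arguments, and `additional_data` no longer contains `id` or `os`, so no duplicate attribute can arise from it.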

coveralls commented 7 years ago

Coverage decreased (-0.04%) to 57.706% when pulling 44938b791cd5744a596081db0c45fabc17d70212 on minor_updates into 616cb5f2c27d34cd131a98ffbb556b2aa3dde9f7 on master.