Closed tavinathanson closed 7 years ago
@jburos added some more stuff, feel free to review at this point.
Nice - thanks @tavinathanson ! LGTM. I like the use of strip_column_name. If the conflict happens a lot we may want to catch the error & provide a more descriptive message in the init function for either Patient or Sample. But for now it's probably good as-is.
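For reference, a rough sketch of what a more descriptive message could look like. This `Patient` is a simplified stand-in, not the actual cohorts class: the constructor signature, the attribute-setting loop, and the error wording are all assumptions made for illustration.

```python
class Patient:
    """Simplified stand-in for illustration; not the real cohorts Patient."""

    def __init__(self, id, os, additional_data=None):
        self.id = id
        self.os = os
        self.additional_data = additional_data or {}
        # Expose each additional_data entry as an attribute, but refuse to
        # silently overwrite an attribute that Patient already defines.
        for key, value in self.additional_data.items():
            if key in self.__dict__:
                raise ValueError(
                    "additional_data column %r conflicts with an existing "
                    "Patient attribute; remove or rename it" % key
                )
            setattr(self, key, value)
```

With this, passing `additional_data={"os": 10}` fails with a message that names the offending column instead of a generic error.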
@jburos I had to fix a few failing tests due to e.g. `os` existing in `additional_data` as well as in `Patient` already. I'm fixing by removing that value from the `additional_data` dictionary, which actually seems correct based on the meaning of "additional data". But we'll definitely hit this error in our other cohorts. A simple replacement of e.g. `id=row["id"]` with `id=row.pop("id")` should fix things for all problematic columns in RCC/bladder/etc.
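A minimal sketch of the `pop` pattern, assuming each row can be treated as a plain dict. The `split_row` helper and its `core_columns` default are hypothetical names used only for this example; they are not part of cohorts.

```python
def split_row(row, core_columns=("id", "os")):
    """Separate core fields from the remaining "additional data".

    Popping each core column removes it from the leftover dictionary, so the
    same key can never appear both as a core field and in additional_data.
    """
    remaining = dict(row)  # copy so the caller's row is not mutated
    core = {name: remaining.pop(name) for name in core_columns}
    return core, remaining


core, additional_data = split_row({"id": "p1", "os": 100, "smoker": True})
```

Here `core` holds `id` and `os`, and `additional_data` is left with only `smoker`, which is what would be passed along as the additional data.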
Moving some things from my `data.py` to Cohorts:
`.filter_fn` watermark: fixes https://github.com/hammerlab/cohorts/issues/183.

Also in this PR:
`Patient` object attributes based on `additional_data`: fixes https://github.com/hammerlab/cohorts/issues/190.
`Sample`.