pepkit / peppy

Project metadata manager for PEPs in Python
https://pep.databio.org/peppy
BSD 2-Clause "Simplified" License

Activating amendment overwrites the full original sample configuration #431

Closed: Redmar-van-den-Berg closed this issue 1 year ago

Redmar-van-den-Berg commented 1 year ago

I'm trying to use amendments as described in the documentation, but it looks like the data from the amendment is not merged into the original data at all; it outright replaces it.

>>> import peppy
>>> P = peppy.Project('project_config.yml');print(P.samples[0])
Sample '1234' in Project (project_config.yml)

sample_name: 1234
R1:          /path/to/data/folder/{forward}

>>> P = peppy.Project('project_config.yml',amendments='disease');print(P.samples[0])
Sample '1234' in Project (project_config.yml)

sample_name: 1234
disease:     flu

Is this the intended behaviour? According to the documentation, amendments are meant to allow slightly different project configurations without having to duplicate all the data. But if the original sample table is simply discarded when the amendment is loaded, I still have to duplicate all relevant data across all project configurations.

project.zip

stolarczyk commented 1 year ago

I think you're seeing the expected behavior. In the attached project, the disease amendment specifies a new sample table that replaces the original one.

From the docs page you linked:

> Practically what happens under the scenes is that the primary project is first loaded, and then, if an amendment is activated, it overrides any attributes with those specified in the amendment.

sample_table is the project attribute that is overridden in your case, so it is meant to be replaced when the amendment is activated.
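To make the override rule concrete, here is a toy model in plain Python. It is not peppy's code: dicts stand in for the project config, the file names are hypothetical (the contents of project.zip are not shown in this thread), and it only models replacement of top-level attributes, which is all that matters for a scalar such as sample_table.

```python
# Toy model only: plain dicts stand in for a PEP config; this is NOT peppy's code.
# The file names below are hypothetical placeholders.

def activate_amendment(config, name):
    """Return a copy of `config` in which the named amendment's
    attributes override the attributes of the primary project."""
    amendment = config["project_modifiers"]["amend"][name]
    amended = dict(config)      # start from the primary project
    amended.update(amendment)   # amendment attributes win
    return amended


base = {
    "pep_version": "2.0.0",
    "sample_table": "samples.csv",
    "project_modifiers": {
        "amend": {
            # The amendment names its own sample table, so the original
            # table is swapped out rather than extended.
            "disease": {"sample_table": "samples_disease.csv"},
        }
    },
}

amended = activate_amendment(base, "disease")
print(base["sample_table"])     # samples.csv
print(amended["sample_table"])  # samples_disease.csv
```

Because sample_table is a single path, overriding it means a different file is read; nothing from the original table carries over.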

Redmar-van-den-Berg commented 1 year ago

That is too bad. I was planning to use amendments to optionally add columns to my samples, in an effort to keep the main sample table clean.

Is there another way to optionally update/overwrite sample information based on a .csv file in PEP?

nsheff commented 1 year ago

> Is there another way to optionally update/overwrite sample information based on a .csv file in PEP?

Yes! This would be the role of the sample_modifiers. You could definitely have an amendment that specifies a sample modifier, such as an append modifier to add a new attribute. If you want to alter the value of an existing attribute, you could add a remove modifier, and then an imply or append modifier with the same name, to re-create it.

nsheff commented 1 year ago

@Redmar-van-den-Berg were you able to get this working with sample_modifiers?

Redmar-van-den-Berg commented 1 year ago

@nsheff after some re-thinking I was able to get most of the functionality I was looking for using amendments with sample_modifiers (hence the pull request on eido to add support for amendments). Thanks!