zavolanlab / zarp-cli

A user-friendly command-line interface for the ZARP RNA-seq analysis workflow
https://zavolanlab.github.io/zarp-cli/
Apache License 2.0

feat: set sample config defaults #54

Closed uniqueg closed 1 year ago

uniqueg commented 1 year ago

Fixes #53

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (61f072d) 100.00% compared to head (3b961ab) 100.00%.

:exclamation: Current head 3b961ab differs from pull request most recent head 8f5cd1a. Consider uploading reports for the commit 8f5cd1a to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##               dev       #54   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           21        21
  Lines         1026      1049   +23
=========================================
+ Hits          1026      1049   +23
```

| [Impacted Files](https://app.codecov.io/gh/zavolanlab/zarp-cli/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zavolanlab) | Coverage Δ | |
|---|---|---|
| [zarp/plugins/sample\_processors/defaults.py](https://app.codecov.io/gh/zavolanlab/zarp-cli/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zavolanlab#diff-emFycC9wbHVnaW5zL3NhbXBsZV9wcm9jZXNzb3JzL2RlZmF1bHRzLnB5) | `100.00% <100.00%> (ø)` | |
| [zarp/samples/sample\_record\_processor.py](https://app.codecov.io/gh/zavolanlab/zarp-cli/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zavolanlab#diff-emFycC9zYW1wbGVzL3NhbXBsZV9yZWNvcmRfcHJvY2Vzc29yLnB5) | `100.00% <100.00%> (ø)` | |
| [zarp/zarp.py](https://app.codecov.io/gh/zavolanlab/zarp-cli/pull/54?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=zavolanlab#diff-emFycC96YXJwLnB5) | `100.00% <100.00%> (ø)` | |

uniqueg commented 1 year ago

Considering what you said yesterday about the point at which defaults should be set, and what I actually described in #53, this PR addresses (or should address) the following issue:

Set sample-specific defaults based on CLI arguments and the default config in ~/.zarp/user.yaml and apply them to all samples where that information is not yet available.

The information may already be available because users can also submit ZARP sample tables as sample references. These tables can be full or partial, and individual rows (samples) can be complete or incomplete. Wherever metadata was supplied via a sample table, it should always take precedence over any defaults, including those set via the CLI. Hence, we still want to call .update() with overwrite set to False.
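The precedence described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in this PR: `merge_defaults` and `apply_defaults` are hypothetical helpers, records are plain dictionaries with `None` marking missing values, and letting CLI arguments win over `~/.zarp/user.yaml` is an assumption. The actual processor relies on `.update()` with `overwrite` set to False.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record type: column name -> value, with None meaning "missing".
Record = Dict[str, Optional[Any]]


def merge_defaults(user_config: Record, cli_args: Record) -> Record:
    """Combine defaults; CLI values that were actually set win (assumption)."""
    merged = dict(user_config)
    merged.update({k: v for k, v in cli_args.items() if v is not None})
    return merged


def apply_defaults(samples: List[Record], defaults: Record) -> List[Record]:
    """Fill gaps only; values from a sample table are never overwritten."""
    filled = []
    for sample in samples:
        record = dict(sample)  # copy, so the input records stay untouched
        for key, value in defaults.items():
            if record.get(key) is None:
                record[key] = value
        filled.append(record)
    return filled


if __name__ == "__main__":
    defaults = merge_defaults(
        user_config={"organism": "Mus musculus", "read_length": 75},
        cli_args={"organism": None, "read_length": 100},
    )
    samples = [
        {"name": "a", "organism": "Homo sapiens", "read_length": None},
        {"name": "b"},
    ]
    for row in apply_defaults(samples, defaults):
        print(row)
```

Sample `a` keeps the organism from its table row and only receives the missing read length, while sample `b` receives both defaults.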

But what I actually had in mind when wiring the processor in the last position in zarp.zarp was something else:

To finalize the sample table for consumption in ZARP, where it (apparently, and unfortunately) cannot contain missing values (np.NaN); these have to be set to something like XXXXXXXXXXXXXXX instead (see the example table in ZARP).
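As a rough sketch of what such a finalization step might look like, assuming records are plain dictionaries and a single placeholder string is acceptable for every column (the real ZARP table may expect different placeholders per column), missing values could be replaced as follows. `is_missing` and `finalize` are hypothetical names, not part of zarp-cli.

```python
import math
from typing import Any, Dict, List

# Placeholder taken from the comment above; whether it suits every column
# is an assumption.
PLACEHOLDER = "XXXXXXXXXXXXXXX"


def is_missing(value: Any) -> bool:
    """Treat None and float NaN (which includes np.nan) as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def finalize(
    records: List[Dict[str, Any]], placeholder: str = PLACEHOLDER
) -> List[Dict[str, Any]]:
    """Return copies of the records with missing values replaced."""
    return [
        {key: (placeholder if is_missing(value) else value)
         for key, value in record.items()}
        for record in records
    ]


if __name__ == "__main__":
    table = [{"sample": "a", "adapter": float("nan"), "read_length": 100}]
    print(finalize(table))
```

Running this last, after all other processors, means the placeholder is only written where no table value, CLI argument, or user default was available.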

I'll create another issue for that. For this PR, though, it means that the wiring in zarp.zarp should be changed so that the defaults block is moved up, right after the initialization of the sample record processor.