pepkit / looper

A job submitter for Portable Encapsulated Projects
http://looper.databio.org
BSD 2-Clause "Simplified" License
20 stars 7 forks source link

sample_modifiers and project_modifiers in looper config file #270

Open nsheff opened 4 years ago

nsheff commented 4 years ago

I just got a bit tripped up with the command extra stuff, and realized:

in looper.cli you have to use command-extra,

but in sample_modifiers.append you use command_extra. This all makes sense, of course, but it is a bit confusing...

The other thing is, if you use looper.cli.command-extra, and then you use -x, they don't stack. That makes sense too, but it's also a bit confusing.

Not really sure what to do about this, maybe just clarify in the docs, but there are some confusing things here.

stolarczyk commented 4 years ago

in looper.cli you have to use command-extra, but in sample_modifiers.append you use command_extra.

as a general rule, yes. And this is how we advertise this in the docs. Options set in looper.cli.<subcmd> must match the argparser long option strings. But in fact the dests work, too. That's because we strictly depend on the default dests, generated from the long option strings with - to _ replacement.

The other thing is, if you use looper.cli.command-extra, and then you use -x, they don't stack. That makes sense too, but it's also a bit confusing.

Why is that confusing? looper.cli.<subcmd> sets command line options that can be overwritten by the actual CLI specification. For example, when confing defines looper.cli.run.command-extra: "testcfg" and I call: looper run -x testcli I end up with "testcli" appended to my command. If I want the config one to persist, I use sample_modifiers.append.command_extra: testcfg.

But I thnik you're right maybe we should elaborate more on the general CLI - looper.cli relationship since that may be confusing.

nsheff commented 4 years ago

Why is that confusing?

it's confusing because you just have to be really aware of exactly what the cli thing is doing. I think the default user would expect the command-extra to append to whatever's in the PEP, and command-extra-override to override it. but that only applies to things in the sample modifiers, not in the cli section. It's complicated. like I said, it makes sense when you understand the implementation...but most people will not.

donaldcampbelljr commented 3 months ago

Some notes for how this works currently in 1.8.0

Using command-extra in the CLI and the project config do stack.

Project_config:

pep_version: 2.0.0
sample_table: sample_annotation.csv
sample_modifiers:
  derive:
    attributes: [file]
    sources:
      source1: "data/{sample_name}.txt"
  append:
    command_extra: "--flavor-flag"

CLI command:

run --looper-config .looper.yaml --command-extra "something"

Rendered command: '/home/drc/GITHUB/hello_looper/hello_looper/intermediate/pipeline/count_lines.sh data/frog_1.txt --flavor-flag something'

Using --command-extra-override="-R" overrides either (or both!) of the above methods as expected.

nsheff commented 3 months ago

Maybe: add 'sample_modifiers' and 'project_modifiers' to looper config, and have looper merge these with the PEP?

donaldcampbelljr commented 3 months ago

And remove appending command_extra within the PEP from the documentation to steer users to use the looper config.

donaldcampbelljr commented 2 months ago

Ok, I've got a first pass of this working in the above PR where I lazily replace sample modifiers in the pep with any supplied in the looper config. Will work on merging them instead of just a simple replacement.

One quirk: I had to add sample_modifiers and project_modifiers as attributes/Fields of the pydantic argument objects so they could be easily passed on during project creation (even though the intention to just pass them via the looper config file and NOT the CLI). It is because of this code immediately after reading the looper_config file: https://github.com/pepkit/looper/blob/1c17c88fdeaa1c3e670f2bb8ab7193e18859f7d9/looper/cli_pydantic.py#L153-L156

The attributes must exist before assignment and we use these attributes as **kwargs during Project creation. I could, of course, carve out special dicts for the sample_modifiers/project_modifiers and add that as new arguments during Project creation but this approach seemed a bit simpler. However, it may lead to end user confusion when using the CLI.

donaldcampbelljr commented 2 months ago

Use deep_update from yacman1.py to merge the modifiers. (also maybe move that func to ubiquerg)

donaldcampbelljr commented 2 months ago

We also discussed leaving the sample_modifiers as arguments because it simplifies downstream activities by having the attributes be attached to the subcommand object. However, we should try to hide the "non-CLI" arguments from the end user to prevent confusion, or attempt to update the subcommand with the non-CLI arguments after object creation.

donaldcampbelljr commented 2 months ago

Ok, I have this working now where sample_modifiers will be merged when provided by the user in both the PEP and in the looper config file. I used deep_update which I ported to ubiquerg and released a pre-release version (0.8.1.a1).

Also, if the user supplies other project level cli modifiers, they will also be merged with any provided in the pep project.

Instead of using sample_modifiers and project_modifiers, I changed the attributes to be sample_modifiers and cli_modifiers for sample and cli modifications respectively. For project modifications, these are actually the top-level keys provided in the looper config file and currently they will take priority over any provided in the pep project config.

CLI arguments will still take the highest priority.

An example looper config file to illustrate:

pep_config: project/project_config.yaml # local path to pep config
output_dir: "results"
pipeline_interfaces:
  sample: pipeline/pipeline_interface.yaml
sample_modifiers:
  derive:
    attributes:
     - [file]
    sources:
      source1: "data/{sample_name}.txt"
cli_modifiers:
  runp:
    command_extra: "--flavorproject"

I still need to hide the new arguments from the CLI to prevent end-user confusion.

donaldcampbelljr commented 2 months ago

This work is now merged to dev. The sample_modifiers and cli_modifiers no longer show in the CLI as arguments.

donaldcampbelljr commented 2 months ago

add project_modifiers as well: https://pep.databio.org/spec/specification/#project-attribute-project_modifiers

should we put these under one key, pep modifiers and sample modifiers is one of many sections under that.

rename cli_modifiers to cli

maybe: create PEP and then use import from looper config to import as nested PEP, but the import might be expected to be a path to a yaml file...might need to make a new hook to do this sinc we normally only import as files.

donaldcampbelljr commented 2 months ago

I attempted to modify the peppyProject before creating the Looper Project (which inherits from the PeppyProject) so that the user-supplied pep modifiers in the looper config could be used when creating the initial peppy project: https://github.com/pepkit/looper/commit/9d253e42de48e56496969952e1275a18aa633ac4

Unfortunately, this does not appear to work when initializing the Looper project. Perhaps my syntax is incorrect?

donaldcampbelljr commented 2 months ago

I fixed the syntax. However, I'm realizing that Peppy does do not provide a way to reinitialize the Project using project_modifiers without using a config file. So, my workaround to create a temp PEP, convert to a dict, update the dict with the Looper-modified values, and then initialize Peppy project from dict still does not work as desired because the re-initialization using _from_dict() does not handle all import/ammendments unlike the main initialization via a cfg file.

So, in summary, we can add use sample_modifiers and cli changes via the looper config file just fine. However, I believe using project_modifiers within the looper config file may require us to refactor or make enhancements to parts of Peppy.

donaldcampbelljr commented 2 months ago

Holding off on moving sample_modifiers and project_modifiers to Looper config file until Peppy refactoring takes place. I reverted many of the changes placed in dev but have kept the functionality to change the cli commands via the cli key.