pepkit / divvy

Standardized computing resource configuration
http://divvy.databio.org
BSD 2-Clause "Simplified" License
4 stars 2 forks source link

add config schema #54

Closed stolarczyk closed 3 years ago

stolarczyk commented 3 years ago

Add config schema to use validation feature in yacman 0.8.0 (https://github.com/databio/yacman/pull/44)

stolarczyk commented 3 years ago

right now the config schema just requires compute_packages section to be present and asserts the structure of the adapters and compute_packages. Do we want to add more restrictions? For example:

compute_packages:
  default: <-- require?
    submission_template: templates/local_template.sub  <-- require?
    submission_command: sh  <-- require?
nsheff commented 3 years ago

but where is the schema?

Is it not getting added to the manifest?

nsheff commented 3 years ago

Ok that commit solved the missing schema issue.

This is great -- but can we make it optional so that if you lack a schema, the thing will still work? At least for awhile? Because this adds some complexity; now you have to not just have a divvy config, you also have to have a schema. I guess if the schema is internal and provided by divvy then it's not really a problem. But still, I think having it optional, so that it just doesn't validate if it can't find a schema, is good to start.

stolarczyk commented 3 years ago

yeah, apparently I forgot to add it to the manifest...

for the conditional config validation, it seems like it would add complexity, because the "don't validate" branch of the conditional will never get executed, as long as users don't temper with the installed package in the library. Your situation was an exception caused by my mistake.

nsheff commented 3 years ago

Right, I think that makes sense to me! as long as the schema is built in it should be no problem...

nsheff commented 3 years ago

would this make config file format changes more challenging though as it will make things harder to make backwards compatible?

stolarczyk commented 3 years ago

If we include an updated schema in the library it shouldn't be an issue, right?

I view this feature to be completely internal, and we have full control over the validation. There is no way for a user to provide an external schema. If you look at the first couple of lines of the class instantiation it becomes apparent:

    def __init__(self, entries=None, filepath=None):
        if not entries and not filepath:
            # Handle the case of an empty one, when we'll use the default
            filepath = select_divvy_config(None)

        super(ComputingConfiguration, self).__init__(
            entries=entries,
            filepath=filepath,
            schema_source=DEFAULT_CONFIG_SCHEMA,
            write_validate=True,
        )
stolarczyk commented 3 years ago

on a different note, what do you think about this: https://github.com/pepkit/divvy/issues/54#issuecomment-786871458?

Is it safe to say that computing packages are useless without submission_command and/or submission_template?

nsheff commented 3 years ago

on a different note, what do you think about this: #54 (comment)?

Is it safe to say that computing packages are useless without submission_command and/or submission_template?

I can imagine using a submission template without a command... for example, write creates a script, you don't need to use divvy submit necessarily. So, a use case without submission_command seems totally viable.

submission_template might be safe to require. At the moment I can't think of a reason why not. But to be safe maybe on this release we keep the schema a bit lenient?

stolarczyk commented 3 years ago

thanks for clarifying, sounds good