pepkit / eido

Validator for PEP objects
http://eido.databio.org
BSD 2-Clause "Simplified" License
4 stars 6 forks source link

Allow specifying `index_column_name` without `config.yaml` #42

Closed rafalstepien closed 1 year ago

rafalstepien commented 1 year ago

Latest updates in pepkit allowed users to use PEP without config.yaml - just having sample_table.csv; however when sample_table.csv does not have sample_name column (eg. it has sample instead) then the only way to specify this and allow correct validation from command line is to have config.yaml defining new sample table index column name.

What we need is to make sure eido can validate only sample_table.csv without config.yaml even when sample_table has different index column name. The idea is to allow user passing this information to schema.yaml and make eido read that.

Possible options:

  1. Tell eido "always use the first column defined in the schema.yaml as index column"
  2. Change the selection priority order described here from:
  3. Value specified in Project constructor
  4. Value specified in Config
  5. Default value (sample_name)
    to
  6. Value specified in Project constructor
  7. Value specified in Config
  8. Value specified in schema.yaml
  9. Default value (sample_name)
    
    This way schema will not be required to validate if there is a config or if the sample column name is default.

Tasks to do here:

rafalstepien commented 1 year ago

@nsheff can I have your comment here? Which option looks better for you?

Also @nleroy917 @Khoroshevskyi I would love to hear what do you think.

rafalstepien commented 1 year ago

Actually I didn't know that eido provides positional --st-index argument which actually solves the problem (facepalm).

nsheff commented 1 year ago

then the only way to specify this and allow correct validation from command line is to have config.yaml defining new sample table index column name.

Not exactly. Another way is to just pass the name of the index column as a command-line argument. Another way is to assume that the first column is the index column.

nsheff commented 1 year ago

Actually I didn't know that eido provides positional --st-index argument which actually solves the problem (facepalm).

Ok, great.

Another option is to change to:

1. Value specified in Project constructor
2. Value specified in Config
3. Default value (sample_name)
4. If no match, use the first column

The schema specification doesn't make sense, because it would make the peppy Project constructor rely on the schema. You can, of course, get the value from the schema yourself, and then just pass it to the project constructor. But don't make peppy require a schema for creating a Project.