poseidon-framework / poseidon-hs

A toolset to work with modular genotype databases in the Poseidon format
https://poseidon-framework.github.io/#/trident
MIT License

Validate only components of a package with `validate` #248

Closed nevrome closed 1 year ago

nevrome commented 1 year ago

This would be useful in day-to-day work and for automatic pipelines where only individual files matter, e.g. the .ssf file in poseidon-eager. I imagine a simple API like this:

trident validate (
    (-d|--baseDir DIR) |
    --pyml PATH |
    (-p|--genoOne PATH) |
    --inFormat ARG --genoFile PATH --snpFile PATH --indFile PATH |
    --janno PATH |
    --ssf PATH |
    --bib PATH
)

This makes it possible to parse/validate a package collection (as before), an individual set of genotype data files, or an individual POSEIDON.yml, .janno, .ssf or .bib file. We could also make it a bit more flexible, e.g. by allowing multiple files of one type or even multiple files of different types. I don't know how flexible this has to be. I assume most of the time it will be applied to one file only.
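To make the "exactly one of these" semantics concrete, here is a minimal sketch in Python's argparse. It is only an illustration of the proposed interface, not trident's implementation (trident is written in Haskell). The program name `validate-sketch` and the helper `selected_target` are hypothetical, and the compound `--inFormat/--genoFile/--snpFile/--indFile` alternative is left out because argparse cannot nest groups.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the proposed `trident validate` options.
    parser = argparse.ArgumentParser(prog="validate-sketch")
    # Exactly one of the alternatives must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-d", "--baseDir", metavar="DIR")
    group.add_argument("--pyml", metavar="PATH")
    group.add_argument("-p", "--genoOne", metavar="PATH")
    group.add_argument("--janno", metavar="PATH")
    group.add_argument("--ssf", metavar="PATH")
    group.add_argument("--bib", metavar="PATH")
    return parser


def selected_target(argv: List[str]) -> Tuple[str, str]:
    """Return (option name, path) for the single option that was set."""
    args = vars(build_parser().parse_args(argv))
    # The group guarantees that exactly one value is not None.
    name, path = next((k, v) for k, v in args.items() if v is not None)
    return name, path


if __name__ == "__main__":
    print(selected_target(["--ssf", "my.ssf"]))
```

Passing two alternatives at once, e.g. `--ssf a.ssf --bib b.bib`, makes argparse exit with a usage error, which corresponds to the single-file variant described above. Allowing multiple files would mean relaxing this group.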

stschiff commented 1 year ago

I like the idea, in principle. But how would this break existing workflows? Right now, when a user calls trident validate they can trust that it tells them whatever is wrong. In the future this would then i) no longer work, and ii) require them to enter quite a long command line to replicate the old behavior.

Perhaps we should add an option trident validate --full or something, which we could even recommend if users try to run trident validate in a vanilla way?

nevrome commented 1 year ago

No, no - there is a misunderstanding here: trident validate -d PATH should behave as before, i.e. crawl all directories under PATH, find all packages and validate them completely. But, for example, trident validate --pyml PATH should only validate a single POSEIDON.yml file.
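The difference between the two modes can be sketched as follows. This is an illustrative Python sketch, not trident's code: `find_package_files` and `plan_validation` are hypothetical names, and it assumes that a package is recognised by a file named exactly POSEIDON.yml somewhere under the base directory.

```python
import os
from typing import List, Optional, Tuple


def find_package_files(base_dir: str) -> List[str]:
    """Crawl base_dir recursively and collect every POSEIDON.yml path."""
    found = []
    for root, _dirs, files in os.walk(base_dir):
        # Assumption: the file name alone identifies a package definition.
        if "POSEIDON.yml" in files:
            found.append(os.path.join(root, "POSEIDON.yml"))
    return sorted(found)


def plan_validation(base_dir: Optional[str] = None,
                    pyml: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return the list of (kind, path) items that would be validated."""
    if base_dir is not None:
        # -d PATH: as before, every package below PATH is validated in full.
        return [("package", p) for p in find_package_files(base_dir)]
    if pyml is not None:
        # --pyml PATH: only this one file is checked, nothing is crawled.
        return [("pyml", pyml)]
    return []
```

With `base_dir` set, the plan grows with the number of packages found. With `pyml` set, it always contains a single entry, regardless of what else sits next to that file.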

stschiff commented 1 year ago

Sounds good then!