Open fengelniederhammer opened 1 month ago
The whole config is a mess and needs to be redone
I also just noticed that many (or all?) Snakemake steps already log their config, which is quite useful. The whole config is not logged, but maybe it's already enough as it is.
TLDR
I accidentally misconfigured the ingest due to inconsistent casing of config values, and it took way too long to find out what was wrong.
What happened
I was setting up a Loculus instance with H5N1, i.e. a segmented organism. Some parts of the config repeat themselves, so I copied them. Turns out: it didn't work. My config looks something like this:
"Nucleotide sequences" have to be configured in 4 places, but ingest needs it in snake case whereas the others require camel case.
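A check along these lines would have pointed at the problem right away. This is only a minimal sketch, assuming each component's config is available as a plain dict and that the component knows which keys it expects. The function names, the segment names and the expected key set are hypothetical, not existing Loculus code.

```python
import re


def to_snake_case(key: str) -> str:
    """Convert a camelCase key to snake_case; snake_case keys pass through."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key).lower()


def find_miscased_keys(config: dict, expected_keys: set[str]) -> dict[str, str]:
    """Map each unexpected key to the expected key it probably meant.

    A key is reported when it is not in expected_keys but matches one of them
    once casing is normalized, e.g. nucleotideSequences vs nucleotide_sequences.
    """
    by_normalized = {to_snake_case(k): k for k in expected_keys}
    suggestions = {}
    for key in config:
        if key in expected_keys:
            continue
        match = by_normalized.get(to_snake_case(key))
        if match is not None:
            suggestions[key] = match
    return suggestions


# Hypothetical ingest config that was copied from a camel case section.
ingest_config = {
    "nucleotideSequences": ["seg1", "seg2"],
    "nextclade_dataset_name": "example",
}
expected = {"nucleotide_sequences", "nextclade_dataset_name"}

print(find_miscased_keys(ingest_config, expected))
# prints {'nucleotideSequences': 'nucleotide_sequences'}
```

Failing at startup when the returned dict is non-empty would turn a silent misconfiguration into an immediate error message.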
The current config has several issues / possible improvements:
- The casing is inconsistent: some keys are camel case (`organismName`), some are snake case (`nextclade_dataset_name`).
- The same setting has different names depending on the component (`nucleotide_sequences` vs `nucleotideSequences`).
- [...] (`nucleotide_sequences: ["main"]`). I had to read the code to see that the resulting config is written to a file, and debug the running pod to check the file, which was [...]
- `nucleotideSequences` could be derived from the reference genomes (which must be provided anyway). It should always be in sync anyway. I can't think of a use case where the preprocessing pipeline should have a different reference genome than SILO or the ingest.
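Taking the nucleotide sequence names from the reference genomes, instead of configuring them separately per component, could look roughly like this. It is a sketch under assumptions: the shape of the reference genomes config (a `nucleotideSequences` list of entries with `name` and `sequence`) and the function name are guesses made for illustration, and the segment names and sequences are placeholders.

```python
def derive_nucleotide_sequence_names(reference_genomes: dict) -> list[str]:
    """Return the nucleotide sequence names defined in the reference genomes.

    Assumes a structure like
    {"nucleotideSequences": [{"name": "...", "sequence": "..."}, ...]}.
    """
    entries = reference_genomes.get("nucleotideSequences", [])
    names = [entry["name"] for entry in entries]
    if not names:
        # Fail loudly instead of silently falling back to a default.
        raise ValueError("reference genomes define no nucleotide sequences")
    if len(set(names)) != len(names):
        raise ValueError(f"duplicate nucleotide sequence names: {names}")
    return names


# Hypothetical reference genomes for a segmented organism.
reference_genomes = {
    "nucleotideSequences": [
        {"name": "seg1", "sequence": "ACGT"},
        {"name": "seg2", "sequence": "TTGA"},
    ]
}

print(derive_nucleotide_sequence_names(reference_genomes))
# prints ['seg1', 'seg2']
```

With a single source like this, ingest, preprocessing and SILO could not drift apart, and the casing of a separate key would no longer matter.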