Open mike-grayhat opened 2 months ago
I think DVC needs all vars in such cases to be resolved before it can run the pipeline. Your vars essentially define the pipeline: DVC reads and compiles it first. So even if we allowed missing files, I think making it dynamic would be a bigger change. @skshetry could confirm that.
Does the content of items.yaml change on every run?
The content of items.yaml is generated from external sources, so it changes from time to time. The problem we face right now is that, in theory, we can put items.yaml under dvc, but we can't even pull it on a fresh repo because dvc.yaml is not valid yet. Similarly, dvc diff doesn't work. The static nature of the dvc DAG is a limiting factor for us. We have worked around most of the problems except this one, where we have to rely on a separate pipeline to pull such files. I'm trying to think of a better solution and haven't come up with one yet.
I have quite an unusual case where I rely on variables generated by the first stage of the pipeline. The problem is that on the first run the variables file doesn't exist yet, which in turn invalidates the whole yaml file.
I don't see an easy way out of it (even Hydra works only on experiment runs, not on general dvc repro runs), and an option to skip missing variables would help a lot.
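
To make the setup concrete, here is a minimal sketch of the kind of dvc.yaml being described. It is an illustration, not the actual pipeline: the stage names, the script names, and the `items` key inside items.yaml are all hypothetical. The point is that the file listed under `vars` is itself produced by the first stage, so on a fresh checkout the templating cannot be resolved.

```yaml
# Hypothetical dvc.yaml; names and paths are assumptions for illustration.
vars:
  - items.yaml            # generated by the first stage, so missing on a fresh repo

stages:
  generate_items:
    cmd: python generate_items.py   # hypothetical script that writes items.yaml
    outs:
      - items.yaml

  process:
    foreach: ${items}     # assumes items.yaml has a top-level `items` list
    do:
      cmd: python process.py ${item}   # hypothetical per-item script
      outs:
        - out/${item}.txt
```

With a layout like this, `${items}` has to be resolved when dvc.yaml is parsed, which happens before any stage, including `generate_items`, gets a chance to run.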