Open mattdangerw opened 1 year ago
Why would we ignore the `models/` folder?
We are not ignoring `models/` entirely; we are ignoring parts of `models/` depending on the diff. Essentially, we are only selectively running the models we think are affected by a given PR (with an imperfect heuristic).
Also note that this is not for all testing; it is just for "large" testing (which includes downloading presets).
Currently, for every PR, we download the smallest checkpoint we have for every model and run some tests against it. I have a tough time imagining that scaling to, say, 4x the number of tasks and presets we have today; this could amount to many GBs of data and some complex forward passes.
One option is that, long term, we no longer do any full preset testing per PR. This could make sense, though it would definitely let some bugs slip through.
Another option is this "really dumb autodetect": if you update `models/bert`, we run `pytest models/bert/ --run_large`. But if you update roberta, we do not.
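A minimal sketch of that detection, assuming the CI checkout can diff against the target branch (the `origin/master` ref and a flat top-level `models/` layout are assumptions here, not our actual setup):

```python
# Map a PR's diff to the model directories it touches.
import subprocess


def affected_model_dirs(base_ref="origin/master"):
    """Return the models/<name>/ directories touched relative to base_ref."""
    changed = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    dirs = set()
    for path in changed:
        parts = path.split("/")
        # Only count files nested under models/<name>/.
        if len(parts) > 2 and parts[0] == "models":
            dirs.add(f"models/{parts[1]}/")
    return sorted(dirs)


if __name__ == "__main__":
    # A change under models/bert/ would print: pytest models/bert/ --run_large
    for model_dir in affected_model_dirs():
        print(f"pytest {model_dir} --run_large")
```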
As we grow and add more and more backbones and tasks, our model testing is quickly outgrowing what our infrastructure can handle today. I think this is going to be a general pain point for scaling, and it may be worth investing in some smarter solutions here.
One option would be to only run our "large" tests when we update the model code in question. We could do this for our accelerator testing with something like this (pseudocode). This could be a relatively lightweight way to avoid the fundamental scaling problem we are facing. We would also need some way to manually invoke a "test everything" command for specific PRs we are worried about (for example, a change to `TransformerDecoder`).
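A minimal sketch of what that pseudocode might look like, taking the affected model directories as arguments (e.g. from the detection sketch earlier in this thread); the `RUN_ALL_LARGE_TESTS` variable and the `[test all]` commit-message tag are hypothetical conventions, not existing hooks in our CI:

```python
# Run large tests only for the affected model directories, with a manual
# "test everything" escape hatch for risky PRs (e.g. a TransformerDecoder
# change). RUN_ALL_LARGE_TESTS and the "[test all]" tag are hypothetical.
import os
import subprocess
import sys


def should_run_everything():
    # Either an env var set on the CI job, or a tag in the latest commit
    # message, forces the full large test suite.
    if os.environ.get("RUN_ALL_LARGE_TESTS") == "1":
        return True
    message = subprocess.run(
        ["git", "log", "-1", "--pretty=%B"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "[test all]" in message


def main(affected_dirs):
    targets = ["models/"] if should_run_everything() else affected_dirs
    if not targets:
        print("No model code changed; skipping large tests.")
        return 0
    # Run the large tests only for the selected directories.
    return subprocess.run(["pytest", *targets, "--run_large"]).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

An env var is probably easier to wire into a manual CI trigger than a commit-message tag, but either would give us a way to force the full run when we want it.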