keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras

Selectively run "large" tests only when model code changes #943

Open mattdangerw opened 1 year ago

mattdangerw commented 1 year ago

As we grow and add more and more backbones and tasks, our model testing is quickly growing beyond what our infrastructure can handle today. I think this is going to be a general pain point for scaling, and it may be worth investing in some smarter solutions here.

One option would be to only run our "large" tests when the model code in question is updated. We could do this for our accelerator testing with something like this (pseudocode):

pytest keras_nlp/ --ignore=keras_nlp/models --run_large
# Run large tests for a model only if its directory differs from master.
for dir in keras_nlp/models/*/; do
  if ! git diff --quiet HEAD master -- "$dir"; then
    pytest "$dir" --run_large
  fi
done

This could be a relatively lightweight way to mitigate the fundamental scaling problem we are facing. We would also need some way to manually invoke a "test everything" mode for specific PRs we are worried about (for example, a change to TransformerDecoder).
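To make the idea concrete, here is a rough Python sketch of the selective runner above combined with a manual "test everything" escape hatch. Everything here is illustrative: the RUN_ALL_LARGE environment variable and the script itself are hypothetical, and any CI trigger (a PR label, a commit message tag) could serve as the override instead.

# run_selective_large_tests.py -- hypothetical sketch, not an existing script.
import os
import subprocess
from pathlib import Path

MODELS_DIR = Path("keras_nlp/models")


def changed_files():
    """List files changed on this branch relative to master."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "master...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [Path(line) for line in result.stdout.splitlines() if line]


def model_dirs_to_test():
    all_dirs = sorted(d for d in MODELS_DIR.iterdir() if d.is_dir())
    if os.environ.get("RUN_ALL_LARGE") == "1":
        # Manual override for risky PRs: run large tests for every model.
        return all_dirs
    # Otherwise, only test model directories touched by the diff.
    touched = {
        MODELS_DIR / f.relative_to(MODELS_DIR).parts[0]
        for f in changed_files()
        if MODELS_DIR in f.parents
    }
    return sorted(d for d in all_dirs if d in touched)


if __name__ == "__main__":
    for model_dir in model_dirs_to_test():
        subprocess.run(["pytest", str(model_dir), "--run_large"], check=True)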

jbischof commented 1 year ago

Why would we ignore the models/ folder?

mattdangerw commented 1 year ago

We are not ignoring models/ entirely; we are skipping parts of models/ depending on the diff. Essentially, we would only selectively run the tests for the models we think are affected by a given PR (using an imperfect heuristic).

Also note that this is not for all testing; it is just for "large" testing (which includes downloading presets).
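For context, a --run_large flag like the one used above is typically wired into conftest.py with a pytest marker along the following lines. This is a minimal sketch of the standard pytest pattern, not necessarily the exact keras-nlp implementation.

# conftest.py -- minimal sketch: gate tests marked "large" behind --run_large.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run_large",
        action="store_true",
        default=False,
        help="run tests marked as large (preset downloads, heavy compute)",
    )


def pytest_configure(config):
    config.addinivalue_line("markers", "large: mark a test as large/slow")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run_large"):
        return
    skip_large = pytest.mark.skip(reason="need --run_large option to run")
    for item in items:
        if "large" in item.keywords:
            item.add_marker(skip_large)

Tests opt in with a large marker, so the per-directory invocations above only pay the preset download cost when the flag is actually passed.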

Currently, for every PR, we download the smallest checkpoint we have for every model and run some tests against it. I have a tough time imagining that scaling to, say, 4x the number of tasks and presets we have today. That could amount to many GBs of downloads and some expensive forward passes.

One option is to stop doing any full preset testing per PR in the long term. This could make sense, though it would definitely let some bugs slip through.

Another option is the "really dumb autodetect" above: if you update models/bert, we run pytest keras_nlp/models/bert/ --run_large; if you only update roberta, we do not.