ploomber / contributing

Learn how to contribute to Ploomber 👥
https://ploomber-contributing.readthedocs.io
MIT License

shortening CI and doc building time #28

Open edublancas opened 1 year ago

edublancas commented 1 year ago

following up on the comment from @neelasha23 in our team brainstorming:

shortening the CI and doc building time is essential since we're blocked while waiting for it to finish. sharing a few thoughts:

ci

I recently added a flag to our repos so pytest prints tests that take >5 seconds to finish; this can help us spot slow tests and fix them. there might be cases where we have no choice, but in the majority of cases we should strive for fast tests. we could go beyond printing logs and enforce this by failing slow tests; if we don't enforce it as a rule, then we'd have to run periodic checks to look for slow ones.
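for reference, a sketch of what this could look like in a `pyproject.toml` (assuming a recent pytest; the 5-second threshold matches the flag mentioned above):

```toml
# pyproject.toml -- a sketch; --durations-min requires pytest >= 6.2
[tool.pytest.ini_options]
# report every test that takes longer than 5 seconds to finish
addopts = "--durations=0 --durations-min=5.0"
```

to actually *fail* slow tests (rather than just report them), something like the pytest-timeout plugin would be needed on top of this.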

another solution would be to find a vendor that can run our tests on more powerful hardware. I remember seeing a few of them. This could be a quick win.

Finally, we can also try running pytest in parallel; there's an extension (pytest-xdist) that allows doing that. the challenge there is that if our tests are not fully isolated, we might run into issues that will be hard to debug.
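enabling this is a one-line config change (a sketch, assuming pytest-xdist is installed as a dev dependency):

```toml
# pyproject.toml -- assumes the pytest-xdist plugin is installed
[tool.pytest.ini_options]
# -n auto spawns one worker per CPU core; tests must not share state
addopts = "-n auto"
```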

docs

similarly, as with the CI, we could print the running time for each notebook so we can spot slow ones. I think jupyter-book has an option to time out slow notebooks, so that's another option.
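a sketch of what the jupyter-book side could look like (the timeout value here is an arbitrary example):

```yaml
# _config.yml -- jupyter-book execution settings (a sketch)
execute:
  execute_notebooks: auto   # only run notebooks missing cached outputs
  timeout: 120              # fail any cell running longer than 120 seconds
```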

we can also check if readthedocs has an option to build our docs in faster hardware.

running notebooks in parallel is also another option, but in this case, we'd have to implement it ourselves. an alternative could be a caching layer: readthedocs doesn't cache anything since it starts from an empty container, but jupyter-book has caching built in. so we could build something that fetches the latest successful docs build from master/main so we can benefit from it. But I'm unsure if readthedocs has an API for downloading previous builds.

edublancas commented 1 year ago

@idomic is working on this one. He found an option to cache conda environments: https://dev.to/epassaro/caching-anaconda-environments-in-github-actions-5hde

some thoughts:

Since our projects are Python packages, they must have a setup.py file. All the package dependencies are declared there and we even define optional dependencies (for example, the ones required for testing/development).

the solution that Ido found relies on conda environment files (env.yaml) instead of setup.py files, but the caching action looks like a generic caching feature, so I think it'll work.

I think that replacing this run step (from the proposed solution in the blog post):

      - name: Update environment
        run: mamba env update -n my-env -f environment.yml
        if: steps.cache.outputs.cache-hit != 'true'

with the series of steps that we run to prepare the environment will work

for example, for sk-eval, the run step would be all the pip install commands that we have here: https://github.com/ploomber/sklearn-evaluation/blob/3b3f6aa8f568c0b444d132f369e123026fb1c303/.github/workflows/ci.yml#L42
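a sketch of what the adapted workflow fragment could look like (this is not the actual sk-eval workflow; the cache path, key, and install command are assumptions and would need adjusting):

```yaml
# .github/workflows/ci.yml (fragment) -- a sketch, not the real workflow
- uses: actions/cache@v3
  id: cache
  with:
    path: /usr/share/miniconda/envs/my-env   # assumed env location
    key: conda-${{ runner.os }}-${{ hashFiles('setup.py') }}

- name: Install dependencies
  # run the pip install steps only when the cache misses
  if: steps.cache.outputs.cache-hit != 'true'
  run: |
    pip install ".[dev]"
```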

edublancas commented 1 year ago

If the above doesn't work, we can generate a conda environment file after running the pip install commands. Since most of our repos use conda environments (even when they're calling pip install), conda env export > env.yaml will work

edublancas commented 1 year ago

@idomic as discussed earlier - in the top comment here I mention a few things about speeding up the doc building process. let's assign someone to take a deep dive on this

edublancas commented 1 year ago

did some research on accelerating docs building

parallel builds

we can enable this locally for quick testing; however, it's not possible to enable this on readthedocs as it doesn't offer a way to add parameters to the command that builds the docs.

we can enable parallel notebook execution with the -j X option. This speeds up jupysql docs:

parallel:

      python -m sphinx -T -E -W --keep-going -b html -d _build/doctrees -D  -j 4 .   54.40s user 10.51s system 85% cpu 1:16.00 total

serial:

      python -m sphinx -T -E -W --keep-going -b html -d _build/doctrees -D  .   52.33s user 9.79s system 39% cpu 2:35.62 total

notes

looks like sphinx offers the -j parameter and plugins can decide to use it or ignore it. the notebook execution comes from the myst-nb package (documented here), but the flag is passed to the sphinx CLI and then picked up by myst-nb.

One important thing I noted is that this is a "best-effort" feature: if some steps are not parallelizable, the parameter is ignored. I realized this because the flag didn't work at first and I saw some warnings about a latex-related step not being parallelizable. I then tried commenting out all the variables in conf.py that reference latex, ran it again, and this time it worked (the logs don't say anything explicit, but I realized it was working because the logs printed all at once at the end, as opposed to line by line, which happens when running serially).

caching

No solution yet, we need to implement something with S3

Another option is caching the notebooks. jupyter-book already offers a mechanism to cache previously executed notebooks.
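a sketch of enabling jupyter-book's built-in cache (the cache path shown is an assumption; by default it lives inside the build folder):

```yaml
# _config.yml -- enable jupyter-book's built-in notebook cache
execute:
  execute_notebooks: cache        # re-run only notebooks whose source changed
  cache: _build/.jupyter_cache    # assumed cache location
```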

The first thing I tried was to enable readthedocs offline formats so each build would produce a zip with all the HTML contents. However, this didn't work and the command triggered a full doc build again since jupyter-book uses a separate folder to store the notebook's state.

the only solution I can think of is to use readthedocs' config to run a command before and after building the docs (this is supported): after a successful build, we can upload the full folder to S3, and before each build, we can fetch it. this will speed things up a lot since the build will only have to run notebooks that have changed.
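a sketch of what this could look like with readthedocs' build jobs (the bucket name and paths are made up, and the AWS credentials would have to come from readthedocs environment variables):

```yaml
# .readthedocs.yaml -- a sketch; bucket name and paths are assumptions
version: 2
build:
  os: ubuntu-22.04
  tools:
    python: "3.10"
  jobs:
    pre_build:
      # fetch the cached notebook state from the last successful build;
      # tolerate a cache miss on the very first run
      - aws s3 sync s3://our-docs-cache/_build _build || true
    post_build:
      # upload the fresh state so the next build can reuse it
      - aws s3 sync _build s3://our-docs-cache/_build
```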

neelasha23 commented 1 year ago

Which repo should we focus on for this issue? (I saw @edublancas has a fix for skeval.)

edublancas commented 1 year ago

let's focus on JupySQL. The tests are pretty fast on ubuntu but slow on windows and mac, although that appears to be a hardware problem.

I think let's focus on implementing the documentation caching (as described in my earlier comment) and see how well it works.

idomic commented 1 year ago

It's also pretty sporadic: sometimes 5m, sometimes 15m. I'm not sure a lot can be done there. Keep updating here. If that fails, we can enable parallel testing for both repos; it might cut the time by 30%.