Adding CI for the Docker image will help identify what works and what doesn't. This patch adds a CI workflow, which consists of:
a job that builds the image and
jobs that execute tutorial notebooks using Papermill within Docker containers based on the built image.
The latter jobs use a separate reusable workflow and a shell script invoked within Docker. Output notebooks and log files (*.log) are saved as artifacts. Intermediate results are also saved so that subsequent dependent jobs can use them.
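As a rough illustration of what one such job does, the sketch below assembles a `docker run` command that executes a single notebook with the Papermill CLI and a per-cell timeout. The image tag, mount layout, and file names are hypothetical placeholders and are not taken from this patch. The helper `papermill_in_docker` is likewise only illustrative. The options used (`--rm`, `-v`, `-w` for Docker; `--execution-timeout` and `--log-output` for Papermill) are existing flags.

```python
import shlex


def papermill_in_docker(image, notebook, output, workdir, timeout_s=3600):
    """Build a `docker run` command that executes one notebook with Papermill.

    All values are supplied by the caller; nothing here is copied from the
    actual workflow in this PR.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/work",   # mount the host directory (hypothetical layout)
        "-w", "/work",              # run inside the mounted directory
        image,
        "papermill", notebook, output,
        "--execution-timeout", str(timeout_s),  # per-cell timeout in seconds
        "--log-output",                          # write cell output to the logger
    ]


cmd = papermill_in_docker(
    image="example/image:ci",   # hypothetical image tag
    notebook="3a.ipynb",         # hypothetical file name
    output="3a.out.ipynb",       # hypothetical file name
    workdir="/tmp/tutorial",     # hypothetical host path
)
print(shlex.join(cmd))
```

The command is built as a list so that it can be passed directly to `subprocess.run` without shell quoting issues. The output notebook is written even when a cell fails or times out, which is what makes it useful as an artifact.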
Some remarks:
Appendix notebooks (a1-a6) do not appear to be well maintained. Moreover, I am not sure about their dependency tree, specifically which notebook requires the output of others (as represented by the needs and outputs parameters of each job). Consequently, they fail to run.
Currently, 4a fails but it will be fixed in another PR.
The 4a notebook on GitHub does not include the SCANDAL limit (it printed no SCANDAL). So I did not include 3c as a dependency of 4a, but I think it would be fine to add it in order to generate the limit.
3a fails intermittently because training freezes (you may need to re-run failed jobs). A possible cause is the num_workers argument passed to DataLoader, which is set to 8 by default and triggers the following warning:
UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
I set a timeout of 1 hour per notebook cell (the default value of the reusable workflow's timeout parameter) to catch the freezing described above and to save failed notebooks. This timeout may need to be extended if any cell takes longer than that.
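One possible mitigation for the 3a freeze, assuming the training code lets the caller override num_workers (not verified in this PR), is to cap the worker count at the number of CPUs the runner actually provides. The helper name `capped_num_workers` is hypothetical. Its result would be passed as the `num_workers` argument of `DataLoader`.

```python
import os


def capped_num_workers(requested=8, available=None):
    """Return a worker count no larger than the CPUs usable by this process."""
    if available is None:
        try:
            # Linux: the set of CPUs this process is allowed to run on
            available = len(os.sched_getaffinity(0))
        except AttributeError:
            # Other platforms: fall back to the total CPU count
            available = os.cpu_count() or 1
    return max(0, min(requested, available))


# With 4 usable CPUs, as in the warning above, a request for 8 becomes 4.
print(capped_num_workers(requested=8, available=4))  # prints 4
```

This only addresses the oversubscription that the warning reports. Whether it removes the freeze would still need to be confirmed by re-running the job.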
See also a workflow run on my branch.