Closed pbellec closed 2 years ago
Yep, also `repo2data` does not have caching itself; it entirely depends on the third-party library that makes the final fetch command. I know, for example, that `nilearn` does it, as opposed to `osf`.

So if the fetcher does not support caching, and the data is not downloaded on the first attempt (so in less than 1 hour), it will indefinitely try to re-download at each retry, and `jupyter-book build` will never be called...
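To make the failure mode concrete, here is a small simulation of that retry behavior. It is purely illustrative (the function names and numbers are mine, not repo2data's API): with a 1-hour wall limit per attempt, a fetcher without caching loses all progress on each retry, so a dataset that needs more than an hour to download never completes.

```python
def attempt_fetch(download_hours, cached_hours, wall_limit_hours=1.0):
    """Return hours' worth of data available after one attempt (capped by the wall limit)."""
    remaining = download_hours - cached_hours
    return cached_hours + min(remaining, wall_limit_hours)

def simulate(download_hours, retries, caching):
    """Return the attempt number on which the fetch completes, or None if it never does."""
    cached = 0.0
    for attempt in range(1, retries + 1):
        fetched = attempt_fetch(download_hours, cached)
        if fetched >= download_hours:
            return attempt                         # build can proceed on this attempt
        cached = fetched if caching else 0.0       # no caching => progress is lost
    return None                                    # retries exhausted, build never runs

print(simulate(download_hours=3.0, retries=10, caching=True))   # -> 3
print(simulate(download_hours=3.0, retries=10, caching=False))  # -> None
```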
In short, and to accommodate all situations:

- `repo2data` needs to take less than an hour
- `jupyter-book build` can take up to 10 hours if caching is enabled (or 1 hour without caching)

1 CPU on our test cluster = 1 logical CPU (thread) of an Intel® Xeon® Gold 6248: https://ark.intel.com/content/www/us/en/ark/products/192446/intel-xeon-gold-6248-processor-27-5m-cache-2-50-ghz.html

1 CPU on our prod cluster = 1 logical CPU (thread) of an Intel® Xeon® Processor E5-2650: https://ark.intel.com/content/www/us/en/ark/products/64590/intel-xeon-processor-e52650-20m-cache-2-00-ghz-8-00-gts-intel-qpi.html
So after further investigation, it looks like there is an execution limit of 1 hour per notebook.

So: strictly less than 1 hour per notebook, and 10 hours in total, for 1 CPU at ~3 GHz with 2 GB of RAM (on production there is twice as much CPU/RAM). I would even say 50 minutes, to take into account the virtualization cost, random networking hiccups, and virtual disk access.
More information on RAM limits: it seems (and it is somewhat logical) that a NeuroLibre jupyter-book build takes more RAM than a local jupyter-book build, see https://github.com/neurolibre/neurolibre-reviews/issues/7#issuecomment-1005007813. This is still ongoing, so it may be a bug as well.
What I would suggest is that we guarantee 3 GB, with a hard limit of 4 GB (users who need more than 3 GB should be aware that they may hit RAM limits because of NeuroLibre's constraints).
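For illustration, a soft 3 GB / guaranteed-ceiling 4 GB policy could be expressed on a POSIX build host with the standard `resource` module. This is a sketch of the idea only; NeuroLibre's actual enforcement mechanism (e.g. container or cgroup limits on the cluster) may be entirely different.

```python
import resource

GB = 1024 ** 3
soft, hard = 3 * GB, 4 * GB

# Limit the address space of this process: allocations beyond the soft
# limit fail (MemoryError in Python); the process may raise its own soft
# limit, but never above the 4 GB hard ceiling.
resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```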
~~Just increased the wall time to 2 hours per notebook with 5 retries. This should give more room to extensive submissions like nimare...~~ Reverted the 2-hour limit. After some thought, it does not seem reasonable to have 2-hour notebooks on NeuroLibre...
The total limit is 1 hour (and 10 retries). See this code.

This time includes `repo2data` + `jb build`; `jb build` itself is equal to notebook execution + HTML build.

There is no wall time on notebook execution by itself, but everything needs to complete within 1 hour. If the submission uses caching, the build will be retried up to 10 times. This is because the build may fail due to `repo2data`; however, the results of `repo2data` get cached, so the build may complete after restarting from the cache.