neurolibre / docs.neurolibre.org

Repository containing all the documentation related to neurolibre.
https://docs.neurolibre.com

document wall time #25

Closed pbellec closed 2 years ago

pbellec commented 2 years ago

The total limit is 1h (and 10 retries). See this code.

This time includes both repo2data and the jb (jupyter-book) build. The jb build itself consists of notebook execution plus the HTML build.

There is no wall time on notebook execution itself, but everything needs to complete within 1 hour. If the submission uses caching, the build is retried up to 10 times. This is because the build may fail due to repo2data; however, the results of repo2data are cached, so a subsequent build may complete by restarting from the cache.
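The retry-with-cache behavior described above can be sketched as a simple loop (a minimal illustration, not NeuroLibre's actual build code; the command and helper names are hypothetical):

```python
import subprocess

MAX_RETRIES = 10    # up to 10 retries when caching is enabled
WALL_TIME_S = 3600  # 1 hour total wall time per attempt

def build_with_retries(cmd):
    """Re-run the build until it succeeds or retries are exhausted.

    Each attempt is killed after WALL_TIME_S seconds; because repo2data
    results persist on disk between attempts, a later attempt can skip
    the data fetch and finish the jupyter-book build within the wall time.
    """
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            subprocess.run(cmd, timeout=WALL_TIME_S, check=True)
            return attempt  # success: report how many attempts were needed
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            continue  # cache is preserved on disk; try again
    raise RuntimeError(f"build failed after {MAX_RETRIES} attempts")
```

Note that this only helps if the fetch step is actually cached; as discussed below, a fetcher without caching restarts from zero on every attempt.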

ltetrel commented 2 years ago

Yep, also repo2data does not implement caching itself; it depends entirely on the 3rd-party library that performs the final fetch command. I know nilearn does it, for example, as opposed to osf. So if the fetcher does not support caching, and the data is not fully downloaded on the first attempt (i.e. in less than 1 hour), it will indefinitely try to re-download on each retry, and jupyter-book build will never be called...

In short, and to accommodate all situations:

  1. repo2data needs to take less than an hour.
  2. jupyter-book build can take up to 10 hours if caching is enabled (or 1 hour without caching).
  3. A single notebook execution cannot exceed 1 hour.

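For reference, a per-notebook timeout like the one in point 3 can be set in a submission's Jupyter Book `_config.yml` (a generic Jupyter Book config sketch, not necessarily the values NeuroLibre enforces server-side):

```yaml
# _config.yml — execution settings for jupyter-book build
execute:
  execute_notebooks: cache   # cache executed notebooks so retries can resume
  timeout: 3600              # abort any single notebook after 1 hour (seconds)
```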
ltetrel commented 2 years ago

1 cpu on our test cluster = 1 logical cpu Intel® Xeon® Gold 6248 (cpu thread) https://ark.intel.com/content/www/us/en/ark/products/192446/intel-xeon-gold-6248-processor-27-5m-cache-2-50-ghz.html

1 cpu on our prod cluster = 1 logical cpu Intel® Xeon® Processor E5-2650 (cpu thread) https://ark.intel.com/content/www/us/en/ark/products/64590/intel-xeon-processor-e52650-20m-cache-2-00-ghz-8-00-gts-intel-qpi.html

pbellec commented 2 years ago

So after further investigation, it looks like there is an execution limit of 1 hour per notebook.

ltetrel commented 2 years ago

So strictly less than 1 hour per notebook and 10 hours in total, for 1 CPU at ~3 GHz with 2 GB of RAM (on production there is twice as much CPU/RAM). I would even say budget 50 min, to account for virtualization overhead, random networking hiccups, and virtual disk access.

ltetrel commented 2 years ago

More information on RAM limits: it seems (and is somewhat logical) that a neurolibre jupyter book build takes more RAM than a local jupyter book build, see https://github.com/neurolibre/neurolibre-reviews/issues/7#issuecomment-1005007813. This is still ongoing, so it may be a bug as well.

What I would suggest is that we guarantee 3 GB, with a hard limit of 4 GB (users that need more than 3 GB should be aware that they may hit RAM limits because of neurolibre limitations).
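A guaranteed/hard-limit split like this maps naturally onto Kubernetes resource requests and limits (a hypothetical pod-spec fragment; this assumes the builds run on Kubernetes, which the thread does not confirm):

```yaml
# Hypothetical container spec fragment: guarantee 3 GB, cap at 4 GB
resources:
  requests:
    memory: "3Gi"   # scheduler guarantees this much is available
  limits:
    memory: "4Gi"   # hard ceiling; the build is OOM-killed above this
```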

ltetrel commented 2 years ago

~~Just increased the wall time to 2 hours per notebook with 5 retries. This should give more room to demanding submissions like nimare...~~ Reverted the 2-hour limit. After some thought, it does not seem reasonable to have 2h notebooks on neurolibre...

ltetrel commented 2 years ago

https://github.com/neurolibre/docs.neurolibre.org/commit/65bfb4b081d8689782f9345e2c2adb8c2c80ad23