Looks like the CRAN (Debian) settings no longer help data.table control the number of threads used when being CRAN checked. Email exchange with CRAN team below:
On 13.07.2023 13:32, Phuong Quan wrote:
Hello,
I have this new NOTE in the Debian pretest (but not on Windows), which I think might be a false positive:
checking examples ... [7s/2s] NOTE
Examples with CPU time > 2.5 times elapsed time
user system elapsed ratio
aggregate_data 2.54 0.045 0.509 5.079
My understanding is that the NOTE is caused by the use of multiple threads/processes, but I do not employ any parallelism in the package. I can only assume therefore that it is the data.table package (which the aggregate_data() function uses) that is doing the parallelism.
I found a data.table thread from 2019 (https://github.com/Rdatatable/data.table/issues/3300) where the eplusr package got the same CRAN pretest NOTE only on Debian, and where the data.table maintainer Matt Dowle says:
"Around that time, that CRAN machine used a value of 4 for OMP_THREAD_LIMIT. I discovered that and agreed with CRAN maintainers that it should be 2. It is now 2. That one machine (linux-debian) handles 4 lines of the CRAN checks matrix: devel-gcc, devel-clang, patched-linux and release-linux, which is why those 4 were affected.
You should be able to reproduce the note with export OMP_THREAD_LIMIT=4, but not with 2.
There was a problem in data.table not respecting OMP_THREAD_LIMIT but that was fixed in v1.12.2 (7 Apr 2019); news item 3. Then when data.table started to correctly respect OMP_THREAD_LIMIT it took a while to discover that one CRAN machine used a value of 4."
Could it be that the Debian CRAN machine has a value of 4 for OMP_THREAD_LIMIT again? The daiquiri package does not alter the number of threads or thread limit at any point.
We do not set the flag anymore:
Users may be unaware that parallelism is used and that they have to set such an env var to avoid it.
The package should make sure that not more than 2 cores are used unless explicitly requested by the user.
Best,
Uwe Ligges
We don't get the NOTE in v1.0.3 CRAN checks, but this may be because v1.1.0 has a larger example dataset.
Until/unless data.table implements a fix for this, we should probably call setDTthreads() in all relevant examples and vignettes so that they use no more than 2 threads.
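A minimal sketch of what that could look like at the top and bottom of an example or vignette chunk. The limit of 2 comes from the CRAN reply above; the small table and its column names (`g`, `x`) are made up for illustration and stand in for the real call to aggregate_data().

```r
library(data.table)

# Cap data.table at 2 threads, keeping the previous setting
# (setDTthreads() returns the old value invisibly).
old_threads <- data.table::setDTthreads(2)

# Stand-in for the real example code (hypothetical data, not from the package).
dt <- data.table(g = c("a", "a", "b"), x = 1:3)
res <- dt[, .(total = sum(x)), by = g]

# Confirm the cap is in effect while the example runs.
threads_in_use <- data.table::getDTthreads()

# Restore whatever thread setting was active before the example.
data.table::setDTthreads(old_threads)
```

Restoring the previous value means the example does not silently change the thread setting for the rest of the user's session.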