Closed JBorrow closed 1 year ago
All modified lines are covered by tests :white_check_mark:
Comparison is base (
d3b92ab
) 44.80% compared to head (bf3faa7
) 44.80%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Info: to install ducc0 from source on MacOS 13 we need to add LDFLAGS="-ld64" to select the old linker (the new MacOS linker crashes...)
Attempts to fix:
https://github.com/simonsobs/pixell/issues/217 https://github.com/simonsobs/pixell/issues/214 https://github.com/simonsobs/pixell/issues/150
Though #150 will need to wait until GitHub Actions provides arm64 Macs, probably.
FWIW my main problem currently is not the MacOS build per say but rather the build of the ducc0 library that we depend on. It does not want to build from source nicely with anything other than the apple clang (I presume that's because apple clang is what numpy is built with; if you build with a different compiler you get all kinds of weird runtime typing errors in shared libraries). One major problem with ducc0 is that there doesn't seem to be an easy way to provide pull requests or issues.
Ok, it seems that a lot of these errors are due to an undefined value of DUCC0_NUM_THREADS, which seems to be required on MacOS due to a bug in ducc0 (otherwise it will set this value to 0, and hence not create a threadpool at all...).
This version installs and scripts/test-pixell
works for me on MacOS.
I am relatively happy with this now. The major thing that's holding me back is that I am currently having to specify different build steps for the Mac and Linux builds, as I don't think it's possible to nicely specify conditional environments:
CC: gcc-12
CXX: gcc-12
FC: gfortran-12
DUCC0_NUM_THREADS: 2
that are needed for Mac builds. That causes some code duplication. We could also consider splitting out the deployment and testing into separate yaml files.
Pinging @mreineck for the ducc0 specific comments
I am relatively happy with this now. The major thing that's holding me back is that I am currently having to specify different build steps for the Mac and Linux builds, as I don't think it's possible to nicely specify conditional environments:
CC: gcc-12 CXX: gcc-12 FC: gfortran-12 DUCC0_NUM_THREADS: 2
that are needed for Mac builds. That causes some code duplication. We could also consider splitting out the deployment and testing into separate yaml files.
This is looking great! I'd table the sort of streamlining you're referring to for a future PR. If this results in Mac wheels, we should push through quickly.
Also for a future PR, if we can trigger deployment of wheels for PRs to PyPI test repositories, that would be cool. I'll raise an issue.
Attempts to fix:
217 #214 #150
Though #150 will need to wait until GitHub Actions provides arm64 Macs, probably.
Also, re: #150 and Apple M1/Silicon builds: I might be wrong about this, but my understanding was that the Github Action environment is unrelated to the wheel builds. The latter is enforced through the VM that cibuildwheel
uses, right? I think you should be able to spawn an M1/Silicon cibuildwheel
even if you don't have the right Github Action environment for it.
See here: https://cibuildwheel.readthedocs.io/en/stable/faq/#apple-silicon
Info: to install ducc0 from source on MacOS 13 we need to add LDFLAGS="-ld64" to select the old linker (the new MacOS linker crashes...)
That's unfortunate. Can you please post the output of the failing linker call? Perhaps I can fix this on the ducc side.
Ok, it seems that a lot of these errors are due to an undefined value of DUCC0_NUM_THREADS, which seems to be required on MacOS due to a bug in ducc0 (otherwise it will set this value to 0, and hence not create a threadpool at all...).
Interesting. Yes, if the variable is undefined, ducc0 will assume a value of 0, but it interprets this value as "use as many threads as there are hardware threads on the machine", which should be analogous to OpenMP's behaviour. If it doesn't use multithreading at all in this case, I suspect that the implementation of std::thread::hardware_concurrency()
(part of standard C++) on MacOS is broken or at least peculiar.
[Edit] The C++ standard specifies that std::thread::hardware_concurrency()
may return 0 in cases where the number of available threads is not clearly defined (one could imagine a machine with a mix of performance and power-efficient cores, for example); if that happens, ducc0
will assume that only one thread should be used, which is not ideal. Do you have a suggestion how I could warn users in such a situation, so that they are at least aware of the potential problem?
One major problem with ducc0 is that there doesn't seem to be an easy way to provide pull requests or issues.
You can open pull requests and issues on https://github.com/mreineck/ducc, which is basically a mirror or the official repo.
[Edit] The C++ standard specifies that
std::thread::hardware_concurrency()
may return 0 in cases where the number of available threads is not clearly defined (one could imagine a machine with a mix of performance and power-efficient cores, for example); if that happens,ducc0
will assume that only one thread should be used, which is not ideal. Do you have a suggestion how I could warn users in such a situation, so that they are at least aware of the potential problem?
At the moment, the threadpool is somehow not created on my machine (M2 Max), which as you've pointed out does include both performance and power saving cores. That is currently enough of an error, but not a very friendly one! Maybe even printing out 'try setting DUCC0_NUM_THREADS' in the Assertion error: no thread pool warning may be helpful?
You could print a warning to stderr if std::thread::hardware_concurrency()
is zero and suggest setting the env var too.
The fact that the thread pool is not generated at all indicates that something is going wrong on my side. I'll investigate this!
The fact that the thread pool is not generated at all indicates that something is going wrong on my side. I'll investigate this!
Great, thanks! I think we should still keep this behavior in pixell
though in case people have older versions of ducc0 installed.
Waiting on PYPI_TEST_TOKEN
, after we have that it should upload automatically to PYPI testing on successful test runs and builds in pull requests.
@mreineck , a tangential request: I understand ducc0 threading doesn't involve OpenMP now, thanks. However, OpenMP is used by many other associated packages (e.g. numpy and CAMB). How do you feel about using OMP_NUM_THREADS to decide threading for ducc0? This way, users don't need to think about specific threading for ducc0. I can't think of a case where the user might want different packages to use a different number of threads, and OMP_NUM_THREADS is often set in an HPC environment.
re: Arm64 wheels. I don't think this will be possible because that cross-compilation requires the use of the Xcode toolchain (which we don't use as we require OpenMP...) - see here: https://cibuildwheel.readthedocs.io/en/1.x/faq/
So I think we will have to unfortunately wait for native arm64 MacOS runners. On the bright side, pixell is not particularly challenging to install from source, especially with the new environment variables.
Seems there is some extra information here: https://github.com/pypa/cibuildwheel/discussions/1445
Looks like the major blocker here is revving the version such that a version can be pushed to the PyPI testing repo.
Our deploy crashes with:
ERROR HTTPError: 400 Bad Request from https://test.pypi.org/legacy/
'0+untagged.1.g6e34700' is an invalid value for Version. Error: Can't
use PEP 440 local versions. See
https://packaging.python.org/specifications/core-metadata for more
information.
I guess this is an unacceptable version, we would need to do something like 0.19.3.post{MRNumber}.dev{X}. That is a problem itself though as we'd need to store X somewhere...
Any ideas anyone? I am not sure what the best way forward is here.
@mhasself is the versioning guru. Any thoughts?
Maybe you could just manually git tag
the branch with an acceptable version identifier?
@mreineck , a tangential request: I understand ducc0 threading doesn't involve OpenMP now, thanks. However, OpenMP is used by many other associated packages (e.g. numpy and CAMB). How do you feel about using OMP_NUM_THREADS to decide threading for ducc0? This way, users don't need to think about specific threading for ducc0. I can't think of a case where the user might want different packages to use a different number of threads, and OMP_NUM_THREADS is often set in an HPC environment.
I can certainly look at OMP_NUM_THREADS
as a fallback if DUCC0_NUM_THREADS
is not found. Good idea, thanks!
Great, thanks!
I have added a branch fixes_for_pixell
on both ducc0
repositories, which
OMP_NUM_THREADS
, andI'll add more fixes on this branch should they become necessary.
@mreineck now I get:
Traceback (most recent call last):
File "/Users/borrow-adm/Documents/SimonsObs/pixell/pixell/tests/test_pixell.py", line 792, in test_thumbnails
ret = reproject.thumbnails(omap, srcs[:,:2], r=r, res=res, proj=proj,
File "/Users/borrow-adm/Documents/SimonsObs/pixell/pixell/reproject.py", line 89, in thumbnails
ithumb = ithumb.resample(fshape, method="fft")
File "/Users/borrow-adm/Documents/SimonsObs/pixell/pixell/enmap.py", line 90, in resample
def resample(self, oshape, off=(0,0), method="fft", mode="wrap", corner=False, order=3): return resample(self, oshape, off=off, method=method, mode=mode, corner=corner, order=order)
File "/Users/borrow-adm/Documents/SimonsObs/pixell/pixell/enmap.py", line 2708, in resample
omap = ifft(resample_fft(fft(map, normalize=False), oshape, off=off, corner=corner, norm=1/map.npix), normalize=False).real
File "/Users/borrow-adm/Documents/SimonsObs/pixell/pixell/enmap.py", line 1137, in fft
else: res = samewcs(enfft.fft(emap,omap,axes=[-2,-1],nthread=nthread), emap)
File "/Users/borrow-adm/Documents/SimonsObs/pixell/pixell/fft.py", line 154, in fft
plan()
File "/Users/borrow-adm/Documents/SimonsObs/pixell/pixell/fft.py", line 51, in __call__
ducc0.fft.c2c(self.a, axes=self.axes, out=self.b, nthreads=self.threads)
RuntimeError:
./src/ducc0/infra/threading.cc: 113 (long ducc0::detail_threading::mystrtol(const char *)):
Assertion failure
error during strtol conversion Invalid argument
OMP_NUM_THREADS
setting instead of DUCC0_NUM_THREADS
works for me now though.
Seems like this is a pixell error now (nthreads=0 in that code block; though it is an integer!), we are assuming that OMP_NUM_THREADS is set and that it should have a default value of zero. We're not using DUCC0_NUM_THREADS... I will open a new issue and look into this separately. But there is still a problem here with ducc0, I guess; it's assuming that nthreads that is passed to ducc0.sht.experimental.synthesis
is a string and trying to convert it to an integer.
But there is still a problem here with ducc0, I guess; it's assuming that nthreads that is passed to ducc0.sht.experimental.synthesis is a string and trying to convert it to an integer.
ducc0
always assumes the nthreads
argument to be passed in as an integer, not a string, so this should be all fine. The stack trace you posted comes from an unsuccessful attempt to convert the content of an environment variable to an integer (mystrtol
is only ever used on the content of environment variables).
I've added a few print statements on the branch; hopefully they provide some insights into what's going on.
@mreineck Here's the output:
trying to read DUCC0_NUM_THREADS
mystrtol: string len is 4
mystrtol: string is |None|
If I do
echo $DUCC0_NUM_THREADS
I get a blank line.
That variable is not defined anywhere in my shell.
The entire test suite passes, so maybe it's worth adding one for this?
I tried the following:
#include <stdlib.h>
#include <stdio.h>
#define TYPE_FAKE_ENV char *
int main() {
TYPE_FAKE_ENV fake_env = getenv("THIS_DOES_NOT_EXIST");
printf("Fake env: %s\n", fake_env);
if (fake_env == NULL) {
printf("Fake env is NULL\n");
} else {
printf("Fake env: %ld\n", strtol(fake_env, NULL, 10));
}
TYPE_FAKE_ENV ducc_env = getenv("DUCC0_NUM_THREADS");
printf("DUCC0_NUM_THREADS: %s\n", fake_env);
if (ducc_env == NULL) {
printf("DUCC0_NUM_THREADS is NULL\n");
} else {
printf("DUCC0_NUM_THREADS: %ld\n", strtol(ducc_env, NULL, 10));
}
return 0;
}
with TYPE_FAKE_ENV=char *
and TYPE_FAKE_ENV=auto
, with a apple clang in C mode, and apple clang in C++17 mode.
I get:
Fake env: (null)
Fake env is NULL
DUCC0_NUM_THREADS: (null)
DUCC0_NUM_THREADS is NULL
for all combinations. I have no idea what's going on.
OK, grepping through the pixell
sources, I found something suspicious:
martin@marvin:~/codes/pixell$ git grep -n setenv
pixell/curvedsky.py:10:utils.setenv("DUCC0_NUM_THREADS", utils.getenv("OMP_NUM_THREADS"), keep=True)
As far as I understand, the "None"
happens when utils.getenv
is called on an environment variable that's undefined. It probably only happens on MacOS, because OMP_NUM_THREADS
is actually set in all your other environments.
Here's the implementation of utils.getenv
and utils.setenv
:
def getenv(name, default=None):
"""Return the value of the named environment variable, or default if it's not set"""
try: return os.environ[name]
except KeyError: return default
def setenv(name, value, keep=False):
"""Set the named environment variable to the given value. If keep==False
(the default), existing values are overwritten. If the value is None, then
it's deleted from the environment. If keep==True, then this function does
nothing if the variable already has a value."""
if name in os.environ and keep: return
elif name in os.environ and value is None: del os.environ[name]
elif value is not None: os.environ[name] = str(value)
So if OMP_NUM_THREADS does not exist, then getenv returns None. When None is passed to setenv, then if DUCC0_NUM_THREADS exists, then it's unmodified (since keep=True). And if it doesn't exist, then it's not created. So I think the behavior is well-behaved and as expected in all cases.
I tested this on linux both with and without OMP_NUM_THREADS defined, and it works.
However, a while back there was a bug that would cause "None"
to be written to the variable if None
is passed. This was fixed on the 5th of October, which I think is two pixell releases ago. So my guess is that someone has an old version of pixell somewhere, and that's screwing things up.
Sigurd (@amaurea ) tells me some of these issues are due to this branch being many commits behind master. I'll update it, if Josh is ok with it.
Oh I see he already replied :)
Thanks @amaurea, updating to the latest master fixed that issue.
Looks like our builds are failing potentially because of a problem with Cython: https://github.com/cython/cython/issues/5771
I have improved the error messages in ducc
related to parsing of environment variable parsing. Next time we should get better hints for debugging!
This change and the OMP_NUM_THREADS
fallback are now in the ducc0
branch and will be part of the 0.33 release.
Update setup.py to respect user-defined environment and allow for compilation on MacOS.