onnx / onnx

Open standard for machine learning interoperability
https://onnx.ai/
Apache License 2.0

Importing `onnx==1.16.1` causes a segmentation fault on MacOS 11 (Big Sur) #6191

Open joshuacwnewton opened 3 months ago

joshuacwnewton commented 3 months ago

Bug Report

Is the issue related to model conversion?

No.

Describe the bug

I receive a segmentation fault when importing onnx.

System information

Reproduction instructions

Spin up a macOS 11 VM. Install onnx==1.16.1. Import onnx.

import onnx
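Because a segfault kills the interpreter, it can help to run the import in a child process and inspect how that process ended. The following is a minimal sketch, not part of the original report; `run_snippet` is a hypothetical helper name, and the signal-based classification assumes a POSIX system such as macOS or Linux.

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> str:
    """Run `code` in a fresh interpreter and classify how the process ended."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    rc = result.returncode
    if rc == 0:
        return "ok"
    if rc < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            name = f"signal {-rc}"
        return f"killed by {name}"
    return f"exited with code {rc}"


if __name__ == "__main__":
    # On an affected machine this would be expected to report SIGSEGV.
    print(run_snippet("import onnx"))
```

Running the same check against both `onnx==1.16.0` and `onnx==1.16.1` in otherwise identical environments would confirm that the crash is tied to the release rather than to the environment.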

Expected behavior

No segfault.

Notes

gramalingam commented 3 months ago

@cjvolzka : not sure if you have any idea. Seems strange, given that it works for 1.16.0 but not 1.16.1 ...

cjvolzka commented 2 months ago

@gramalingam Between 1.16.0 and 1.16.1 there were some CI / build changes we had to pull in just to get 1.16.1 to build. They were cherry-picked into 1.16.1 in https://github.com/onnx/onnx/pull/6108. I'd guess the issue might be related to one of these, most likely one of:

@joshuacwnewton, what type of VM is spun up: Intel or Apple Silicon? It's possible that makes a difference.

joshuacwnewton commented 2 months ago

I believe everything is done with Intel? (I am using Ubuntu + https://github.com/kholia/OSX-KVM and following the instructions as-is, without modifications.)

Also, our user who first discovered this was on a proper macOS machine. Checking our install logs, it looks like their OS is reported as macOS-10.16-x86_64-i386-64bit (with 10.16==11.0).
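When sorting install logs by OS version, strings like the one above need a small normalization step. Here is a minimal sketch, assuming the log entries follow the `platform.platform()` layout shown in the comment; `parse_platform_string` is a hypothetical helper, and mapping 10.16 to 11.0 simply follows the equivalence stated above rather than any official table.

```python
def parse_platform_string(value: str) -> dict:
    """Split a string such as 'macOS-10.16-x86_64-i386-64bit' into
    OS name, (major, minor) version and machine architecture."""
    os_name, version, machine = value.split("-")[:3]
    numbers = [int(part) for part in version.split(".")]
    major = numbers[0]
    minor = numbers[1] if len(numbers) > 1 else 0
    # Assumption taken from the comment above: 10.16 is the compatibility
    # alias under which this interpreter reported macOS 11.0.
    if (major, minor) == (10, 16):
        major, minor = 11, 0
    return {"os": os_name, "version": (major, minor), "machine": machine}


print(parse_platform_string("macOS-10.16-x86_64-i386-64bit"))
# {'os': 'macOS', 'version': (11, 0), 'machine': 'x86_64'}
```

The `x86_64` machine field is consistent with the user being on an Intel Mac.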

gramalingam commented 2 months ago

> @gramalingam Between 1.16.1 and 1.16.2 there were some CI / build changes we had to pull in just to get 1.16.2. They were cherry-picked into 1.16.2 in #6108. I'd guess the issue might be related to one of these, most likely one of:
>
> @joshuacwnewton, what type of VM is spun up: Intel or Apple Silicon? It's possible that makes a difference.

Thanks! If it works with 1.16.2, we can assume these fix it. I guess the trick is to repro this.

cjvolzka commented 2 months ago

Oops. Just to clarify: "Between 1.16.1 and 1.16.2 there were some CI / build changes we had to pull in just to get 1.16.2" should have been "Between 1.16.0 and 1.16.1 there were some CI / build changes we had to pull in just to get 1.16.1 to build." I edited my original comment to correct it.

So whatever was broken in 1.16.1 is likely still broken and would still affect 1.17 (and/or a theoretical 1.16.2).

liqunfu commented 2 months ago

> @gramalingam Between 1.16.0 and 1.16.1 there were some CI / build changes we had to pull in just to get 1.16.1 to build. They were cherry-picked into 1.16.1 in #6108. I'd guess the issue might be related to one of these, most likely one of:
>
> @joshuacwnewton, what type of VM is spun up: Intel or Apple Silicon? It's possible that makes a difference.

The issue is likely due to the GitHub Actions update of the macos-latest runner to macOS 14. As the build system is not within our control, may I suggest pinning the onnx dependency to 1.16.0 for macOS Big Sur 11.7.10 or similarly old OSes? Please let us know if there is still an issue, either with onnx 1.16.0 or with onnx 1.16.1 on newer macOS versions. Thanks.

joshuacwnewton commented 2 months ago

> As the build system is not within our control, may I suggest pinning the onnx dependency to 1.16.0 for macOS Big Sur 11.7.10 or similarly old OSes?

We could do this with the platform_release environment marker in requirements.txt, I think?

Something like:

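# not 100% sure about syntax, testing is needed
# macOS reports sys_platform as "darwin", and platform_release is the Darwin
# kernel release (20.x on macOS 11, 21.x on macOS 12), not the macOS version
onnx<=1.16.0; sys_platform == "darwin" and platform_release < "21.0.0"
onnx; sys_platform == "darwin" and platform_release >= "21.0.0"
onnx; sys_platform != "darwin"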
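Since the syntax is flagged as untested, one way to check it is to print the values an installer would feed into those markers on the target machine. Per the environment-marker definitions, `sys_platform` comes from `sys.platform` and `platform_release` from `platform.release()`; on macOS the latter is the Darwin kernel release (for example 20.x on Big Sur), not the product version. The sketch below uses only the standard library; both helper names are hypothetical, and the Darwin-to-macOS mapping is deliberately limited to the range where it is a fixed offset.

```python
import platform
import sys


def marker_values() -> dict:
    """Values this interpreter would supply for the relevant markers."""
    return {
        "sys_platform": sys.platform,
        "platform_release": platform.release(),
        "platform_machine": platform.machine(),
    }


def macos_major_from_darwin(release: str):
    """Map a Darwin kernel release such as '20.6.0' to a macOS major version.

    Only Darwin 20 through 24 (macOS 11 through 15) are covered; anything
    else returns None rather than guessing.
    """
    darwin_major = int(release.split(".")[0])
    if 20 <= darwin_major <= 24:
        return darwin_major - 9
    return None


if __name__ == "__main__":
    values = marker_values()
    print(values)
    if values["sys_platform"] == "darwin":
        print("macOS major:", macos_major_from_darwin(values["platform_release"]))
```

Running this inside the macOS 11 VM shows which comparison value the pin actually needs.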

That said, PyPI wheels carry platform compatibility tags, too. For the onnx package, I see the following wheel name:

onnx-1.16.1-cp39-cp39-macosx_11_0_universal2.whl
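The platform tag at the end of that filename declares a minimum macOS version of 11.0 and the `universal2` architecture (both x86_64 and arm64). As a rough sketch, the fields can be pulled apart with plain string handling; `parse_macos_wheel` is a hypothetical helper, and it does not handle compressed tag sets that join several platform tags with dots.

```python
def parse_macos_wheel(filename: str) -> dict:
    """Extract the tag fields from a macOS wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    fields = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform: the last three are the tags.
    python_tag, abi_tag, platform_tag = fields[-3:]
    if not platform_tag.startswith("macosx_"):
        raise ValueError(f"not a macOS platform tag: {platform_tag}")
    _, major, minor, arch = platform_tag.split("_", 3)
    return {
        "name": fields[0],
        "version": fields[1],
        "python": python_tag,
        "abi": abi_tag,
        "min_macos": (int(major), int(minor)),
        "arch": arch,
    }


info = parse_macos_wheel("onnx-1.16.1-cp39-cp39-macosx_11_0_universal2.whl")
print(info["min_macos"], info["arch"])
# (11, 0) universal2
```

An installer on macOS 11 will therefore consider this wheel compatible and select it, which is why the tag matters here.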

If the official stance is that onnx==1.16.1 is not compatible with macOS 11, then perhaps the wheel should be changed to not specify macOS 11? See, for example: