philss / rustler_precompiled

Use precompiled NIFs from trusted sources in your Elixir code

Force recompilation or raise better error when target OS changes #52

Open philss opened 1 year ago

philss commented 1 year ago

It's possible that someone accidentally tries to run a project that was compiled for a different target system.

philss commented 1 year ago

After talking with José, we saw that it's not possible to detect this, since the error comes directly from Erlang. So I'm closing.

wojtekmach commented 1 year ago

Where is the native code located, something like priv/polars.so? If so, and if the native path contained the target triple, maybe we could handle it better: i.e., if the path doesn't exist, we are running on a different system than the one we compiled against?

philss commented 1 year ago

> Where is the native code located, something like priv/polars.so?

Yes, it is stored in the priv/native path of each lib.

> If so, and if the native path contained the target triple, maybe we could handle it better: i.e., if the path doesn't exist, we are running on a different system than the one we compiled against?

Makes sense. I think with a separate process and the metadata file, we could detect that. This probably wouldn't avoid the error message from Erlang, but we could raise a better error.
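To make the idea concrete, here is a minimal sketch of the check in Rust. It is purely illustrative and not part of rustler_precompiled: the helper names `target_from_file_name` and `check_target` are hypothetical, and it assumes the artifacts in priv/native carry an extension and are named like `lib<name>-v<version>-nif-<nif_version>-<target>.so`. The real check would more likely live on the Elixir side and read the metadata file instead of parsing file names.

```rust
// Hypothetical helpers for illustration only; not part of rustler_precompiled.

/// Extracts the target triple from an artifact file name, assuming the shape
/// `lib<name>-v<version>-nif-<nif_version>-<target>.<ext>`.
fn target_from_file_name(file_name: &str) -> Option<&str> {
    // Drop the extension (everything after the last dot).
    let (stem, _ext) = file_name.rsplit_once('.')?;
    // Keep what follows the `-nif-` marker: `<nif_version>-<target>`.
    let (_prefix, after_nif) = stem.split_once("-nif-")?;
    // Skip the NIF version; the remainder is the target triple.
    let (_nif_version, target) = after_nif.split_once('-')?;
    if target.is_empty() {
        None
    } else {
        Some(target)
    }
}

/// Compares the targets found among the given file names with the target of
/// the running system, and builds a readable error when none of them match.
fn check_target(file_names: &[&str], runtime_target: &str) -> Result<(), String> {
    let compiled: Vec<&str> = file_names
        .iter()
        .copied()
        .filter_map(target_from_file_name)
        .collect();

    if compiled.contains(&runtime_target) {
        return Ok(());
    }

    Err(format!(
        "NIF was compiled for [{}], but this system is {}; force a recompilation for the current target",
        compiled.join(", "),
        runtime_target
    ))
}
```

A check like this only catches a mismatch in the target triple. It would run before the NIF is loaded, so the Erlang load error would still show up for any other kind of failure.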

mlwilkerson commented 1 year ago

This may be tangential to this concern, or it may be another example of a similar kind of problem:

I've been having trouble lately with some of my packages built on the GitHub Actions macos-latest runner. There seems to be some sort of linkage problem: a package can build successfully on macos-latest, but then fail at runtime on my dev machine (or a teammate's), apparently because it can't do the dynamic linking it's supposed to do at runtime.

I get something like:

'Failed to load NIF library: \'dlopen(/Users/foo/repos/bar/_build/test/lib/baz/priv/native/libbaz-v0.1.18-nif-2.16-aarch64-apple-darwin.so, 0x0002): symbol not found in flat namespace \'_unmask\'\''

So in this case, it's not a confusion of the target OS per se: aarch64-apple-darwin is correct. But there's apparently some other library incompatibility across different versions of that same target OS. Our dev machines run a newer version of macOS than the current macos-latest on GitHub, so I suppose there's some significant difference related to that. This doesn't happen with all of my rustler_precompiled packages that are built via macos-latest in GitHub Actions and run on my team's dev machines--just one of them, which apparently has a dynamic link dependency on some library that differs between the two macOS versions.

I'm not sure what--if anything--could be done at compile time to discover this local system incompatibility issue and force a re-compilation preemptively. I'm not necessarily looking for a solution like that either. I'm currently solving my problem a different way. I'm just offering this experience as another possibly-similar scenario to consider.