robo9k / rust-magic-sys

Rust declarations crate for the `libmagic` C library
Apache License 2.0
10 stars 6 forks source link

Add bundled feature #50

Open roblabla opened 9 months ago

roblabla commented 9 months ago

This feature allows compiling and static linking a vendored version of libmagic. This pattern is used by many other crates like rusqlite or yara-rs, and makes it much easier to use them since they don't require having the library precompiled and available. It also makes it much easier to cross-compile, as only a C compiler for the target is necessary.

The bundled version is FILE_5.45

robo9k commented 9 months ago

Hej @roblabla thanks for your contribution!

We've had a similar feature in #4 but that contribution got retracted. While your implementation is pretty straightforward, I did and do have a couple questions/remarks/requirements that might take me a while to work through.

From the top of my head

I'm not quite sure I follow regarding cross-compilcation; this is easier than building libmagic separately using its original autotools?

Fair word of warning; I'm afraid the last pull request didn't get much traction since I have an incomplete idea of how this feature should work and am hesistant to merge a partial solution.

roblabla commented 9 months ago

Thanks for the quick review!

this changes the license of the published crate to (MIT OR Apache-2.0) AND LicenseRef-file / BSD which is a reason why I'd rather have the bundled code in yet another separate magic-src crate (cf. https://github.com/alexcrichton/openssl-src-rs ), even if that is not a common pattern on crates.io

Sure, that's a rather straightforward fix. Yara-rs used to do that too. Should I just make a new crate in this repo, and make a cargo workspace? New, separate repo with the source?

this uses a naive cc build instead of autotools, which might have unexplored drawbacks

This choice mostly comes down to trying to reduce the necessary build-time dependencies, and my past experiences cross-compiling with autotools being painful so I tend to avoid it. Using cc comes with a couple of gotchas (for instance, I define the HAVE_UNISTD_H and others in cc without actually testing for its existence - in practice it's unlikely to matter on most platforms). I need to work through all the HAVE_ defines in file - I know I'm missing some, I was mostly going for the minimal amount here to get something as cross-platform as possible.

One of the main "downsides" with the way I currently build is that I don't enable any of the optional feature. This means no decompression for instance (as that requires depending on libz, libbz2, liblzma and libzstd). I might be able to wrangle it into working though.

I could certainly try using autotools again - I haven't used the crate in a while, it may have improved.

the feature is called "bundled" in config and "vendor" in docs, this seems to be a mistake

Ah yeah, I called it vendor originally, and moved to bundled to match what rusqlite calls it. I'll fix the docs.

there's no docs how this interacts with the other crate features i'd have to look into the details, but there's e.g. a warning about unreachable code in build.rs

Yeah, currently if it's enabled it just disables everything else - hence why the warning about unreachable code. This is mostly placeholder though.

the CI build has not been updated to consider the feature, but fails already

Failure seems to be due to CI not fetching the submodule. It probably needs some adjustment, I'll look into it.

it's unclear to me what this means for Windows platform support e.g. the previous implementation and the vcpkg feature use special versions of the dirent, getopt and tre dependencies which are not bundled/vendored

Yeah, I think for now, documenting this feature as being unix-only would probably be a good idea. The vcpkg build of libmagic is heavily patched. We could reproduce that (either by vendoring a patched version, or taking the patches and applying them in the build script), but it feels like a larger maintenance burden - especially considering it requires a dependency on tre and whatnot.

i'm not entirely sure I want this as a default feature since Rust's SBOM story is quite weak with static C libs, but then again the "vcpkg" feature pretty much is a bundled/vendored static build already. i see how it's more convenient, but I think you'd still have to distribute the default magic database (which is a papercut that cargo isn't really fit to solve anyway)?

So, in my use-case, I don't actually care about the default magic database - I build a custom database at build-time (in a build.rs script) that I then include in the final binary. This is in fact why I'd rather want a static library instead of a dynamic one - I want to make sure the version of libmagic used is the same at compiletime and runtime.

As for whether the feature should be the default or not, I don't mind either way.

I'm not quite sure I follow regarding cross-compilcation; this is easier than building libmagic separately using its original autotools?

I'm working on a rather large project where we cross-compile from linux to linux-musl, macos and windows. To this end, I have a bespoke cross-compilation toolchain with lots of bells and whistles setup to make cc-rs & other build crates "just work", allowing us to build everything from source when we cargo build (including sqlite, boringssl, yara...). This makes it very easy to onboard new people on the project: They just setup the toolchain, clone the repo, and cargo build will build the whole planet and just work:tm:.

Enter libmagic, which does not offer a way to build it as part of cargo build. The only way to make this work (outside of adding the bundled feature as I do here) would be to ship the compiled library as part of the toolchain. This is not ideal for us for two reasons:

  1. It needs to be done separately for each target in the toolchain.
  2. It creates a "hidden" dependency not visible in our SBOM (currently toolchain only provides compiler, crt and libc)

Fair word of warning; I'm afraid the last pull request didn't get much traction since I have an incomplete idea of how this feature should work and am hesistant to merge a partial solution.

That's fair, and I totally understand not wanting to merge a PR without fully understanding its implication - you're the maintainer after all 😄. I personally don't really need this merged upstream - I can use a fork with this PR, so there's no rush. I do think that there would be value in working towards an upstreamable version of this PR, as it'd make libmagic more convenient to use by anyone that already has the cc-rs crate working for their target - instead of requiring some libmagic-specific setup.

robo9k commented 7 months ago

Hej @roblabla

I'm sorry this pull request didn't get much attention recently. I've looked into how licenses and vulnerabilities (rustsec) are usually handled for bundled C code and then got a bit sidetracked elsewhere.

Sure, that's a rather straightforward fix. Yara-rs used to do that too. Should I just make a new crate in this repo, and make a cargo workspace? New, separate repo with the source?

The magic and magic-sys crates used to be in the same repo for some time and I might unify them again at some point. For working on this pull request it'd probably be easier to change the rust-magic-sys repo into a workspace with both the magic-sys and magic-src crates.

I need to work through all the HAVE_ defines in file - I know I'm missing some, I was mostly going for the minimal amount here to get something as cross-platform as possible.

Did you get to this? Also could you create a little doc about what needs to be checked so I or someone else can do this when updating the vendored/bundled file code?

One of the main "downsides" with the way I currently build is that I don't enable any of the optional feature. This means no decompression for instance (as that requires depending on libz, libbz2, liblzma and libzstd). I might be able to wrangle it into working though.

How would you imagine this to work, do you expect some of the decompression libraries to be available or also start bundling them?

Yeah, I think for now, documenting this feature as being unix-only would probably be a good idea. The vcpkg build of libmagic is heavily patched. We could reproduce that (either by vendoring a patched version, or taking the patches and applying them in the build script), but it feels like a larger maintenance burden - especially considering it requires a dependency on tre and whatnot.

I'm not happy about basically requiring vcpkg on Windows (I haven't really tested a GNU/MinGW setup, see #2) but at least it's sort of isolated and maintained elsewhere this way. I'd rather not drag along a bunch of patches for the bundled file code as that makes updating it even more demanding.

So, in my use-case, I don't actually care about the default magic database [..]

Fair enough, but I guess some users of magic-sys won't be this advanced and just try the "bundled"/"vendor" feature to see if it fixes their build issues (and then get runtime issues).

That's fair, and I totally understand not wanting to merge a PR without fully understanding its implication - you're the maintainer after all 😄. I personally don't really need this merged upstream - I can use a fork with this PR, so there's no rush. I do think that there would be value in working towards an upstreamable version of this PR, as it'd make libmagic more convenient to use [..]

As can be seen from the previous attempt of bundling there is some demand. The problem is that that I'm not even dogfooding the crate/s and thus any advanced build magic (heh) basically starts to bitrot right away and needs community support (of which there is little). If you'll be maintaining a fork for bundling I'd also be open to linking to it from my README or such so people are aware of it.

All in all the rabbit hole of C dependencies goes deep and there doesn't seem to be much consensus on how to deal with them in the Rust sphere, so I'm not at all convinced to find a good solution for now.