robo9k / rust-magic-sys

Rust declarations crate for the `libmagic` C library
Apache License 2.0
10 stars 6 forks source link

Thread Safety? #28

Closed mrd0ll4r closed 1 year ago

mrd0ll4r commented 2 years ago

Hello :)

I'd like to use magic in my project to detect MIME types, in addition to tree_magic (which uses the freedesktop MIME database). The problem is: I'm using multithreaded async Rust, which requires futures to be Send. Looking at the docs for magic, I can see that a Cookie is !Send + !Sync. Usually, I'd just wrap stuff in Arc<Mutex<T>>, but that requires T: Send.

A working alternative I'm using at the moment is to encapsulate anything magic-related within tokio::task::spawn_blocking, but this is suboptimal because a) I need to re-initialize the Cookie every time and b) I need to move the data I'm analyzing into that block, which is costly.

I'm still a beginner w.r.t. Rust -- do you think it'd be possible to share a Cookie between threads safely?

robo9k commented 2 years ago

Hi!

First of a bit of nitpicking;

That being said your issue concerns both of them, i.e.

The data in magic_t is just an opaque handle from the libmagic C library. While it would be safe to copy the handle as plain data, the question remains whether the libmagic C library implementation can deal with that.

Unfortunately it seems like the answer is; no, the libmagic C library itself is not thread-safe at all. Here's a couple indicators as to why:

So tl;dr is no, I believe it's not safe to share nor send magic_sys::magic_t nor magic::Cookie between threads because the underlying libmagic library does not support this.

Since you are using tokio, have you looked at their bytes crate? It can allow creating a shareable instance which you should be able to use with magic::Cookie::buffer(), but I have not tested this myself.

Since you are using async and libmagic requires a handle-per-thread, maybe using a thread-per-core such as with Glommio could further help performance, but I have not explored this at all.

Note that as a micro optimization (but you'd better benchmark this), you could create a smaller magic database for use with magic::Cookie::load() which might make creating those instances slightly less costly, but only works if you know you don't need the full default database.

I'd like to hear which solution you end up with and also which parts of the magic and magic-sys documentation could be improved to explain the lack of Send and Sync.

mrd0ll4r commented 1 year ago

Hey,

thanks for the in-depth analysis! It's intuitive that using a libmagic handle from multiple threads in parallel is a bad idea, since libmagic does not explicitly allow this. However, I'm still not 100% sold on why it shouldn't be possible to encapsulate the handle within a Mutex and have calls to magic::Cookie::buffer() in the critical section. But in the end, you're probably right: libmagic does not explicitly allow this either, and all presented solutions seem to resort to one-handle-per-thread solutions.

After some benchmarks, opening the database takes only a handful of milliseconds, which is fine for my use case. So in the end, I didn't come up with any fancy solution, and just do it like this for now. If this comes up as performance-critical for me in the future, I might resort to something like a thread pool for libmagic handles, and communicating with them via channels, or something like that.

Thanks again!