Closed mrd0ll4r closed 1 year ago
Hi!
First of a bit of nitpicking;
That being said your issue concerns both of them, i.e.
magic_sys::magic_t
magic::Cookie
The data in magic_t
is just an opaque handle from the libmagic
C library. While it would be safe to copy the handle as plain data, the question remains whether the libmagic
C library implementation can deal with that.
Unfortunately it seems like the answer is; no, the libmagic
C library itself is not thread-safe at all.
Here's a couple indicators as to why:
libmagic
(source, readable) never even mentions threadslibmagic
issue tracker has no entries for threads
except for one bug in the official Python binding
whose changes, while I'm not proficient in Python, do create new thread-local object instanceslibmagic
mailing list only has one entry
which seems to be about issues where even analyzing different files from multiple threads would run into issuesmod_magic_mime
, which uses modified libmagic
source code, has a note that states that after the changes the module "[..] should be reentrant now but has not been tested in a threaded environment"
which suggests the original code is not thread-safeSo tl;dr is no, I believe it's not safe to share nor send magic_sys::magic_t
nor magic::Cookie
between threads because the underlying libmagic
library does not support this.
Since you are using tokio
, have you looked at their bytes
crate? It can allow creating a shareable instance which you should be able to use with magic::Cookie::buffer()
, but I have not tested this myself.
Since you are using async
and libmagic
requires a handle-per-thread, maybe using a thread-per-core such as with Glommio could further help performance, but I have not explored this at all.
Note that as a micro optimization (but you'd better benchmark this), you could create a smaller magic database for use with magic::Cookie::load()
which might make creating those instances slightly less costly, but only works if you know you don't need the full default database.
I'd like to hear which solution you end up with and also which parts of the magic
and magic-sys
documentation could be improved to explain the lack of Send
and Sync
.
Hey,
thanks for the in-depth analysis!
It's intuitive that using a libmagic handle from multiple threads in parallel is a bad idea, since libmagic does not explicitly allow this. However, I'm still not 100% sold on why it shouldn't be possible to encapsulate the handle within a Mutex
and have calls to magic::Cookie::buffer()
in the critical section. But in the end, you're probably right: libmagic does not explicitly allow this either, and all presented solutions seem to resort to one-handle-per-thread solutions.
After some benchmarks, opening the database takes only a handful of milliseconds, which is fine for my use case.
So in the end, I didn't come up with any fancy solution, and just do it like this for now.
If this comes up as performance-critical for me in the future, I might resort to something like a thread pool for libmagic
handles, and communicating with them via channels, or something like that.
Thanks again!
Hello :)
I'd like to use
magic
in my project to detect MIME types, in addition totree_magic
(which uses the freedesktop MIME database). The problem is: I'm using multithreaded async Rust, which requires futures to beSend
. Looking at the docs formagic
, I can see that aCookie
is!Send + !Sync
. Usually, I'd just wrap stuff inArc<Mutex<T>>
, but that requiresT: Send
.A working alternative I'm using at the moment is to encapsulate anything
magic
-related withintokio::task::spawn_blocking
, but this is suboptimal because a) I need to re-initialize theCookie
every time and b) I need tomove
the data I'm analyzing into that block, which is costly.I'm still a beginner w.r.t. Rust -- do you think it'd be possible to share a
Cookie
between threads safely?