Closed · damooo closed this issue 6 months ago
You are absolutely right, wrapping loaders (which have an async API) into a (sync) Mutex is an aberration. My bad!
However, I'm not entirely convinced by your assertion that loaders are "usually cheap to clone or construct". Some loaders may maintain an in-memory copy of frequently used (and possibly big) contexts. Similarly, HTTP loaders should rely on some form of caching (which could be on disk or in memory). Copying those in-memory caches would be inefficient. The alternative would be to resort to some kind of mutex inside the loaders...
So I would keep the idea of wrapping the loader in a mutex, but use futures-util's async Mutex instead. Would that make sense?
Could you maybe make a PR with these changes?
> However, I'm not entirely convinced by your assertion that loaders are "usually cheap to clone or construct". Some loaders may maintain an in-memory copy of frequently used (and possibly big) contexts. Similarly, HTTP loaders should rely on some form of caching (which could be on disk or in memory). Copying those in-memory caches would be inefficient.
The Rust pattern for this is to use interior mutability wrapped in an `Arc`, which is cheap to clone. For example, moka's in-memory cache is cheap to clone, and all clones are backed by the same effective cache. From their docs:
> Cloning is a cheap operation for Cache as it only creates thread-safe reference-counted pointers to the internal data structures. Don’t wrap a Cache by a lock such as Mutex or RwLock. All methods provided by the Cache are considered thread-safe, and can be safely called by multiple async tasks at the same time. No lock is needed.
And an on-disk cache like cacache needs just the path to the cache.
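The cheap-clone pattern described above can be sketched with the standard library alone (the `ContextCache` type and its fields here are hypothetical illustrations, not moka's API; moka implements the same idea with lock-free internals instead of an `RwLock`):

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical context cache: interior mutability behind an Arc,
// so every clone is a cheap pointer copy sharing the same storage.
#[derive(Clone, Default)]
struct ContextCache {
    inner: Arc<RwLock<HashMap<String, String>>>,
}

impl ContextCache {
    fn insert(&self, url: &str, doc: &str) {
        self.inner.write().unwrap().insert(url.into(), doc.into());
    }
    fn get(&self, url: &str) -> Option<String> {
        self.inner.read().unwrap().get(url).cloned()
    }
}

fn main() {
    let cache = ContextCache::default();
    let clone = cache.clone(); // cheap: only bumps the Arc refcount

    cache.insert("https://example.org/ctx", "{ \"@context\": {} }");
    // The clone sees the same entry, because both share the inner map.
    assert!(clone.get("https://example.org/ctx").is_some());
}
```

A loader holding such a cache can hand out clones freely: every clone hits the same underlying storage, with no outer lock around the loader itself.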
Manas has an implementation of an HTTP loader for this exact purpose. There it is instantiated with a moka in-memory caching middleware factory:
> So I would keep the idea of wrapping the loader in a mutex, but use futures-util's async Mutex instead. Would that make sense?
That still makes the parser effectively non-concurrent, as each parser usage has to wait for the others to complete.
It would be really practical to use a loader factory instead. If that is considered okay, I can make a PR with that change.
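A minimal sketch of what the loader-factory idea could look like (all names here — `Loader`, `DummyLoader`, `JsonLdOptions`, `parse` — are hypothetical stand-ins, not the crate's actual API): the options hold a closure returning a fresh loader, and each parsing task calls it instead of locking a shared loader.

```rust
// Hypothetical trait standing in for the real loader abstraction.
trait Loader {
    fn load(&mut self, url: &str) -> String;
}

// A loader that is cheap to construct; it could also hold a
// cheap-to-clone shared cache, as discussed earlier in the thread.
struct DummyLoader;
impl Loader for DummyLoader {
    fn load(&mut self, url: &str) -> String {
        format!("document from {url}")
    }
}

// Options hold a factory instead of a Mutex<L>.
struct JsonLdOptions<F> {
    loader_factory: F,
}

impl<L: Loader, F: Fn() -> L> JsonLdOptions<F> {
    // Each parsing task gets its own fresh loader: no shared lock,
    // so tasks can proceed concurrently.
    fn parse(&self, url: &str) -> String {
        let mut loader = (self.loader_factory)();
        loader.load(url)
    }
}

fn main() {
    let options = JsonLdOptions { loader_factory: || DummyLoader };
    println!("{}", options.parse("https://example.org/ctx"));
}
```

Since the factory is just a `Fn() -> L` closure, callers who do want a single shared loader can still capture a clone of one in the closure, while callers who want isolation get a fresh instance per call.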
The sync mutex is perfectly fine, as the lock is acquired in a non-async context. It will not block the runtime.
The issue was not about a parsing task blocking the runtime, but about the inability to run multiple parsing tasks concurrently with the same parser, as all tasks have to wait for the lock on the config mutex.
Maybe that's fine enough, as we can instantiate a new parser for each task. Mentioning this in the docs would be greatly helpful, though.
> The sync mutex is perfectly fine, as the lock is acquired in a non-async context. It will not block the runtime.
Indeed. I stand corrected. Same for caches being cheap to clone.
> The issue was not about a parsing task blocking the runtime, but about the inability to run multiple parsing tasks concurrently with the same parser, as all tasks have to wait for the lock on the config mutex.
I see that now.
> Maybe that's fine enough, as we can instantiate a new parser for each task. Mentioning this in the docs would be greatly helpful, though.
Since this would be specific to the JSON-LD parser, while all other parsers are "concurrency friendly", this is probably not the best option.
I'm reopening this issue, and I'm welcoming a PR replacing "loader" with a "loader_factory" :)
This was actually included in v0.8.0.
Currently, `JsonLdOptions<L>` takes a loader of type `L` and wraps it in a `Mutex`. This effectively makes the JSON-LD parser non-concurrent, as every concurrent operation has to wait until the others complete.

Instead, it would be great to take a loader factory `fn() -> L` that can return a fresh loader on every call. As most loaders are cheap to clone or construct, this would enable concurrency without much cost.