wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions
https://docs.rs/markdown/1.0.0-alpha.21/markdown/
MIT License
965 stars 53 forks source link

crate::Options is not `Sync` #104

Closed stephenlf closed 9 months ago

stephenlf commented 9 months ago

The Options struct is not Sync. This is an issue in web servers because it means we must reallocate an Options struct every time we parse something.

I will work on this problem later.

Full compiler diagnostic:

error[E0277]: `(dyn for<'a, 'b> Fn(&'a str, &'b MdxExpressionKind) -> MdxSignal + 'static)` cannot be shared between threads safely
    --> src/templates/markdown.rs:3:20
     |
3    | static MD_OPTIONS: markdown::Options = markdown::Options::default();
     |                    ^^^^^^^^^^^^^^^^^ `(dyn for<'a, 'b> Fn(&'a str, &'b MdxExpressionKind) -> MdxSignal + 'static)` cannot be shared between threads safely
     |
     = help: the trait `Sync` is not implemented for `(dyn for<'a, 'b> Fn(&'a str, &'b MdxExpressionKind) -> MdxSignal + 'static)`
     = note: required for `Unique<(dyn for<'a, 'b> Fn(&'a str, &'b MdxExpressionKind) -> MdxSignal + 'static)>` to implement `Sync`
note: required because it appears within the type `Box<dyn Fn(&str, &MdxExpressionKind) -> MdxSignal>`
    --> ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:195:12
     |
195  | pub struct Box<
     |            ^^^
note: required because it appears within the type `Option<Box<dyn Fn(&str, &MdxExpressionKind) -> MdxSignal>>`
    --> ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/option.rs:569:10
     |
569  | pub enum Option<T> {
     |          ^^^^^^
note: required because it appears within the type `ParseOptions`
    --> ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/markdown-1.0.0-alpha.16/src/configuration.rs:965:12
     |
965  | pub struct ParseOptions {
     |            ^^^^^^^^^^^^
note: required because it appears within the type `Options`
    --> ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/markdown-1.0.0-alpha.16/src/configuration.rs:1243:12
     |
1243 | pub struct Options {
     |            ^^^^^^^
     = note: shared static variables must have a type that implements `Sync`
stephenlf commented 9 months ago

Current workaround I'm using. I'll write this up in a pull request later.

/// A markdown::Options wrapper that can be safely passed between threads. 
/// Note that this struct cannot wrap Options with MDX parser functions
/// specified. In other words, `options.parse.mdx_ems_parse` and 
/// `options.parse.mdx_expression_parse` must both be `None`.
/// 
/// # Examples
/// 
/// ```rust
/// use docgen::state::ThreadsafeOptions;
/// 
/// // Equivalent to `let options = ThreadsafeOptions::new(markdown::Options::default());`
/// let options = ThreadsafeOptions::default();
/// let app_state = Arc::new(options);
/// // `app_state` may now be used in a multi-threaded web server.
/// ```
#[derive(Debug, Default)]
pub struct ThreadsafeOptions(Options);

unsafe impl Send for ThreadsafeOptions {}
unsafe impl Sync for ThreadsafeOptions {}

impl ThreadsafeOptions {
    /// Create a new `ThreadsafeOptions` wrapper around the provided `Options`
    /// object. 
    /// 
    /// **PANICS**: This function will panic if you attempt to instantiate a
    /// new object with an MDX parser specified. In other words, 
    /// `options.parse.mdx_ems_parse` and `options.parse.mdx_expression_parse`
    /// must both be `None`.
    pub fn new(options: Options) -> Self {
        if options.parse.mdx_esm_parse.is_some() || options.parse.mdx_expression_parse.is_some() {
            panic!("Cannot instantiate markdown::Options with MDX parsers specified")
        }
        Self(options)
    }
}

impl From<Options> for ThreadsafeOptions {
    fn from(value: Options) -> Self {
        Self::new(value)
    }
}

impl Borrow<Options> for ThreadsafeOptions {
    fn borrow(&self) -> &Options {
        &self.0
    }
}
ChristianMurphy commented 9 months ago

Welcome @stephenlf! 👋

The Options struct is not Sync.

This is an XY Problem.

This is an issue in web servers because it means we must reallocate an Options struct every time we parse something.

Memory is allocated all over the place to do parsing and create a syntax tree. Allocation feels like a non issue here.

This library aims to be 100% pure safe rust, sync is unsafe https://doc.rust-lang.org/nomicon/send-and-sync.html Marking a core part of the safe API unsafe is a non-starter here.

Reading between the lines a bit you seem to be trying to reuse options across multiple threads in Rust? Is that accurate? If so why is allocating Options an issue for you?

stephenlf commented 9 months ago

I am incorporating a Markdown parser into an Axum web server I am writing. Rather than reallocate an markdown::Options object every time the parser endpoint is called, I had hoped to save on memory by instantiating a single, immutable markdown::Options object, wrapping it in Arc, and passing it in to the router's axum::State. To do this, the markdown::Options object would have to be Send/Sync, since application state may be passed between threads.

I understand not wanting any unsafe blocks in the codebase. "100% safe Rust" is a remarkable claim. My workaround is sufficient for my use case. And it sounds like the amount of allocation going into an Options struct is trivial compared to the work being done by the parser, so there's probably no need for me to worry about it. However, if you do want to create a thread-safe, "100% safe Rust" Options implementation, I believe there are ways to do it without unsafe code blocks.

I realize now it was presumptuous of me to say that I'd "set up a pull request later." I forgot this wasn't one of my projects! Let me know if you'd like me to explore this issue more.

wooorm commented 9 months ago

Other than safe, this project is also no_std + alloc. It’s incredibly basic, Box and Vec but that’s it. Send/Sync/Arc etc is all much higher level. And indeed, the bottle neck here is parsing documents. I’d assume all energy is better spent there. I’d also guess that Rust could figure this out itself if there were no mdx functions? Not sure if a thread safe version should just omit those?

stephenlf commented 9 months ago

I’d also guess that Rust could figure this out itself if there were no mdx functions? Not sure if a thread safe version should just omit those?

That's the approach I took in my codebase. However, I suspect that annotating the MDX functions something like Sync + Send would also solve the problem without limiting the MDX functionality too much. I haven't explored it very much, but something like this might work:

// markdown-rs/src/util/mdx.rs
// ...
pub type EsmParse = dyn Fn(&str) -> Signal + Sync + Send;
// ...
pub type ExpressionParse = dyn Fn(&str, &ExpressionKind) -> Signal + Sync + Send;
// ...

This would of course mean that users couldn't dynamically create their MDX parsers, though I can't see this being a practical issue for most use cases. This code compiles and passes all checks, though I'm not sure what other downstream effects something like this might have.

One can imagine a ThreadsafeOptions struct that is the same as the Options struct with these additional constraints.

wooorm commented 9 months ago

without limiting the MDX functionality too much

I have some doubts (but unfounded). If it works with https://github.com/wooorm/mdxjs-rs I’d be open to it.


Anyway, closing this for now. The safe part of this project is very important. It also seems like there is a tiny fraction of time spent on this, we can probably spend our time better on parsing memory/performance. If all of that is addressed and it works with mdxjs-rs, I’d be open to a PR and rediscussing it!

Best! :)