messense / mupdf-rs

Rust binding to mupdf
GNU Affero General Public License v3.0
104 stars 23 forks source link

PDF Filters #25

Closed marioortizmanero closed 3 years ago

marioortizmanero commented 3 years ago

This adds:

And PdfFilterOptions in crate::pdf::filter to configure them.

What do you think about the names? Do you prefer filter or filter_contents?


Closes #24

marioortizmanero commented 3 years ago

@messense can I get some help on how I should be wrapping the callbacks? The image_filter_callback function and PdfFilterOptions::[set_]image_filter in filter.rs obviously don't work and I'm not sure what's the best way to approach that.

messense commented 3 years ago

I'll take a look later.

messense commented 3 years ago

This article could be helpful: https://adventures.michaelfbryan.com/posts/rust-closures-in-ffi/

marioortizmanero commented 3 years ago

I had read the article you linked before, but that only seems to be for functions with a void* parameter. In these cases can use that to pass your state around, but the mupdf functions I'm trying to implement don't. In that case I would have to use a global, but there's got to be a better solution. Although it doesn't have to be a global; the static variable could be contained just inside set_image_filter and could be initialized with once_cell or similars.

For reference, here are some usages of the functions I'm trying to implement in C. But none of these use external variables (I think).

Page filter:

Annotations filter:

The docs don't help much either, and neither does the source code.

marioortizmanero commented 3 years ago

I've implemented an example of how it could work with a "global". Maybe we could use static mut instead of once_cell, as it's not accessed anywhere else, and I couldn't get Lazy::new() or WRAPPER.set(wrapper).unwrap() to work. But knowing that static mut and globals altogether are code smells and wildly unsafe, not sure if it's the way to go.

marioortizmanero commented 3 years ago

Ok so the new implementation is closer to what we wanted and I've made sure it works, now using the void* parameter. It's definitely unsound, though, as I'm getting sporadic segmentation faults. This may be due to multiple things, i think:

Edit: the issue seems to be that the closure address is different in the callback compared to when it's set. Maybe I'm missing a std::pin::Pin?

messense commented 3 years ago

Sorry for lack of response recently, I'll take a look at the CI failure this weekend.

messense commented 3 years ago

Edit: the issue seems to be that the closure address is different in the callback compared to when it's set. Maybe I'm missing a std::pin::Pin?

Have you tried Box the callback?

marioortizmanero commented 3 years ago

Have you tried Box the callback?

Yeah I tried to use a Pin + Box but that didn't work either. I'll try again this weekend with a different approach, maybe I did that incorrectly.

messense commented 3 years ago
running 1 test
test_filters-fe20f4266b4d7a04(89551,0x16d16f000) malloc: Incorrect checksum for freed object 0x149610fc8: probably modified after being freed.
Corrupt value: 0x0
test_filters-fe20f4266b4d7a04(89551,0x16d16f000) malloc: *** set a breakpoint in malloc_error_break to debug
error: test failed, to rerun pass '--test test_filters'

Caused by:
  process didn't exit successfully: `/Users/messense/Projects/mupdf-rs/target/debug/deps/test_filters-fe20f4266b4d7a04` (signal: 6, SIGABRT: process abort signal)

Looks like a double-free bug.

messense commented 3 years ago

Problem resolved, CI is passing now.

marioortizmanero commented 3 years ago

Oh wow great job! I completely missed that. On my machine there's a test that does fail randomly, but I don't think that's related to this PR:

failures:

---- display_list::test::test_multi_threaded_display_list_search stdout ----
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread 'display_list::test::test_multi_threaded_display_list_search' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/display_list.rs:200:10

failures:
    display_list::test::test_multi_threaded_display_list_search
marioortizmanero commented 3 years ago

The only remaining question is this:

            // TODO: `context()` inside this function should probably use the
            // parameter's value instead of the global value, right?

But it doesn't seem to be a problem because the ctx used is the same as the original one. The callback just passes that to have it easily accessible, but we already have context() for that. Should I remove that then?

messense commented 3 years ago

Oh wow great job! I completely missed that. On my machine there's a test that does fail randomly, but I don't think that's related to this PR:

failures:

---- display_list::test::test_multi_threaded_display_list_search stdout ----
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `1`', src/display_list.rs:194:21
thread 'display_list::test::test_multi_threaded_display_list_search' panicked at 'called `Result::unwrap()` on an `Err` value: Any', src/display_list.rs:200:10

failures:
    display_list::test::test_multi_threaded_display_list_search

Please open a new issue for this, thanks!

messense commented 3 years ago

The only remaining question is this:

            // TODO: `context()` inside this function should probably use the
            // parameter's value instead of the global value, right?

But it doesn't seem to be a problem because the ctx used is the same as the original one. The callback just passes that to have it easily accessible, but we already have context() for that. Should I remove that then?

I guess it's not a big deal, we can deal with it when problem does occur. 😅

marioortizmanero commented 3 years ago

Ok! Thanks for the help and for maintaining this great library :)