servo / unicode-bidi

Implementation of the Unicode Bidirection Algorithm in Rust
Other
74 stars 34 forks source link

UTF-16 support #109

Closed jfkthame closed 8 months ago

jfkthame commented 8 months ago

This aims to provide "native" UTF-16 support, with *U16 versions of InitialInfo, BidiInfo, and Paragraph structs that are parallel to the existing types but work on a [u16] text buffer instead of a str.

Fixes https://github.com/servo/unicode-bidi/issues/108

Manishearth commented 8 months ago

This is pretty big and I'm unlikely to be able to review it soon, fwiw.

jfkthame commented 8 months ago

This is pretty big and I'm unlikely to be able to review it soon, fwiw.

Yeah - sorry, I know there's a lot of it!

To make it more approachable, I tried to post it as a series of logical steps that build on each other (and I think it builds and passes tests at all stages of the patch queue, IIRC).

Much of it is moving (rather than modifying) existing code, and then duplicating (with str -> [u16] changes) the structs that are the public API. So there's actually not much new code here; once the TextSource trait is available, the rest is pretty mechanical.

jfkthame commented 8 months ago

Huh, I wanted to completely remove pub from the TextSource trait, and locally that works fine for me, but the CI build here doesn't seem to like it. Sigh.

jfkthame commented 8 months ago

@Manishearth Thanks for all the review & comments. I think I've addressed all the nits/issues, just a couple of suggestions that I didn't do yet -- they seem fine in principle but my Rust is pretty weak still.

Apologies for the repeated force-push spam! It was all building fine for me locally, but took some repeated attempts to get the CI jobs happy. Not sure about the failing 1.36.0 job? Please let me know what I ought to do, if anything, or can we ignore it? Thanks!

mbrubeck commented 8 months ago

Not sure about the failing 1.36.0 job?

It looks like sealed 0.5.0 and its dependencies will not build on Rust 1.36.0. We would need to increase our minimum supported Rust version to 1.56.0 to use this crate.

I tried downgrading to older versions of sealed but these failed to build on 1.36.0 for a different reason. However, they might work with versions newer than 1.36.0 but older than 1.56.0

Manishearth commented 8 months ago

We ought to be able to tweak our CI lockfile

mbrubeck commented 8 months ago

We ought to be able to tweak our CI lockfile

Unfortunately, sealed 0.5.0 requires syn 2.0.0 or higher, and there is no matching version of syn that compiles with Rust 1.36. (syn 2.0 uses Rust 2021 edition, which requires rustc 1.56.0 or later.)

sealed 0.4 and earlier use syn 1.x, but the sealed code itself will not build with rustc 1.36.0 in any published version. (It is missing extern crate proc_macro; which was required for using the proc_macro crate in Rust 2018 edition crates at the time.) However, they might compile with versions newer than 1.36 but older than 1.56.

Manishearth commented 8 months ago

Alright, in that case I'm fine dropping the CI requirement!

mbrubeck commented 8 months ago

To be clear, that means increasing the MSRV to 1.56 and adding unconditional dependencies on sealed and syn.

As an alternative, what about implementing the sealed trait pattern “manually” (without a proc-macro crate)?

Manishearth commented 8 months ago

I'm comfortable increasing to 1.56, but yes if we have an alternative we should use it

I didn't even realize we depended on syn, I would love for this to be a leaf crate

mbrubeck commented 8 months ago

Currently we depend on syn only with the optional serde or flamer features enabled. By default this crate has zero dependencies.

Manishearth commented 8 months ago

ah, yeah, I'm fine with breaking msrv for those features only

we can tweak CI to not run those

jfkthame commented 8 months ago

I took @mbrubeck's suggestion (thanks!) to implement the sealed-trait pattern manually, which seems straightforward, so that removes the dependency on the sealed crate.

Unfortunately, 1.36.0 still fails, because REPLACEMENT_CHARACTER, from_u32(), and decode_utf16() are all in the std version of char, not provided by the primitive char type.

@Manishearth Should we just provide local versions of these, too? They seem like they should be simple enough...

mbrubeck commented 8 months ago

Unfortunately, 1.36.0 still fails, because REPLACEMENT_CHARACTER, from_u32(), and decode_utf16() are all in the std version of char, not provided by the primitive char type.

You can use these as core::char::REPLACEMENT_CHARACTER, core::char::from_u32, etc.

jfkthame commented 8 months ago

Unfortunately, 1.36.0 still fails, because REPLACEMENT_CHARACTER, from_u32(), and decode_utf16() are all in the std version of char, not provided by the primitive char type.

You can use these as core::char::REPLACEMENT_CHARACTER, core::char::from_u32, etc.

Huh, you're right -- the error message told me I needed std, but core also works. So it's sufficient to just say use core::char; at the beginning of the module to resolve this.

Next up is that I put a lot of the tests and expected-results into arrays, and then use for t in tests { ... } to run the same calls & assertions over a whole set of tests. That works fine for me with current Rust, but again, 1.36.0 doesn't like it, so looks like that'll need to be reworked. I'll try to have a look at it, but any hints/suggestions are welcome!

mbrubeck commented 8 months ago

There are various options, but one simple one is to change the arrays to vecs: let tests = vec![ /* ... */ ];

It looks like Rust 1.36 also chokes on one use of format! to create an &str in the test module. This can probably be replaced by a string literal with Unicode character escapes, or you can move the format! to a binding outside of the tests vector like:

        let fsi_rtl_pdi_ltr = format!("{}א{}a", chars::FSI, chars::PDI);
        let tests = vec![
            // ...
            (
                &fsi_rtl_pdi_ltr,
                // ...
jfkthame commented 8 months ago

Thanks for all the feedback & suggestions -- I don't know my way around the Rust ecosystem much yet, so it takes me a while to find things! At this point I think this should be ready to merge, AFAICS.

Manishearth commented 8 months ago

Apologies for the repeated force-push spam! It was all building fine for me locally, but took some repeated attempts to get the CI jobs happy. Not sure about the failing 1.36.0 job? Please let me know what I ought to do, if anything, or can we ignore it? Thanks!

When dealing with a PR this big it's better to just make fixup commits and clean up the commit history later (or at least explain the changes between force pushes in a comment)

Manishearth commented 8 months ago

Hmm, I went through all of the new interdiffs and could not see any of the issues being addressed. I might have missed an interdiff or a commit, I might have to go through this commit by commit again.

(This is why I was discouraging force pushes)

Manishearth commented 8 months ago

Thanks!