simoncozens / fontspector

Skrifa/Read-Fonts-based sketch for a font QA tool
Apache License 2.0
7 stars 1 forks source link

new check: arabic_spacing_symbols #2

Closed felipesanches closed 2 months ago

felipesanches commented 4 months ago

Check that Arabic spacing symbols U+FBB2–FBC1 aren't classified as marks.

Unicode has a few spacing symbols representing Arabic dots and other marks, but they are purposefully not classified as marks.

Many fonts mistakenly classify them as marks, making them unsuitable for their original purpose as stand-alone symbols to used in pedagogical contexts discussing Arabic consonantal marks.

(Ported from https://github.com/googlefonts/fontbakery/issues/4295)

felipesanches commented 4 months ago

@simoncozens, I have been working on porting this check and I am still wrapping my head around the Rust language and the way failure conditions are handled. I'm not sure I did the right thing here, so I would appreciate if you could take a look at my implementation and let me know if I did something incorrectly.

I'm specially confused about the get_gdef_glyph_class_def method. It seems to me that it should be trivial to implement and should be almost identical to get_os2_fsselection, but it is not.

felipesanches commented 4 months ago

@simoncozens, please also configure this repo so that I become able to request code-reviews from you.

simoncozens commented 4 months ago

I'll try to find a nice way to rewrite this (I'm kind of OOO today), but unwrap is almost always a bad idea. If you unwrap an Error or a None, the program will panic. We do need a better error handling strategy in fontspector and I haven't worked one out yet, but the idea should be that any code which can raise an error condition should return that as a Result, not blindly unwrap it.

felipesanches commented 4 months ago

I'll try to find a nice way to rewrite this (I'm kind of OOO today), but unwrap is almost always a bad idea. If you unwrap an Error or a None, the program will panic. We do need a better error handling strategy in fontspector and I haven't worked one out yet, but the idea should be that any code which can raise an error condition should return that as a Result, not blindly unwrap it.

I haven't addressed the error handling yet, but I have just fixed the conflicts by git-rebasing the branch and updating the code to the latest API.

felipesanches commented 4 months ago

note: this now includes the commit from PR #5 as well

felipesanches commented 3 months ago

@simoncozens, I've just pushed a few additional commits I've been working on.

I am now handling the error conditions and emitting ERROR status code.

The code feels too verbose. And maybe I am not yet writing good rust code, but this is the result of my initial attempts to understand how error handling works in rust.

One of the next steps, I think, would be to make check functions return Result as well, and then make the check runner transform any propagated errors into ERROR messages instead of abruptly closing the program with a panic.

What do you think? Please take a look at my updated implementation and let me know how can I further improve it.

felipesanches commented 3 months ago

And there's still a gid.unwrap in there, but I think that one may be fine, as it is preceded by a gid.is_some().

Also, I think that if we make checks return return Result<StatusList, Error>, then it will be easier (and less verbose) to handle the errors, as we'll be able to use the question-mark short-hand.

felipesanches commented 3 months ago

@simoncozens, I've updated the code to return CheckFnResult.

But I'm still unhappy about the code-style in my check implementation. I'd love to have a video call with you to get some guidance on better rust coding style.

simoncozens commented 3 months ago

We can have a call next week. But note that you can turn an Option into a Result with ok_or and then use ? for early return.

felipesanches commented 2 months ago

Ok, I've just updated the PR again to the latest API, now using the check macro.

This is now broken though, because I don't know how to properly handle this:


pub fn glyph_class_def(&self) -> Option<Result<ClassDef<'a>, ReadError>>```
felipesanches commented 2 months ago

well... I also tried using ok_or, but I think I'm now a bit lost trying to understand how this works, writing code that I do not fully understand, like this:

    let cmap = f.get_cmap()
        .map_err(|_| CheckError::Error("Font lacks a cmap table".to_string()))?;
    let gdef = f.get_gdef()
        .map_err(|_| CheckError::Error("Font lacks a gdef table".to_string()))?;
    let class_def = gdef.glyph_class_def().ok_or("No class_def found")
        .map_err(|e| CheckError::Error(format!("Some classDef error: {}", e)))?;

    for codepoint in ARABIC_SPACING_SYMBOLS {
        let gid = cmap.map_codepoint(codepoint);
        if gid.is_some() && class_def.get(gid.ok_or("Failed to read gid")?) == 3
        {
            problems.push(Status::fail("gdef-mark", &format!(
                "U+{:04X} is defined in GDEF as a mark (class 3).", codepoint)));
        }
    }
simoncozens commented 2 months ago

This is so close. So close. When I look at this code in VS Code with the Rust Analyzer plugin, I see this:

Screenshot 2024-08-30 at 14 18 07

(You can run cargo clippy to get the same information.)

_class_def has a computed type of ()? The let-binding has unit type? What's the return type of the Some match arm? d.map_err(...)? has return type ClassDef<_> - but d.map_err(...)?; is a statement and has no return type!

Remove the ; and I get

Screenshot 2024-08-30 at 14 32 47

and all is well with the world.

Except for one thing:

gid.ok_or("Failed to read gid") has type Result<GlyphId, String>, but our error type is CheckError so we have to ok_or(CheckError::Error("Failed to read gid".to_string())).