Thank you so much for reviewing the morx code!

I will make the recommended changes and commit it when it’s done.

From: Wesley Moore Sent: Wednesday, January 18, 2023 12:36 PM To: yeslogic/allsorts Cc: cliu2018 ; Author Subject: Re: [yeslogic/allsorts] Morx (PR #85)

@wezm commented on this pull request.

Great to see this coming in. Hopefully it opens up shaping of fonts like Apple Color Emoji, Hoefler Text, Zapfino, etc.

Some general comments:

We have in the past been a bit stingy with derives on structs (see #69) it would be good to add Copy/Clone, and PartialEq/Eq to new public structs.

Speaking of public structs, we should consider making the fields of structs that map directly to table data public too.

I think we need some more test coverage of this code to ensure that it's working as expected and remains working in the future. It might be worth trying to see if there are any freely licensed fonts out there that have a morx table that we might be able to add to the repo (in tests/fonts) and write some tests to exercise the code using that font. Probably worth checking if Harfbuzz has already identified such a font.

The unicode text rendering tests also have 40 odd morx tests and we've already integrated Allosorts into that project. It would be worth running that test suite against this branch to see how it fares.

In src/morx.rs:

@@ -0,0 +1,1825 @@

+use std::convert::TryFrom;

+use crate::binary::read::{ReadBinary, ReadBinaryDep, ReadCtxt, ReadFrom, ReadScope};

+use crate::binary::U16Be;

+use crate::binary::U32Be;

+use crate::binary::U8;

+use crate::error::ParseError;

+use crate::gsub::{FeatureMask, Features, GlyphOrigin, RawGlyph};

+use crate::tinyvec::tiny_vec;

+//----------------------------------------------------------------------------------

+#[derive(Debug)]

+pub struct MorxHeader {

version: u16,
unused: u16,

Since this is unused I don't think we need to store it.

In src/morx.rs:

+#[derive(Debug)]

+pub struct Chain {

chain_header: ChainHeader,
feature_array: Vec,
subtables: Vec,

+impl<'a> ReadBinary<'a> for Chain {

type HostType = Self;
fn read(ctxt: &mut ReadCtxt<'a>) -> Result<Self, ParseError> {
let chain_header = ctxt.read::()?;
let features =
ctxt.read_array::(usize::try_from(chain_header.n_feature_entries)?)?;
let feature_vec = features.to_vec();

No need to allocate a vector here, I think the features can be stored as ReadArray on Chain.

In src/morx.rs:

+

+//----------------------------------------------------------------------------------

+#[derive(Debug)]

+pub struct Subtable {

subtable_header: SubtableHeader,
subtable_body: SubtableType,

+impl<'a> ReadBinary<'a> for Subtable {

type HostType = Self;
fn read(ctxt: &mut ReadCtxt<'a>) -> Result<Self, ParseError> {
let subtable_header = ctxt.read::()?;
let subtable_body: SubtableType;
let subtable_body_length: usize = usize::try_from(subtable_header.length - 12)?;

No need to annotate the type here and it might be worth including a comment to explain that 12 is subtracted to skip the size of the header fields.

In src/morx.rs:

let ligature_subtable = subtable_scope.read::()?;

subtable_body = SubtableType::Ligature { ligature_subtable };
}
4 => {
//Get a shorter scope from the context to read the subtable
let subtable_scope = ctxt.read_scope(subtable_body_length)?;
let noncontextual_subtable = subtable_scope.read::()?;
subtable_body = SubtableType::NonContextual {
noncontextual_subtable,
};
}
_ => {
//read the subtable to a Vec if it is another type other than ligature or noncontextual
//let subtable_array = ctxt.read_array::(subtable_body_length)?;
let subtable_array = ctxt.read_array::(subtable_body_length)?;
let subtable_vec = subtable_array.to_vec();

I don't think a vector needs to be allocated here. A slice could be used instead:

let subtable_scope = ctxt.read_scope(subtable_body_length)?;

let data = subtable_scope.data();If this was done all branches would create the subtable_scope so that could be moved out of the match as well.

In src/morx.rs:

+/***

+//--------------------------------------------------------------------------------------------------

+#[derive(Debug)]

+pub enum SubtableType {

Rearragement {
rearrangement_subtable: RearrangementSubtable,
},
Contextual {
contextual_subtable: ContextualSubstitutionSubtable,
},
Ligature {
ligature_subtable: LigatureSubtable,
},
NonContextual {
noncontextual_subtable: NonContextualSubstitutionSubtable,
},
Insertion {
insertion_subtable: InsertionSubtable,
},

+****/

Suggest deleting commented out code.

In src/morx.rs:

+

+//------------------------------------ Subtables ------------------------------------------------

+//Indic Rearrangement Subtable

+#[derive(Debug)]

+pub struct RearrangementSubtable {

stx_header: STXheader,
//subtable body here,
//the rearrangement subtable is just a state table

+//Contextual Glyph Substitution Subtable

+#[derive(Debug)]

+pub struct ContextualSubtable {

stx_header: STXheader,
substitution_subtables_offset: u32,

I don't think needs to be stored as it's only needed when parsing.

In src/morx.rs:

2 => {

bin_srch_header = Some(ctxt.read::()?);
}
4 => {
bin_srch_header = Some(ctxt.read::()?);
}
6 => {
bin_srch_header = Some(ctxt.read::()?);
}
8 => {
bin_srch_header = None;
}
10 => {
bin_srch_header = None;
}
_ => return Err(ParseError::BadVersion),

I think this would probably be BadValue.

In src/morx.rs:

let bin_srch_header: Option;

match format {
0 => {
bin_srch_header = None;
}
2 => {
bin_srch_header = Some(ctxt.read::()?);
}
4 => {
bin_srch_header = Some(ctxt.read::()?);
}
6 => {
bin_srch_header = Some(ctxt.read::()?);
}
8 => {
bin_srch_header = None;
}
10 => {
bin_srch_header = None;
}
_ => return Err(ParseError::BadVersion),
}

This could be simplified:

⬇️ Suggested change

let bin_srch_header: Option;
match format {
0 => {
bin_srch_header = None;
}
2 => {
bin_srch_header = Some(ctxt.read::()?);
}
4 => {
bin_srch_header = Some(ctxt.read::()?);
}
6 => {
bin_srch_header = Some(ctxt.read::()?);
}
8 => {
bin_srch_header = None;
}
10 => {
bin_srch_header = None;
}
_ => return Err(ParseError::BadVersion),
}
let bin_srch_header = match format {
2 | 4 | 6 => Some(ctxt.read::()?,
0 | 8 | 10 => None,
_ => return Err(ParseError::BadVersion),
};

In src/morx.rs:

loop {

let ligature = match ctxt.read_u16be() {
Ok(val) => val,
Err(_err) => break,
};
ligature_list.push(ligature);
}

This pattern of a loop that reads as many values as it can is repeated in a number of places (I'll not leave comments on the other occurrences). I think that it might be able to be made a bit more robust (by not swallowing all errors) and avoid the need to allocate a vector using code like this.

let len_remaining = ctxt.scope().data().len();

ctxt.check(len_remaining % size::U16 == 0)?;

let ligature_list = ctxt.read_array::(len_remaining / size::U16)?;Since it's used in at least 3 or 4 places it could be extracted to a function too, perhaps even added to ReadCtxt.

In src/morx.rs:

let mut state_array: Vec<Vec> = Vec::new();

'outer: loop {
let mut state_row: Vec = Vec::new();
for _i in 0..n_classes.0 {
let value = match ctxt.read_u16be() {
Ok(val) => val,
Err(_err) => break 'outer,
};
state_row.push(value);
}
state_array.push(state_row);
}

ReadArray might be able to simplify this:

⬇️ Suggested change

let mut state_array: Vec<Vec> = Vec::new();
'outer: loop {
let mut state_row: Vec = Vec::new();
for _i in 0..n_classes.0 {
let value = match ctxt.read_u16be() {
Ok(val) => val,
Err(_err) => break 'outer,
};
state_row.push(value);
}
state_array.push(state_row);
}
let mut state_array: Vec<ReadArray> = Vec::new();
loop {
let mut state_row = match ctxt.read_array(n_classes.0) {
Ok(array) => array,
Err(ParseError::Eof) => break,
Err(err) => return Err(err),
}
state_array.push(state_row);
}

In src/morx.rs:

Format2 {

lookup_segments: Vec, //Segment Single format 2
},
Format4 {
lookup_segments: Vec, //Segment Array format 4
},
Format6 {
lookup_entries: Vec, //Single Table format 6
},
Format8 {
first_glyph: u16,
glyph_count: u16,
lookup_values: Vec, //Trimmed Array format 8
},
Format10 {
unit_size: u16,

I don't think this field needs to be stored since its information is captured by lookup_values.

In src/morx.rs:

}

2 => {
//Get a shorter scope from the context to read the subtable
let subtable_scope = ctxt.read_scope(subtable_body_length)?;
let ligature_subtable = subtable_scope.read::()?;
subtable_body = SubtableType::Ligature { ligature_subtable };
}
4 => {
//Get a shorter scope from the context to read the subtable
let subtable_scope = ctxt.read_scope(subtable_body_length)?;
let noncontextual_subtable = subtable_scope.read::()?;
subtable_body = SubtableType::NonContextual {
noncontextual_subtable,
};
}
_ => {

It looks like the spec only defines subtable types 0–5. Perhaps it would be a good idea to create SubtableType::Other for 0 and 5, and return an error for other values.

In src/morx.rs:

OneByte { lookup_values: Vec },

TwoByte { lookup_values: Vec },
FourByte { lookup_values: Vec },
EightByte { lookup_values: Vec },

I think these could be ReadArray as I think the length is known in each case.

In src/morx.rs:

let mut lookup_segments = Vec::new();

for _i in 0..b_sch_header.n_units {
let lookup_segment = match ctxt.read::() {
Ok(val) => val,
Err(_err) => break,
};
lookup_segments.push(lookup_segment);
}
let lookup_table = LookupTable::Format2 { lookup_segments };

I think this could use ReadArray

In src/morx.rs:

let mut lookup_values: Vec = Vec::new();

for _i in 0..(segment.last_glyph - segment.first_glyph + 1) {
let lookup_value = match read_ctxt.read_u16be() {
Ok(val) => val,
Err(_err) => break,
};
lookup_values.push(lookup_value);
}

I think this can use ReadArray.

In src/morx.rs:

let mut lookup_entries = Vec::new();

for _i in 0..b_sch_header.n_units {
let lookup_entry = match ctxt.read::() {
Ok(val) => val,
Err(_err) => break,
};
lookup_entries.push(lookup_entry);
}

I think this could use ReadArray

In src/morx.rs:

let mut lookup_values = Vec::new();

for _i in 0..glyph_count {
let lookup_value = ctxt.read_u16be()?;
lookup_values.push(lookup_value);
}

I think ReadArray could be used here.

In src/morx.rs:

if (glyph as usize) < lookup_values.len() {

return Some(lookup_values[glyph as usize]);
} else {
return None; //out of bounds
}

This pattern (which it repeated a few times) could be simplified using get:

⬇️ Suggested change

if (glyph as usize) < lookup_values.len() {
return Some(lookup_values[glyph as usize]);
} else {
return None; //out of bounds
}
return lookup_values.get(usize::from(glyph))

In src/morx.rs:

for lookup_segment in lookup_segments {

if (glyph >= lookup_segment.first_glyph)
&& (glyph <= lookup_segment.last_glyph)
{
if ((glyph - lookup_segment.first_glyph) as usize)
< lookup_segment.lookup_values.len()
{
return Some(
lookup_segment.lookup_values
[usize::from(glyph - lookup_segment.first_glyph)],
);
}
}
}
return None; //out of bounds

This code is quite complex. Perhaps something like this could make it simpler. I think a similar refactoring could be applied to some of the other lookups as well.

return lookup_segments.iter().find_map(|lookup_segment| {

lookup_segment.contains(glyph)

    .then(|| lookup_segment.get(usize::from(glyph - lookup_segment.first_glyph)))

})

// where lookup_segment.contains would be:

impl LookupSegmentFmt4 {

fn contains(glyph: u16) -> bool {

    (self.first_glyph..=self.last_glyph).contains(glyph)

}

}

In src/morx.rs:

return Some(lookup_segment.lookup_value);

}
}
return None; //out of bounds
}
_ => return None, //Only Format2 is valid here.
}
}
4 => {
match &lookup_table.lookup_table {
LookupTable::Format4 { lookup_segments } => {
for lookup_segment in lookup_segments {
if (glyph >= lookup_segment.first_glyph)
&& (glyph <= lookup_segment.last_glyph)
{
if ((glyph - lookup_segment.first_glyph) as usize)

In cases where we add or subtract values supplied by the font (such as lookup_segment.first_glyph) we probably should use checked arithmetic to avoid under/overflow. Example from elsewhere in the code:

https://github.com/yeslogic/allsorts/blob/1d05ffa243d857770b45d2b0244bb2d747520bd4/src/woff2.rs#L537-L539

In src/morx.rs:

const SET_MARK: u16 = 0x8000;

const DONT_ADVANCE: u16 = 0x4000;
let mut old_glyph: u16;
let mut new_glyph: u16;
//loop through glyphs:
for i in 0..self.glyphs.len() {
let current_glyph: u16 = self.glyphs[i].glyph_index;
old_glyph = self.glyphs[i].glyph_index;
new_glyph = self.glyphs[i].glyph_index;
let mut class = glyph_class(current_glyph, &contextual_subtable.class_table);
'glyph: loop {
let index_to_entry_table = contextual_subtable.state_array.state_array
[usize::from(self.next_state)][usize::from(class)];

In places where we index into arrays with values provided by the font I think get should be used in order to avoid panics if the value happens to be out of bounds.

In src/morx.rs:

'glyphs: loop {

let glyph: RawGlyph<()>;
if i < self.glyphs.len() {
glyph = self.glyphs[i].clone(); //length of glyphs might have changed due to substitution. So need to check.
}
//assume the length of the glyphs array is shorter or unchanged.
else {
break 'glyphs;
}

This could probably be a while loop:

while let Some(glyph) = self.glyphs.get(i) {

let glyph = glyph.clone();

⋮

}

In src/morx.rs:

for chain in morx_table.morx_chains.iter() {

let subfeatureflags: u32 = subfeatureflags(chain, features)?;
for subtable in chain.subtables.iter() {
if subfeatureflags & subtable.subtable_header.sub_feature_flags != 0 {
match subtable.subtable_header.coverage & 0xFF {
1 => {
let mut contextualsubst: ContextualSubstitution<'> =
ContextualSubstitution::new(glyphs);
if let SubtableType::Contextual {
contextual_subtable,
} = &subtable.subtable_body
{
contextual_subst.next_state = 0;
contextual_subst.process_glyphs(contextual_subtable)?;
}

Should there be else branches on these to return an error if the sub-table type is not the expected type?

In src/morx.rs:

liga_subst.next_state = 0;

liga_subst.component_stack.clear();

I think these lines duplicate what LigatureSubstitution::new does.

In src/morx.rs:

const LOWERCASE_TYPE: u16 = 37;

//Feature selectors:
const LOWERCASE_SMALL_CAPS: u16 = 1;
//---------------------------------------
//Feature type:
const UPPERCASE_TYPE: u16 = 38;
//Feature selectors:
const UPPERCASE_SMALL_CAPS: u16 = 1;
//---------------------------------------
let mut subfeature_flags = chain.chain_header.default_flags;
for entry in chain.feature_array.iter() {
match features {
Features::Custom(_features_list) => {
return Ok(subfeature_flags);

Should this return the supplied custom features instead of the default features?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

Thanks for reviewing the morx code and making the suggestions. I will make the changes and re-commit it when it’s ready.

From: Wesley Moore Sent: Thursday, November 09, 2023 4:52 PM To: yeslogic/allsorts Cc: cliu2018 ; Author Subject: Re: [yeslogic/allsorts] Morx (PR #85)

@wezm commented on this pull request.

Overall this is shaping up well. I've left a lot of comments—sorry. Many relate to style, following Rust idioms, or consistency with the rest of the code base. All comments prefixed with "Minor" or "Optional" are changes that I don't think should block merging this work, the others I think should be addressed before merging.

There are a couple general comments that apply in several places:

a.. I think indexing, particularly when processing glyphs needs to reviewed to ensure it will not panic with user controlled data. b.. I think there are various places that probably should be using checked arithmetic. I noted some of them but not all. c.. There are some areas that are long and deeply nested which makes it a little difficult to read. I think extracting functions/adding methods to types could help but it's not essential. d.. It would be good to add at least one small (file size wise), permissively licensed font in the code base (E.g. tests/fonts/morx) and add some basic tests that use it to help ensure the code works on a real font (and stays working).

In src/font.rs:

+use crate::morx; +use crate::morx::MorxTable; Minor:

⬇️ Suggested change -use crate::morx; -use crate::morx::MorxTable; +use crate::morx::{self, MorxTable};

In src/font.rs:

//let res = morx::apply(&morx_cache, &mut glyphs, features);

//check_set_err(res, &mut err); Commented out code should be removed.

In src/font.rs:

@@ -726,6 +750,22 @@ impl Font { }) }

//pub fn morx_table(&mut self) -> Result<Option<Rc>, ParseError> { Commented out code should be removed.

In src/font.rs:

     let opt_gdef_table = opt_gdef_table.as_ref().map(Rc::as_ref);
let (dotted_circleindex, ) = self.lookup_glyph_index(DOTTED_CIRCLE, MatchingPresentation::NotRequired, None);

//apply morx if table is present Minor:

⬇️ Suggested change

//apply morx if table is present
// Apply morx if table is present

In src/morx.rs:

+pub struct MorxHeader {

_version: u16,

n_chains: u32, +}

+impl ReadFrom for MorxHeader {

type ReadType = (U16Be, U16Be, U32Be);

fn from((_version, _unused, n_chains): (u16, u16, u32)) -> Self {

MorxHeader { _version, n_chains }

} +}

+#[derive(Debug)] +pub struct MorxTable<'a> {

_morx_header: MorxHeader, Since this field is unused I think it can be dropped from the struct.

In src/morx.rs:

//get a scope of length "chain_length" to read the chain and

//advance to the correct position in the buffer for reading

//the next chain, regardless whether the "Subtable Glyph Coverage table"

//is present at the end of the chain. This is a general style comment that applies to the whole PR. The style for comments in the rest of the code is // followed by a space. It would be good to have all the comments in this PR follow that style too.

In src/morx.rs:

//advance to the correct position in the buffer for reading

//the next chain, regardless whether the "Subtable Glyph Coverage table"

//is present at the end of the chain.

let chain_scope = ctxt.read_scope(chain_length)?;

let chain = chain_scope.read::<Chain<'a>>()?;

chain_vec.push(chain);

}

Ok(MorxTable {

_morx_header: morx_header,

morx_chains: chain_vec,

})

} +}

+//---------------------------------------------------------------------------------- Minor: We haven't used separators like this in the rest of the code. For consistency I suggest removing them.

In src/morx.rs:

+#[derive(Debug)] +pub struct ChainHeader {

default_flags: u32,

chain_length: u32,

n_feature_entries: u32,

n_subtables: u32, +} Minor: As a general rule all the types are placed at the top of the file with the impls following.

In src/morx.rs:

fn from((_version, _unused, n_chains): (u16, u16, u32)) -> Self {

MorxHeader { _version, n_chains }

} +}

+#[derive(Debug)] +pub struct MorxTable<'a> {

_morx_header: MorxHeader,

morx_chains: Vec<Chain<'a>>, +}

+impl<'b> ReadBinary for MorxTable<'b> {

type HostType<'a> = MorxTable<'a>;

fn read<'a>(ctxt: &mut ReadCtxt<'a>) -> Result<Self::HostType<'a>, ParseError> {

let morx_header = ctxt.read::()?; It looks like this code only supports version 3. Suggest checking this. E.g.

ctxt.check_version(version == 3)

In src/morx.rs:

MorxHeader { _version, n_chains }

} +}

+#[derive(Debug)] +pub struct MorxTable<'a> {

_morx_header: MorxHeader,

morx_chains: Vec<Chain<'a>>, +}

+impl<'b> ReadBinary for MorxTable<'b> {

type HostType<'a> = MorxTable<'a>;

fn read<'a>(ctxt: &mut ReadCtxt<'a>) -> Result<Self::HostType<'a>, ParseError> {

let morx_header = ctxt.read::()?;

let mut chain_vec = Vec::new(); Suggest pre-allocating this Vec:

⬇️ Suggested change

let mut chain_vec = Vec::new();
let mut chain_vec = Vec::with_capacity(morx_header.n_chains);

In src/morx.rs:

_morx_header: MorxHeader,

morx_chains: Vec<Chain<'a>>, +}

+impl<'b> ReadBinary for MorxTable<'b> {

type HostType<'a> = MorxTable<'a>;

fn read<'a>(ctxt: &mut ReadCtxt<'a>) -> Result<Self::HostType<'a>, ParseError> {

let morx_header = ctxt.read::()?;

let mut chain_vec = Vec::new();

for _i in 0..morx_header.n_chains {

//read the chain header to get the chain length

let scope_hdr = ctxt.scope();

let chain_header = scope_hdr.read::()?;

let chain_length: usize = usize::try_from(chain_header.chain_length)?; ⬇️ Suggested change

let chain_length: usize = usize::try_from(chain_header.chain_length)?;

let chain_length = usize::try_from(chain_header.chain_length)?;

In src/morx.rs:

+pub struct Chain<'a> {

chain_header: ChainHeader,

feature_array: ReadArray<'a, Feature>,

subtables: Vec<Subtable<'a>>, +}

+impl<'b> ReadBinary for Chain<'b> {

type HostType<'a> = Chain<'a>;

fn read<'a>(ctxt: &mut ReadCtxt<'a>) -> Result<Self::HostType<'a>, ParseError> {

let chain_header = ctxt.read::()?;

let features =

ctxt.read_array::(usize::try_from(chain_header.n_feature_entries)?)?;

let mut subtable_vec = Vec::new(); Suggest pre-allocating this Vec:

⬇️ Suggested change

let mut subtable_vec = Vec::new();
let mut subtable_vec = Vec::with_capacity(chain_header.n_subtables);

In src/morx.rs:

+pub enum SubtableType<'a> {

Contextual {

contextual_subtable: ContextualSubtable<'a>,

},

Ligature {

ligature_subtable: LigatureSubtable<'a>,

},

NonContextual {

noncontextual_subtable: NonContextualSubtable<'a>,

},

Other {

other_subtable: &'a [u8],

}, +} Optional: Perhaps this would be a little nicer using tuple variants?

⬇️ Suggested change -pub enum SubtableType<'a> {

Contextual {
contextual_subtable: ContextualSubtable<'a>,
},
Ligature {
ligature_subtable: LigatureSubtable<'a>,
},
NonContextual {
noncontextual_subtable: NonContextualSubtable<'a>,
},
Other {
other_subtable: &'a [u8],
}, -} +pub enum SubtableType<'a> {
Contextual(ContextualSubtable<'a>),
Ligature(LigatureSubtable<'a>),
NonContextual(NonContextualSubtable<'a>,
Other(&'a [u8]) +}

In src/morx.rs:

},

Ligature {

ligature_subtable: LigatureSubtable<'a>,

},

NonContextual {

noncontextual_subtable: NonContextualSubtable<'a>,

},

Other {

other_subtable: &'a [u8],

}, +}

+//----------------------------- Extended State Table Header ------------------------------------------

+#[derive(Debug)] +pub struct STXheader { ⬇️ Suggested change -pub struct STXheader { +pub struct STXHeader {

In src/morx.rs:

) -> Self {

STXheader {

n_classes,

class_table_offset,

state_array_offset,

entry_table_offset,

}

} +}

+//------------------------------------ Subtables ------------------------------------------------

+//Contextual Glyph Substitution Subtable +#[derive(Debug)] +pub struct ContextualSubtable<'a> {

_stx_header: STXheader, If this isn't used after parsing suggest dropping it form the struct. (this comment applies to the other structs too)

In src/morx.rs:

let mut offsets_to_subst_tables: Vec = Vec::new();

for _i in 0..offset_array_len {

let value = match subst_tables_ctxt.read_u32be() {

Ok(val) => val,

Err(_err) => break,

};

offsets_to_subst_tables.push(value);

}

let mut substitution_subtables: Vec<ClassLookupTable<'a>> = Vec::new();

for &offset in offsets_to_subst_tables.iter() {

let subst_subtable = match subtable

.offset(usize::try_from(substitution_subtables_offset)?)

.offset(usize::try_from(offset)?)

.read::<ClassLookupTable<'a>>()

{

Ok(val) => val,

Err(_err) => break,

};

substitution_subtables.push(subst_subtable);

} I don't really like that errors are being swallowed in this code (and other places down further). Perhaps the error matching could be more specific:

match x { Ok(val) => val, Err(ParseError::BadEof) => break, Err(err) => return Err(err) }Additionally I think we might be able to avoid allocating a vector for the offsets:

    let mut substitution_subtables: Vec<ClassLookupTable<'a>> = Vec::new();
    loop {
        let offset = match subst_tables_ctxt.read_u32be() {
            Ok(offset) => offset,
            Err(ParseError::BadEof) => break,
            Err(err) => return Err(err),
        };
        let subst_subtable = match subtable
            .offset(usize::try_from(substitution_subtables_offset)?)
            .offset(usize::try_from(offset)?)
            .read::<ClassLookupTable<'a>>()?;
        substitution_subtables.push(subst_subtable);
    }

In src/morx.rs:

Some(ref b_sch_header) => {

if b_sch_header.unit_size != 6 {

return Err(ParseError::BadValue);

}

let lookup_segments =

ctxt.read_array::(usize::from(b_sch_header.n_units))?;

let lookup_table = LookupTable::Format2 { lookup_segments };

return Ok(ClassLookupTable {

lookup_header,

lookup_table,

});

}

None => return Err(ParseError::BadValue), Optional: This pattern used below as well can be simplified:

⬇️ Suggested change

Some(ref b_sch_header) => {
if b_sch_header.unit_size != 6 {
return Err(ParseError::BadValue);
}
let lookup_segments =
ctxt.read_array::(usize::from(b_sch_header.n_units))?;
let lookup_table = LookupTable::Format2 { lookup_segments };
return Ok(ClassLookupTable {
lookup_header,
lookup_table,
});
}
None => return Err(ParseError::BadValue),
Some(ref b_sch_header) if b_sch_header.unit_size == 6 => {
let lookup_segments =
ctxt.read_array::(usize::from(b_sch_header.n_units))?;
let lookup_table = LookupTable::Format2 { lookup_segments };
return Ok(ClassLookupTable {
lookup_header,
lookup_table,
});
}
_ => return Err(ParseError::BadValue),

In src/morx.rs:

let segment = match ctxt.read::() {

Ok(val) => val,

Err(_err) => break,

};

let offset_to_lookup_values = segment.offset;

if (segment.first_glyph == 0xFFFF) && (segment.last_glyph == 0xFFFF) {

break;

}

let mut read_ctxt = class_table

.offset(usize::from(offset_to_lookup_values))

.ctxt();

let lookup_values = read_ctxt.read_array::(usize::from(

segment.last_glyph - segment.first_glyph + 1, Probably should use checked arithmetic here since these are values originating from the font data.

In src/morx.rs:

2 => match lookup_header.bin_srch_header {

Some(ref b_sch_header) => { Optional: I think a level of nesting can be avoided, which might make the code a bit easier to read by matching on the format and the expected binary search header in one pattern. E.g.

match (lookup_header.format, lookup_header.bin_srch_header.as_ref()) { (0, None) => { … } (1, Some(b_sch_header)) => { … } ⋮ }

In src/morx.rs:

};

return Ok(ClassLookupTable {

lookup_header,

lookup_table,

});

}

_ => return Err(ParseError::BadValue),

},

10 => match lookup_header.bin_srch_header {

None => {

let unit_size = ctxt.read_u16be()?;

let first_glyph = ctxt.read_u16be()?;

let glyph_count = ctxt.read_u16be()?;

match unit_size { Optional: Might be worth extracting this match out to its own function of ReadBinary impl since the outer match is already quite long.

In src/morx.rs:

loop {

let entry = match ctxt.read::() {

Ok(val) => val,

Err(_err) => break,

};

entry_vec.push(entry);

} Optional: This pattern is repeated a number of times I wonder if it would be worth adding a method on ReadCtxt, read_until_end() so it can be reused in each place.

In src/morx.rs:

match lookup_table.lookup_header.format {

0 => {

match &lookup_table.lookup_table { I would expect that lookup_table.lookup_header.format and the lookup table format to match because the ReadBinary impl on ClassLookupTable uses the format from the header to drive which lookup table it tries parse. As a result I think the match on lookup_table.lookup_header.format can be removed and replaced with just a match on lookup_table.lookup_table.

In src/morx.rs:

match lookup_segment

.lookup_values

.read_item(usize::from(glyph - lookup_segment.first_glyph))

{

Ok(val) => return Some(val as u16),

Err(_err) => return None,

} This can be simplified. Also val is already u16 so the cast can be removed. There are other place below where mapping Err to None can also be simplified using ok().

⬇️ Suggested change

match lookup_segment
.lookup_values
.read_item(usize::from(glyph - lookup_segment.first_glyph))
{
Ok(val) => return Some(val as u16),
Err(_err) => return None,
}
lookup_segment
.lookup_values
.read_item(usize::from(glyph - lookup_segment.first_glyph)).ok()

In src/morx.rs:

for lookup_entry in lookup_entries {

if glyph == lookup_entry.glyph {

return Some(lookup_entry.lookup_value);

}

}

return None; //out of bounds Optional: This can be simplified with an iterator:

⬇️ Suggested change

for lookup_entry in lookup_entries {
if glyph == lookup_entry.glyph {
return Some(lookup_entry.lookup_value);
}
}
return None; //out of bounds
lookup_entries
.iter()
.find_map(|entry| (entry.glyph == glyph).then(entry.lookup_value))

In src/morx.rs:

} +}

+//------------------------------ Lookup function -------------------------------------------------------- +pub fn lookup<'a>(glyph: u16, lookup_table: &ClassLookupTable<'a>) -> Option {

if glyph == 0xFFFF {

return Some(0xFFFF);

}

match lookup_table.lookup_header.format {

0 => {

match &lookup_table.lookup_table {

LookupTable::Format0 { lookup_values } => {

return lookup_values.get(usize::from(glyph)).copied();

}

_ => return None, //Only Format0 is valid here. Optional: I think all the places that aren't an early return can remove the use of the return keyword.

In src/morx.rs:

}

UnitSize::TwoByte {

lookup_values: two_byte_values,

} => {

if (glyph >= *first_glyph)

&& (glyph <= (first_glyph + glyph_count - 1))

{

match two_byte_values.read_item(usize::from(glyph - *first_glyph)) {

Ok(val) => return Some(val as u16),

Err(_err) => return None,

}

} else {

return None; //out of bounds

}

}

_ => return None, //Note: ignore 4-byte and 8-byte lookup values for now Perhaps it should be documented in the comment why these aren't handled at the moment.

In src/morx.rs:

match lookup(glyph, class_table) {

None => {

return 1;

}

Some(val) => {

if val == 0xFFFF {

return 2;

} else {

return val;

}

}

} ⬇️ Suggested change

match lookup(glyph, class_table) {

None => {

return 1;

}

Some(val) => {

if val == 0xFFFF {

return 2;

} else {

return val;

}

}

}

match lookup(glyph, class_table) {

Some(0xFFFF) => 2,

Some(val) => val,

None => 1,

}

In src/morx.rs:

if let Some(contxt_entry) = contextual_subtable

.entry_table

.contextual_entries

.get(usize::from(index_to_entry_table))

{

entry = contxt_entry; Typo:

⬇️ Suggested change

if let Some(contxt_entry) = contextual_subtable
.entry_table
.contextual_entries
.get(usize::from(index_to_entry_table))
{
entry = contxt_entry;
if let Some(context_entry) = contextual_subtable
.entry_table
.contextual_entries
.get(usize::from(index_to_entry_table))
{
entry = context_entry;

In src/morx.rs:

[usize::from(entry.mark_index)],

) {

self.glyphs[mark_pos].glyph_index = mark_glyph_subst;

self.glyphs[mark_pos].glyph_origin = GlyphOrigin::Direct; We're directly indexing into things here (using [ ]), which makes me nervous as some of the values appear to come from the font data and this operation will panic if the value is out of bounds. I'd suggest using get when the value cannot be guaranteed to be in bounds.

In src/morx.rs:

mark: None,

}

}

pub fn process_glyphs<'b>(

&mut self,

contextual_subtable: &ContextualSubtable<'b>,

) -> Result<(), ParseError> {

const SET_MARK: u16 = 0x8000;

const DONT_ADVANCE: u16 = 0x4000;

let mut old_glyph: u16;

let mut new_glyph: u16;

//loop through glyphs:

for i in 0..self.glyphs.len() {

let current_glyph: u16 = self.glyphs[i].glyph_index; self.glyphs[i].glyph_index: There are a few places below that are repeating this expression where I think they could use current_glyph instead.

In src/morx.rs:

ligature.variation = val.variation;

}

None => return Err(ParseError::MissingValue),

};

let action: u32 = ligature_subtable.action_table.actions[action_index];

action_index += 1;

let mut offset = action & 0x3FFFFFFF; //take 30 bits

if offset & 0x20000000 != 0 {

offset |= 0xC0000000; //sign-extend it to 32 bits

}

let offset = offset as i32; //convert to signed integer

let index_to_components = glyph_popped as i32 + offset; Suggest avoiding cast with i32::from

In src/morx.rs:

if index_to_components < 0 {

return Err(ParseError::BadValue);

}

let index_to_component_table: usize =

match usize::try_from(index_to_components) {

Ok(index) => index,

Err(_err) => return Err(ParseError::BadValue),

}; You can skip the < 0 check as usize::try_from will return an Err if the value is negative.

In src/morx.rs:

//let ligature_glyph = ligature_subtable.ligature_list.ligature_list

//[usize::from(index_to_ligature)]; Remove commented out code.

In src/morx.rs:

let mut glyph: u16;

let mut subst: u16;

for i in 0..glyphs.len() {

glyph = glyphs[i].glyph_index;

subst = noncontextual_lookup(glyph, &noncontextual_subtable.lookup_table);

if (subst != 0xFFFF) && (subst != glyph) {

glyphs[i].glyph_index = subst;

glyphs[i].glyph_origin = GlyphOrigin::Direct;

}

}

Ok(()) This can be made more idiomatic with an iterator:

⬇️ Suggested change

let mut glyph: u16;
let mut subst: u16;
for i in 0..glyphs.len() {
glyph = glyphs[i].glyph_index;
subst = noncontextual_lookup(glyph, &noncontextual_subtable.lookup_table);
if (subst != 0xFFFF) && (subst != glyph) {
glyphs[i].glyph_index = subst;
glyphs[i].glyph_origin = GlyphOrigin::Direct;
}
}
Ok(())
for glyph in glyphs {
let subst = noncontextual_lookup(glyph.glyph_index, &noncontextual_subtable.lookup_table);
if (subst != 0xFFFF) && (subst != glyph.glyph_index) {
glyph.glyph_index = subst;
glyph.glyph_origin = GlyphOrigin::Direct;
}
}
Ok(())

In src/morx.rs:

small_caps: false,

multi_subst_dup: false,

is_vert_alt: false,

fake_bold: false,

fake_italic: false,

extra_data: (),

variation: None,

};

let mut glyphs: Vec<RawGlyph<()>> = vec![glyph1, glyph2, glyph3, glyph4, glyph5];

let features = Features::Custom(Vec::new());

let _res = apply(&morx_table, &mut glyphs, &features);

//println!("The glyphs array after morx substitutions: {:?}", glyphs); Remove commented out code

In src/morx.rs:

for chain in morx_table.morx_chains.iter() {

for subtable in chain.subtables.iter() {

if subtable.subtable_header.coverage & 0xFF == 2 {

//liga_subtable_no += 1;

//println!("Ligature subtable No: {}", liga_subtable_no);

if let SubtableType::Ligature { ligature_subtable } = &subtable.subtable_body {

liga_subst.next_state = 0;

liga_subst.component_stack.clear();

liga_subst.process_glyphs(ligature_subtable)?;

}

}

}

}

// println!("The glyphs array after ligature substitutions: {:?}", glyphs); I think this test should have assertions that the glyphs array is in the expected state after applying the substitutions.

In src/morx.rs:

}

_ => false,

};

if apply {

subfeature_flags =

(subfeature_flags & entry.disable_flags) | entry.enable_flags;

}

}

}

}

Ok(subfeature_flags) +}

+//------------------------------------ Ligature Substitution Test ---------------------------------------------------------- +pub fn morx_ligature_test<'a>(scope: ReadScope<'a>) -> Result<(), ParseError> { Tests should have test attribute so they are run by cargo test,

⬇️ Suggested change -pub fn morx_ligature_test<'a>(scope: ReadScope<'a>) -> Result<(), ParseError> { +#[test] +pub fn morx_ligature_test<'a>(scope: ReadScope<'a>) -> Result<(), ParseError> {

In src/morx.rs:

!feature_mask.contains(FeatureMask::CLIG)

}

_ => false,

};

if apply {

subfeature_flags =

(subfeature_flags & entry.disable_flags) | entry.enable_flags;

}

}

}

}

Ok(subfeature_flags) +}

+//------------------------------------ Ligature Substitution Test ---------------------------------------------------------- Tests should be in a module guarded by the test attribute so they are only built for the test harness:

[cfg(test)]

mod tests {

// test functions in here

}

In src/morx.rs:

//string: "ptgffigpfl" (for Zapfino.ttf)

//let mut glyphs: Vec = vec![585, 604, 541, 536, 536, 552, 541, 585, 536, 565];

//string: "ptpfgffigpfl" (for Zapfino.ttf)

//let mut glyphs: Vec = vec![585, 604, 585, 536, 541, 536, 536, 552, 541, 585, 536, 565];

//string: "Zapfino" (for Zapfino.ttf)

//let mut glyphs: Vec = vec![104, 504, 585, 536, 552, 573, 580];

//string: "ptgffigpfl" (for Ayuthaya.ttf)

//let mut glyphs: Vec = vec![197, 201, 188, 187, 187, 190, 188, 197, 187, 193];

//string: ""\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F467}"" (for emoji.ttf)

//let mut glyphs: Vec = vec![1062, 43, 1164, 43, 1056, 43, 1056];

//string: "U+1F1E6 U+1F1FA" (for emoji.ttf)

//let mut glyphs: Vec = vec![16, 36]; Remove commented out code.

In src/morx.rs:

//print glyph array after applying substitutions.

for glyph in glyphs.iter() {

println!(" {:?}", glyph);

} I think this should also be replaced by assertions that the glyphs are as expected.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

yeslogic / allsorts

Morx #85

}

}

ctxt.check_version(version == 3)

match (lookup_header.format, lookup_header.bin_srch_header.as_ref()) { (0, None) => { … } (1, Some(b_sch_header)) => { … } ⋮ }

[cfg(test)]

}