This is relevant specifically to grc (Ancient Greek). Because modern editions of Ancient Greek texts often have to mark uncertain letters in the ancient sources, letters with a dot below are a common occurrence, but at present Tesseract does not recognise them.
A fairly complete list of letters with dot below (except for the lunate sigma ϲ̣) can be found here: https://titus.uni-frankfurt.de/unicode/unicsel/grkkadd.htm
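For anyone building test strings or training text, here is a minimal Python sketch of how these letters are typically encoded. It assumes the dot is represented as a base letter followed by U+0323 COMBINING DOT BELOW, which is how ϲ̣ above is written. The helper names `with_dot_below` and `has_dot_below` are made up for illustration and are not part of Tesseract.

```python
import unicodedata

# U+0323 COMBINING DOT BELOW, placed after the base letter.
COMBINING_DOT_BELOW = "\u0323"


def with_dot_below(letter: str) -> str:
    """Append a combining dot below to a letter and normalise to NFC."""
    return unicodedata.normalize("NFC", letter + COMBINING_DOT_BELOW)


def has_dot_below(text: str) -> bool:
    """Report whether any character in the text carries a dot below.

    Decomposing to NFD first also catches precomposed characters
    (for example Latin letters) that include the dot.
    """
    return COMBINING_DOT_BELOW in unicodedata.normalize("NFD", text)


# A few dotted letters, including the lunate sigma missing from the list.
dotted = [with_dot_below(ch) for ch in "αβγδεϲ"]
print(" ".join(dotted))
```

Greek letters have no precomposed forms with a dot below, so each dotted letter stays a sequence of at least two code points even after NFC normalisation.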
I wonder whether dot-below recognition should sit behind a flag that has to be turned on manually, because it might otherwise pick up stains in older books (which tend not to use such dots and so don’t need the feature anyway). On the other hand, an opt-in flag could make the feature harder to deploy in downstream projects such as the Internet Archive.
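To make the opt-in idea concrete, here is a rough Python sketch of the same behaviour done as post-processing on OCR output rather than inside the recogniser. The function `postprocess` and its `keep_dot_below` parameter are hypothetical and do not correspond to any existing Tesseract option. The sketch assumes the model emits U+0323 COMBINING DOT BELOW for dotted letters.

```python
import unicodedata

# U+0323 COMBINING DOT BELOW
COMBINING_DOT_BELOW = "\u0323"


def postprocess(text: str, keep_dot_below: bool = False) -> str:
    """Hypothetical opt-in filter for dots below in OCR output.

    By default every dot below is dropped, so a stain misread as a dot
    in an older book does no harm. Passing keep_dot_below=True keeps
    the dots for modern critical editions.
    """
    if keep_dot_below:
        return unicodedata.normalize("NFC", text)
    # Decompose so that precomposed dotted characters are covered too.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = decomposed.replace(COMBINING_DOT_BELOW, "")
    return unicodedata.normalize("NFC", stripped)


sample = "α\u0323β"
print(postprocess(sample))
print(postprocess(sample, keep_dot_below=True))
```

A filter like this would let a downstream project run one model for all books and decide per book whether to keep the dots, though it does nothing for cases where a stain changes which base letter is recognised.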