skishore / makemeahanzi

Free, open-source Chinese character data
https://www.skishore.me/makemeahanzi/

Extrapolate stroke caps for overlapping strokes #28

Closed. chanind closed this issue 6 years ago

chanind commented 6 years ago

Thank you for making this incredible library and open-sourcing it! One possible point of improvement:

Make Me a Hanzi currently clips the end of a stroke if it's obscured by another overlapping stroke. This makes sense because the data was extracted from a real font, but it isn't ideal for a few reasons:

I think it should be possible to add a realistic stroke end to each clipped stroke. It wouldn't change how the character looks once all strokes are drawn, but it would look natural when the stroke is viewed in isolation and would fix the issues brought up above. Maybe it could work by fitting a stroke end taken from other similar-looking strokes, such that the fitted end is fully obscured by the stroke drawn on top.

hugolpz commented 6 years ago

@chanind, do you have a scripted/programmatic approach in mind, or are you considering editing the SVGs by hand? Just to know.

chanind commented 6 years ago

I think it should be possible to do programmatically. Maybe something like this: for every clipped stroke, search through all the other characters in the dataset for the closest-looking stroke and try to use the end from that stroke. I'll experiment with this if I have some time.

hugolpz commented 6 years ago

CDL has a cascading system between files. It may be the way to go someday. The transformation bounding box of a component is the current bounding box of its strokes.

skishore commented 6 years ago

This would be a nice enhancement, and I thought about algorithms for it a while back.

One approach here is to use more data from the "bridges" data structure that is the key to breaking the original glyph down into stroke components. I've drawn some of the bridges for the example character from above in this diagram:

image

Using some geometry, every time a stroke boundary hits a bridge, we could automatically create a quadratic Bezier curve that smoothly interpolates between the stroke's angles on the two sides of that bridge. For simple bridges that are collinear with the stroke itself, this interpolation would just be a line, but for the bridge up near the top-left of that character, it would come close to a point, as you'd expect. There is a third case for the diagonal strokes in 木, where the two angles actually spread apart, but I think the same math would cover all three.

The geometry here is a bit finicky to get right, which is why I never got around to doing it. But the point is that I think this piece is doable without manually drawing any curves and without needing to use decompositions.
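
As a rough illustration of the bridge idea above, here is a minimal sketch, not the tool's actual code. It assumes we already know the two points `a` and `b` where the stroke outline meets a bridge, plus the outward tangent directions `tA` and `tB` of the outline at those points. The function names are hypothetical. The control point of the quadratic Bezier is taken to be the intersection of the two tangent lines. Falling back to a straight line when the tangents are parallel or spread apart is an assumption of this sketch; the comment above suggests one formula might handle all three cases.

```javascript
// Hypothetical helpers for illustration only.
// a, b: points where the stroke outline meets the bridge.
// tA, tB: outward tangent directions of the outline at a and b.
function capControlPoint(a, tA, b, tB, eps = 1e-9) {
  // Solve a + s*tA = b + u*tB using 2D cross products.
  const cross = tA.x * tB.y - tA.y * tB.x;
  const mid = { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 };
  // Parallel tangents (e.g. a bridge collinear with the stroke): straight cap.
  if (Math.abs(cross) < eps) return mid;
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const s = (dx * tB.y - dy * tB.x) / cross;
  const u = (dx * tA.y - dy * tA.x) / cross;
  // Tangents that spread apart meet behind the bridge; this sketch
  // simply falls back to a straight cap in that case (an assumption).
  if (s <= 0 || u <= 0) return mid;
  return { x: a.x + s * tA.x, y: a.y + s * tA.y };
}

// Returns the SVG path command that draws the cap from a to b.
function capSegment(a, tA, b, tB) {
  const c = capControlPoint(a, tA, b, tB);
  return `Q ${c.x} ${c.y} ${b.x} ${b.y}`;
}
```

When the two tangents converge, the curve bulges toward their intersection and comes close to a point; when the control point is the midpoint of the bridge, the `Q` command draws the same straight line as the original clip.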

parsimonhi commented 6 years ago

There are some special cases such as the 6th stroke of 者.

parsimonhi commented 6 years ago

I tried a simple algorithm that replaces every straight line with a cubic Bezier curve.

Below is the result for 我: [image: 25105brush]

I put a demo online at http://gooo.free.fr/animCJK/all.php. Select the "brush" checkbox, the 512 or 1024px radio input, and the Hsk hanzi (China) radio input. Then enter a hanzi in the data field or select one from the list at the bottom of the page.

The result is not always perfect (mostly because the SVG data has some defects, such as very short curves), but it seems acceptable most of the time.
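
For readers who want a feel for the "straight line to cubic Bezier" idea, here is a minimal sketch. It is not the code behind the demo, which isn't shown in this thread. It assumes the straight clip edge runs from `p0` to `p1`, that `t0` and `t1` are unit vectors pointing outward along the stroke at each end, and that a hypothetical `bulge` factor controls how far the cap extends. The function name is made up for illustration.

```javascript
// Hypothetical helper for illustration only.
// Replaces the straight edge p0 -> p1 (an "L" command) with a cubic
// Bezier whose control points are pushed outward along t0 and t1.
function lineToCubicCap(p0, t0, p1, t1, bulge = 0.5) {
  // Scale the extension by the length of the edge being replaced.
  const k = bulge * Math.hypot(p1.x - p0.x, p1.y - p0.y);
  const c1 = { x: p0.x + k * t0.x, y: p0.y + k * t0.y };
  const c2 = { x: p1.x + k * t1.x, y: p1.y + k * t1.y };
  return `C ${c1.x} ${c1.y} ${c2.x} ${c2.y} ${p1.x} ${p1.y}`;
}
```

This sketch does not check that the rounded end stays hidden under the overlapping stroke, so `bulge` would need tuning against real data.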

parsimonhi commented 6 years ago

If you want to test the algorithm using the makeMeAHanzi data, just get the JavaScript strokeBrushing() function from the code of http://gooo.free.fr/animCJK/all.php and apply it to the makeMeAHanzi stroke paths just before displaying them. This function adds a brush-like start and end to a makeMeAHanzi stroke path.

chanind commented 6 years ago

It looks like you beat me to it! I made a similar attempt here: https://github.com/skishore/makemeahanzi/pull/32. I suspect we're doing almost the same thing.

parsimonhi commented 6 years ago

chanind: It looks like you beat me to it! I made a similar attempt here: #32. I suspect we're doing almost the same thing

:-)

Yes, it's similar. But my solution is not perfect as is, and requires some improvements (especially when there are "bridges" with disturbing points around it). So any other solution may help!

chanind commented 6 years ago

parsimonhi: Yes, it's similar. But my solution is not perfect as is, and requires some improvements (especially when there are "bridges" with disturbing points around it). So any other solution may help!

I noticed the same thing: whenever there's a bridge, there's likely to be a tiny distorted curve, typically less than 2px long, that throws off all the tangent calculations. I tried to get around that by calculating the tangent using getPointAtLength() and looking a few pixels back from the clip points. That way it doesn't matter if there's a tiny distorted curve in the path string.
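
The look-back trick can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the pull request. `path` is any object exposing `getPointAtLength(distance)`, such as an SVG path element in a browser. The function name and the `skip` and `span` parameters are hypothetical: `skip` is how far behind the clip point to start sampling, and `span` is the distance between the two samples.

```javascript
// Hypothetical helper for illustration only.
// Estimates the stroke direction approaching a clip point while ignoring
// the last `skip` units of the outline, where tiny distorted curves live.
function tangentBeforeClip(path, clipLength, skip = 2, span = 3) {
  const near = path.getPointAtLength(Math.max(0, clipLength - skip));
  const far = path.getPointAtLength(Math.max(0, clipLength - skip - span));
  const dx = near.x - far.x;
  const dy = near.y - far.y;
  const len = Math.hypot(dx, dy);
  // Degenerate case: both samples coincide, so no direction is available.
  if (len === 0) return { x: 0, y: 0 };
  return { x: dx / len, y: dy / len };
}
```

Because both samples sit a few units behind the clip point, a sub-2px hook at the very end of the outline does not affect the estimated direction.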

skishore commented 6 years ago

I've just about gotten your work integrated into the tool, @chanind - once it's in, we'll have the corrections applied to the SVGs too, and future runs will be incremental and fast. The server data migration is taking an hour on my machine though!

skishore commented 6 years ago

And done as of e0089f72b0010413bae0e854cde1c1ceac04f8f8!

I'll probably run the stroke-caps logic one more time to get those last few stragglers, and then from now on, the tool will run it twice (heh) on every character whenever it's updated.

chanind commented 6 years ago

Thanks for your work getting this merged into the tools branch! It looks great!