skishore / makemeahanzi

Free, open-source Chinese character data
https://www.skishore.me/makemeahanzi/

Can I Convert SVG path data to medians #15

Closed TomClarkson closed 7 years ago

TomClarkson commented 7 years ago

I want to add new characters, such as hiragana, to my project, and I have SVG path data for them.

Can I generate the medians from SVG data? If so, how?

Say I have the character 一

With Path Data "M 518 382 Q 572 385 623 389 Q 758 399 900 383 Q 928 379 935 390 Q 944 405 930 419 Q 896 452 845 475 Q 829 482 798 473 Q 723 460 480 434 Q 180 409 137 408 Q 130 408 124 408 Q 108 408 106 395 Q 105 380 127 363 Q 146 348 183 334 Q 195 330 216 338 Q 232 344 306 354 Q 400 373 518 382 Z"

How do you get

"medians":[ [ [121,393], [193,372], [417,402], [827,434], [920,401] ] ]

Many thanks, and thanks for this project!

skishore commented 7 years ago

It seems that you've already separated your character into strokes - at least, that path you linked looks like a single stroke. Is that right? If so, please take a look at:

https://github.com/skishore/makemeahanzi/blob/tool/lib/median_util.js

In that file, findStrokeMedian takes an SVG path and returns an approximate median (with many points on it), and normalizeForMatch uses the simplify library to reduce that median to a further approximation with fewer points. I hope that helps!

The tool code isn't licensed yet, mainly because I don't think it's in a great state to be used by other people, but at least those libraries are decent. If you find that that code works for your case then I'll add an MIT license.
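
To illustrate the point-reduction step described above, here is a minimal sketch of Ramer-Douglas-Peucker polyline simplification. It is not the code from median_util.js or from the simplify library; the function names (`distanceToSegment`, `simplifyPolyline`) and the tolerance value are assumptions chosen for this example only.

```javascript
// Distance from point p to the closest point on the segment a-b.
function distanceToSegment(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const lengthSquared = dx * dx + dy * dy;
  if (lengthSquared === 0) {
    return Math.hypot(p[0] - a[0], p[1] - a[1]);
  }
  // Projection parameter of p onto a-b, clamped to stay on the segment.
  let t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / lengthSquared;
  t = Math.max(0, Math.min(1, t));
  return Math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
}

// Hypothetical helper: keep only the points that deviate from the
// straight line between the endpoints by more than `tolerance`.
function simplifyPolyline(points, tolerance) {
  if (points.length <= 2) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDistance = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = distanceToSegment(points[i], first, last);
    if (d > maxDistance) {
      maxDistance = d;
      index = i;
    }
  }
  // Everything is close enough to the chord: keep just the endpoints.
  if (maxDistance <= tolerance) return [first, last];
  // Otherwise split at the farthest point and simplify both halves.
  const left = simplifyPolyline(points.slice(0, index + 1), tolerance);
  const right = simplifyPolyline(points.slice(index), tolerance);
  return left.slice(0, -1).concat(right);
}
```

Calling `simplifyPolyline(denseMedian, tolerance)` on a many-point median would return a shorter list of `[x, y]` pairs in the same shape as the "medians" entries; the tolerance that reproduces the published data is not known here and would need tuning.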

TomClarkson commented 7 years ago

Thanks very much for your reply. I extracted the findStrokeMedian code and it works perfectly with the data above. However, I was hoping to use some hiragana stroke data with hanzi-writer, but the path data for the hiragana strokes contains c commands, which aren't accepted.

[Screenshot attached, 2017-06-16]

Is there any way I could get hiragana stroke data in the makemeahanzi format?

https://github.com/TomClarkson/makemeahanzidata/blob/master/src/index.js#L26

skishore commented 7 years ago

I've written some hacks that are partially working for rendering the cubic elements. They're not at a level at which I'd want to commit them to the branch itself, but they may be usable for your case. Be warned that most methods in svg.js still don't handle cubics well; I just got convertSVGPathToPaths and computePolygonApproximation sort of working: https://github.com/skishore/makemeahanzi/commits/cubic

However, that path that you linked doesn't seem to be a proper stroke. It doesn't loop back on itself, for starters: https://codepen.io/anon/pen/eRBKoN Where did you get that data from?
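
As a rough illustration of what "loops back on itself" means for an outlined stroke, the sketch below checks whether path data describes a closed contour. It is not the assertion used in svg.js; the function name `isClosedOutline` is hypothetical, and the check only understands a trailing Z/z or paths made solely of absolute M, L, Q and C commands.

```javascript
// Hypothetical helper: true if the path closes, either with an explicit
// closepath command or by ending on the same point where it started.
function isClosedOutline(d) {
  const path = d.trim();
  // An explicit closepath command always closes the contour.
  if (/[Zz]$/.test(path)) return true;
  // Otherwise only absolute M/L/Q/C commands are handled, where every
  // number belongs to an (x, y) pair and the last pair is the endpoint.
  if (!/^[MLQC\d\s.,-]+$/.test(path)) {
    throw new Error('Unsupported path command in: ' + path);
  }
  const numbers = (path.match(/-?\d*\.?\d+/g) || []).map(Number);
  const n = numbers.length;
  return n >= 4 &&
    numbers[0] === numbers[n - 2] &&
    numbers[1] === numbers[n - 1];
}
```

An outline such as the 一 path quoted at the top of this thread ends in `Z` and passes; a centerline path that simply runs from one end of the stroke to the other does not.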

TomClarkson commented 7 years ago

Hi, I got the stroke data from KanjiVG. I thought that if I could get the stroke for the character for one (一) to work, then I could use the hiragana and katakana data from KanjiVG. Here is the data for that character: https://github.com/KanjiVG/kanjivg/blob/master/kanji/04e00.svg. I guess the data is too different to be used, because the KanjiVG data is just straight lines. (Sorry, I don't know much about SVG.)

Could the hiragana data be extracted from the Arphic font? I'm not sure how to do this. If not, or if it is difficult, please close this issue. Thanks again.

skishore commented 7 years ago

Gotcha. The Arphic font only contains hanzi/kanji, so no hiragana there. However, the KanjiVG data is basically already in the "median" form - the strokes that they provide are lines, not outlines.

Basically, for the KanjiVG data, if you take those changes I made, comment out the "Path has open contour" assertion (which only applies to outlined strokes), and then call "GetPolygonApproximation", you'll have a median already.
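
To make the idea concrete, here is a minimal sketch of turning one relative cubic (`c`) segment of a centerline path into a list of points, which is the kind of polygon approximation described above. It is not the function from svg.js; the names `cubicPoint` and `sampleRelativeCubic` and the uniform sampling in `t` are assumptions made for this example.

```javascript
// Evaluate a cubic Bezier curve with control points p0..p3 at t in [0, 1].
function cubicPoint(p0, p1, p2, p3, t) {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return [
    a * p0[0] + b * p1[0] + c * p2[0] + d * p3[0],
    a * p0[1] + b * p1[1] + c * p2[1] + d * p3[1],
  ];
}

// Hypothetical helper for an SVG "c dx1,dy1 dx2,dy2 dx,dy" command.
// All three pairs are offsets from the current point `start`.
function sampleRelativeCubic(start, c1, c2, end, segments) {
  const p1 = [start[0] + c1[0], start[1] + c1[1]];
  const p2 = [start[0] + c2[0], start[1] + c2[1]];
  const p3 = [start[0] + end[0], start[1] + end[1]];
  const points = [];
  // Uniform steps in t give segments + 1 points, endpoints included.
  for (let i = 0; i <= segments; i++) {
    points.push(cubicPoint(start, p1, p2, p3, i / segments));
  }
  return points;
}
```

Chaining the sampled points of each segment, with every segment starting where the previous one ended, gives a polyline along the stroke. Rounding the coordinates and rescaling them into the coordinate system used by the makemeahanzi data would still be needed, and that mapping is not covered here.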

TomClarkson commented 7 years ago

Sorry, I'm not sure where to call GetPolygonApproximation from; it is over my head.

https://github.com/TomClarkson/makemeahanzidata/blob/master/src/makemeahanzi/svg.js#L91