Closed luke-c closed 6 years ago
This has been fixed in the source XML files by the author as of today: the xref target of those two entries is now the non-dotted version of the target entry.
A comment has also been added to the DTD for contributors saying not to use a target keb/reb with a nakaguro in it.
The only change needed on your side is to regenerate the JSON files.
Whilst parsing the xref field in my own parser, I noticed a problem with the xref field in the original XML file.
The JIS centre-dot '・' is used to separate the components of an xref, but some reb themselves contain that centre dot, so you get xrefs like:
<xref>ブロードノーズ・セブンギル・シャーク</xref>
<xref>イエローテール・スターリー・ラビットフィッシュ</xref>
From my short investigations it seems like it is only these two xrefs which have this problem.
Parsing these by splitting on the centre-dot gives you a list of 3 strings, when it should actually be a list containing a single string.
I have contacted Jim Breen, the author of JMdict. In the meantime, the solution is to hard-code a check for these two xrefs and return each one as is instead of splitting it on the centre dot, as they both refer to a single reb.
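As a rough illustration of the interim workaround described above, here is a minimal Python sketch. The names `split_xref` and `UNSPLITTABLE_XREFS` are hypothetical and are not taken from any existing parser. It assumes the xref text has already been extracted from the XML as a plain string.

```python
# Hypothetical helper for illustration; not part of any existing parser.

# The centre dot (nakaguro) that separates the components of an xref.
NAKAGURO = "・"

# The two xrefs reported in this issue whose reb itself contains the centre dot.
UNSPLITTABLE_XREFS = frozenset({
    "ブロードノーズ・セブンギル・シャーク",
    "イエローテール・スターリー・ラビットフィッシュ",
})


def split_xref(xref: str) -> list[str]:
    """Split an xref into its components, leaving the known exceptions whole."""
    if xref in UNSPLITTABLE_XREFS:
        # Both exceptions refer to a single reb, so return them unsplit.
        return [xref]
    return xref.split(NAKAGURO)
```

With this check in place, each of the two problematic xrefs comes back as a single-element list, while any other xref is still split on the centre dot as before. Once the JSON files are regenerated from the corrected XML, a special case like this should no longer be needed.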