mgieseki / dvisvgm

A fast DVI, EPS, and PDF to SVG converter
https://dvisvgm.de
GNU General Public License v3.0

Unable to convert Japanese characters to SVG from DVI #266

Closed: Jonathan-LeRoux closed this issue 3 months ago

Jonathan-LeRoux commented 3 months ago

I'm having issues converting DVI to SVG when the DVI file contains Japanese text. I can use dvipdfmx to convert the DVI to PDF and then use dvisvgm to convert the PDF to SVG successfully. However, if I try to use dvisvgm to convert directly from DVI to SVG, I get the following warnings:

C:\Temp\test_jp_dvi>dvisvgm -o IguanaTex_tmp3-2-2.svg IguanaTex_tmp.dvi
pre-processing DVI file (format version 2)
WARNING: no font file found for 'rml'
processing page 1
  graphic size: 27.692136pt x 6.858624pt (9.732673mm x 2.410531mm)
  WARNING: can't embed font 'rml'
  output written to IguanaTex_tmp3-2-2.svg
1 of 1 page converted in 2.602 seconds

I'm converting the following LaTeX source to DVI using platex:

\documentclass{jsarticle}
\pagestyle{empty}
\begin{document}
    日本語
\end{document}

I'm on Windows 11, using TeX Live 2024 with dvisvgm 3.2.2.

mgieseki commented 3 months ago

The problem is that the default map files don't contain a font mapping for rml (referenced by jis.vf). Since there is no such font available, dvisvgm issues a warning. To fix this, you need to create a map file myfonts.map with the following single line in it:

rml H KozMinPro-Medium.otf

and call dvisvgm with option --fontmap=myfonts.map. Of course, you can choose a different OTF file that contains the required characters.
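For a tool like IguanaTex that drives dvisvgm programmatically, the workaround can be automated: write the one-line map file, then add `--fontmap` to the call. This is a minimal sketch; the helper names `write_font_map` and `dvisvgm_command` are hypothetical, and it assumes KozMinPro-Medium.otf (or whichever OTF you substitute) is installed where dvisvgm can find it. Each map line has the form `<TeX font> <CMap/encoding> <font file>`, matching the line above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def write_font_map(path: Path, entries: dict[str, tuple[str, str]]) -> None:
    """Write one map line per TeX font: '<TeX font> <CMap/encoding> <font file>'."""
    lines = [f"{tex} {cmap} {fontfile}" for tex, (cmap, fontfile) in entries.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def dvisvgm_command(dvi: str, svg: str, fontmap: str) -> list[str]:
    """Build the dvisvgm call with an extra map file; run it via subprocess.run(cmd, check=True)."""
    return ["dvisvgm", f"--fontmap={fontmap}", "-o", svg, dvi]


with tempfile.TemporaryDirectory() as tmp:
    map_file = Path(tmp) / "myfonts.map"
    # 'H' selects the horizontal-writing CMap, as in the map line above
    write_font_map(map_file, {"rml": ("H", "KozMinPro-Medium.otf")})
    print(map_file.read_text(encoding="utf-8"), end="")
    print(dvisvgm_command("IguanaTex_tmp.dvi", "IguanaTex_tmp3-2-2.svg", str(map_file)))
```

Building the argument list separately keeps the command testable without dvisvgm being installed.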

Jonathan-LeRoux commented 3 months ago

Thanks a lot, that does work.

It's too bad that there is no automatic way to do this, whereas dvipdfmx is able to find the fonts (or does whatever is needed to avoid the issue). As the developer of IguanaTex, I use dvisvgm under the hood to convert LaTeX into SVG. Because of https://github.com/mgieseki/dvisvgm/issues/166, I recommend that users do the conversion via DVI, but that can raise this kind of error (while converting to PDF first doesn't).

Is there a fundamental reason why dvipdfmx can avoid the need to know the font map but dvisvgm cannot?

mgieseki commented 3 months ago

TeX Live provides predefined map files for dvipdfmx. They are located in the texmf tree somewhere below fonts/map. There are none for dvisvgm, so you have to create them yourself. However, dvisvgm tries to load one of the default map files (see also the dvisvgm manual page) and extracts the required data from it, which might not be sufficient, e.g. if the name of the font file is missing there. You could try adding the missing entries there as well.

aminophen commented 3 months ago

Hi, as @mgieseki notes, TeX Live provides predefined map files for dvipdfmx, which can also be parsed by dvisvgm. So how about having dvisvgm read them itself? The map file for dvipdfmx is named "kanjix.map"; it is generated by the updmap system, and its contents (= the mapping to actual OTF/TTF fonts) can be switched via the kanji-config-updmap command.

Jonathan-LeRoux commented 3 months ago

Hi @aminophen, that sounds like a great suggestion. Is this something that should be done on the TeX Live side or on the dvisvgm side? By the way, does this issue also affect MiKTeX?

mgieseki commented 3 months ago

I could extend the map file mechanism, e.g. by adding a dedicated dvisvgm.map to the default map files. If it's present, dvisvgm takes the mapping data from there. To simplify things, I'd also provide an include statement so that other map files, like the ones from dvipdfmx, could be integrated easily. However, dvisvgm.map would then need to be maintained and configured by the maintainers of the corresponding TeX environment. Alternatively, users could put it in their local folders and adapt it to their needs. It's probably not a good idea to look for all the dvipdfmx-specific map files directly; that would create too strong a dependency on a different utility with different requirements.

Jonathan-LeRoux commented 3 months ago

Having the maintainers configure this, as I assume they do for dvipdfmx, sounds reasonable to me, but this is way beyond my level, so I'll let you decide the best course of action.
One suggestion I would have, if eventually this can't be done automatically and the user needs to define the map, is to give some guidance in the error message when a font isn't found.

mgieseki commented 3 months ago

> One suggestion I would have, if eventually this can't be done automatically and the user needs to define the map, is to give some guidance in the error message when a font isn't found.

Agreed. I understand that the warning messages are a bit unspecific. But, unfortunately, it's not easy to create proper messages giving useful information about the problem. The entire font mechanism required to resolve font and character information from DVI files is pretty complex. There are many different external files involved, like .vf, .map, .sfd, .otf, cmap files etc. All of them might or might not provide information to finally successfully load the font file as well as its character encoding. If the collected data is not sufficient at the end of the processing chain, it's hard to track down what's actually missing and where it should be located. At the moment, I don't see an easy way to produce more helpful warnings.

Jonathan-LeRoux commented 3 months ago

I completely understand that it's hard to automatically provide exact guidance for every case. I'm just wondering how someone like me could have figured out that a font mapping for rml was missing, that it should be defined in a font map, and how to find which font it should map to, without bugging you in a Github issue :) So any error message that would give hints to the user as to how to search for a solution would be helpful.

mgieseki commented 3 months ago

Unfortunately, there's no simple answer to it because it depends on the actual failure. In your case I looked into the DVI file (using dviasm) and found a reference to font jis:

[font definitions]
fntdef: jis (10pt) at 9.609985pt
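The first diagnostic step, listing which fonts a DVI file defines, can be scripted over dviasm's text output. This is a minimal sketch that assumes the font definitions appear in the `fntdef: <name> (<design size>) at <size>` form quoted above; the function name `fonts_in_dviasm` is hypothetical.

```python
from __future__ import annotations

import re

# Matches lines such as 'fntdef: jis (10pt) at 9.609985pt'
FNTDEF_RE = re.compile(r"^\s*fntdef:\s*(\S+)\s+\(([^)]*)\)\s+at\s+(\S+)")


def fonts_in_dviasm(text: str) -> list[tuple[str, str, str]]:
    """Return (font name, design size, scaled size) for each fntdef line."""
    fonts = []
    for line in text.splitlines():
        match = FNTDEF_RE.match(line)
        if match:
            fonts.append(match.groups())
    return fonts


listing = """[font definitions]
fntdef: jis (10pt) at 9.609985pt"""
print(fonts_in_dviasm(listing))  # [('jis', '10pt', '9.609985pt')]
```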

The corresponding (virtual) font file jis.vf was found. Otherwise, dvisvgm would have shown a warning. So it takes the character data from there. Characters in VF files consist of DVI fragments that can refer to other fonts. jis.vf does this and refers to font rml:

(MAPFONT D 1
   (FONTNAME rml)
   (FONTCHECKSUM O 0)
   (FONTAT R 0.962216)
   (FONTDSIZE R 10.0)
   )
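The listing above is in the property-list text form of a virtual font, presumably obtained by converting jis.vf with a tool such as vftovp. Following the chain from jis to rml can then be done by collecting the `MAPFONT` entries. A minimal sketch, assuming the decimal `D` number form and that `FONTNAME` directly follows `MAPFONT`, as in the excerpt; `mapped_fonts` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Matches '(MAPFONT D <n>' followed by its '(FONTNAME <name>)' entry.
MAPFONT_RE = re.compile(r"\(MAPFONT\s+D\s+(\d+)\s*\(FONTNAME\s+([^)\s]+)\)")


def mapped_fonts(vpl: str) -> dict[int, str]:
    """Return {local font number: TeX font name} for the fonts a VF file refers to."""
    return {int(num): name for num, name in MAPFONT_RE.findall(vpl)}


vpl_excerpt = """(MAPFONT D 1
   (FONTNAME rml)
   (FONTCHECKSUM O 0)
   (FONTAT R 0.962216)
   (FONTDSIZE R 10.0)
   )"""
print(mapped_fonts(vpl_excerpt))  # {1: 'rml'}
```

Each name returned here is another font that needs either its own VF file, a font file, or a map entry.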

Since dvisvgm can't find this font or further information on it, it prints the warning. Then I grepped the dvipdfmx map files and found the following in cid-x.map:

%% Ryumin and GothicBBB found in PostScript printers:
rml  H Ryumin-Light
gbm  H GothicBBB-Medium
rmlv V Ryumin-Light
gbmv V GothicBBB-Medium
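Turning such dvipdfmx entries into lines that point at a locally installed font file is mechanical. A minimal sketch under these assumptions: `%` starts a comment, only the first three fields (TeX font, CMap/encoding, font name) are used and trailing options are ignored, and the replacement table is supplied by the user. The names `parse_map` and `substitute_fonts` are hypothetical.

```python
from __future__ import annotations


def parse_map(text: str) -> dict[str, tuple[str, str]]:
    """Parse lines '<TeX font> <CMap/encoding> <font name> [options]'."""
    entries: dict[str, tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # '%' starts a comment
        fields = line.split()
        if len(fields) >= 3:  # options after the third field are ignored here
            entries[fields[0]] = (fields[1], fields[2])
    return entries


def substitute_fonts(entries: dict[str, tuple[str, str]],
                     replacements: dict[str, str]) -> list[str]:
    """Emit map lines whose font name is swapped for a locally installed file."""
    return [f"{tex} {cmap} {replacements[font]}"
            for tex, (cmap, font) in entries.items() if font in replacements]


cid_x_excerpt = """%% Ryumin and GothicBBB found in PostScript printers:
rml  H Ryumin-Light
gbm  H GothicBBB-Medium
rmlv V Ryumin-Light
gbmv V GothicBBB-Medium"""

for line in substitute_fonts(parse_map(cid_x_excerpt), {"Ryumin-Light": "KozMinPro-Medium.otf"}):
    print(line)
# rml H KozMinPro-Medium.otf
# rmlv V KozMinPro-Medium.otf
```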

I don't have the font Ryumin-Light installed, so I just took KozMinPro-Medium.otf (Kozuka Mincho Pro) to check whether it works and proposed the map file mentioned above. This mapping information, which links rml to an actual font file and determines the character mapping (here: a reference to the CMap file H), must be given somewhere. There's currently no way dvisvgm could have resolved this itself. It also can't determine whether a map file is required or whether just the font file rml.otf, rml.pfb, rml.gf or similar is missing. In the end, it's probably a task for the maintainers of the TeX distribution to provide all the data dvisvgm needs to work correctly with the fonts they ship, as they do for dvips and dvipdfm(x).
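The ambiguity described here (missing map entry vs. missing font file) can at least be checked by hand. The following is a hypothetical user-side helper, not part of dvisvgm: it assumes a plain list of directories instead of the kpathsea search that TeX tools actually use, and only tries the extensions named in the comment.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Extensions named above; a real TeX setup searches more types via kpathsea.
CANDIDATE_EXTS = (".otf", ".pfb", ".gf")


def diagnose_font(name: str, map_entries: dict[str, tuple[str, str]],
                  search_dirs: list[Path]) -> str:
    """Return a hint on whether `name` is covered by a map entry or a font file."""
    if name in map_entries:
        cmap, fontfile = map_entries[name]
        return f"'{name}' is mapped to {fontfile} with CMap/encoding {cmap}"
    for directory in search_dirs:
        for ext in CANDIDATE_EXTS:
            candidate = Path(directory) / f"{name}{ext}"
            if candidate.is_file():
                return f"'{name}' resolves to font file {candidate}"
    return (f"no map entry or font file for '{name}': consider adding a line "
            f"'{name} <CMap> <font file>' to a map file passed via --fontmap")


with tempfile.TemporaryDirectory() as tmp:
    print(diagnose_font("rml", {}, [Path(tmp)]))
    (Path(tmp) / "rml.otf").touch()
    print(diagnose_font("rml", {}, [Path(tmp)]))
```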

Jonathan-LeRoux commented 3 months ago

Thanks a lot for the explanation! It does sound like this should be taken up by the TeX distribution maintainers. Do you or @aminophen know how to make the maintainers of the main TeX distributions aware of this? I see that dvipdfmx is now actually maintained as part of TeX Live, which may explain the tighter integration (does MiKTeX need to handle this differently?). I'm not sure how that would play out with dvisvgm. Please feel free to close this issue, as I think we covered what needed to be covered, but if this ever gets picked up by TeX Live, I'd be happy to know.

mgieseki commented 3 months ago

> I see that dvipdfmx is now actually maintained as part of TeX Live, which may explain the tighter integration (does MiKTeX need to handle this differently?)

I think dvipdfmx is actively maintained by the TL team because XeTeX requires its variant xdvipdfmx to create PDF files from the default XDV output. So it's kind of an integral part of the TeX environment. dvisvgm is more of an independent, optional utility.

MiKTeX also comes with pre-defined map files for dvipdfmx as part of the dvipdfmx package. They are identical to those in TL and are also located in the same directories of the texmf tree.

I've made a couple of changes to the code handling font maps. dvisvgm now looks for dvisvgm.map in the current working directory as well as in the texmf tree. If it's found, the mapping data is read from there. #include statements can be used to load the contents of other map files, so it's not necessary to duplicate entries already present in them. However, to spare users from having to deal with the configuration themselves, the TeX system maintainers would have to provide the file as part of the distribution.
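The comment does not spell out the exact include syntax or lookup rules, so the following only illustrates the general idea of include resolution. It is a hypothetical sketch, not dvisvgm's implementation: it assumes lines of the form `#include <file>`, models the file system as a dict, and uses placeholder contents rather than the real kanjix.map. Include cycles are reported instead of recursing forever.

```python
from __future__ import annotations

import re

INCLUDE_RE = re.compile(r"^\s*#include\s+(\S+)\s*$")


def expand_includes(name: str, files: dict[str, str],
                    seen: frozenset[str] = frozenset()) -> list[str]:
    """Recursively inline '#include <file>' lines; `files` maps file names to text."""
    if name in seen:
        raise ValueError(f"include cycle at {name}")
    seen = seen | {name}  # track only the current include path
    lines: list[str] = []
    for line in files[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            lines.extend(expand_includes(match.group(1), files, seen))
        else:
            lines.append(line)
    return lines


files = {
    "dvisvgm.map": "#include kanjix.map\nrml H KozMinPro-Medium.otf",
    "kanjix.map": "gbm H GothicBBB-Medium",  # placeholder content
}
print(expand_includes("dvisvgm.map", files))
# ['gbm H GothicBBB-Medium', 'rml H KozMinPro-Medium.otf']
```

Tracking only the current include path (rather than every file ever read) allows two files to include the same third file without being mistaken for a cycle.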

Since TL updates the binaries, like dvisvgm, only once a year, the changes will not be available until next year. Maybe that's enough time to find a way to provide and maintain the file.

Jonathan-LeRoux commented 3 months ago

Thanks a lot, this all sounds good to me.