Open Kurinosuke118 opened 6 months ago
Thanks for the tip! Do you happen to know if Japan1-7 is a superset of Japan1? In other words, would we stay fully backwards compatible if we switched from Japan1 to Japan1-7?
@pietermarsman Thanks for the reply!
> Do you happen to know if Japan1-7 is a superset of Japan1?
I think that Adobe-Japan1-7/cid2code.txt is a superset of cid2code_Adobe_Japan1.txt.
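The superset claim can be checked mechanically rather than by eye. The following is a minimal sketch, not part of pdfminer.six: it assumes both files are tab-separated, that lines starting with `#` are comments, that the first remaining row is a header whose first column is the CID, and that `*` marks "no mapping". The function names `parse_cid2code` and `missing_mappings` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List, Tuple

Table = Dict[int, Dict[str, str]]


def parse_cid2code(text: str) -> Table:
    """Parse cid2code-style text into {cid: {column_name: value}}.

    Assumed layout: '#' comment lines, then a tab-separated header row,
    then one tab-separated row per CID.
    """
    table: Table = {}
    header = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if header is None:
            header = fields  # first non-comment row names the columns
            continue
        table[int(fields[0])] = dict(zip(header[1:], fields[1:]))
    return table


def missing_mappings(old: Table, new: Table) -> List[Tuple[int, str, str]]:
    """List (cid, column, value) entries of `old` that `new` does not reproduce."""
    missing = []
    for cid, columns in sorted(old.items()):
        for name, value in columns.items():
            if value == "*":
                continue  # assumed marker for "no mapping"; nothing to preserve
            if new.get(cid, {}).get(name) != value:
                missing.append((cid, name, value))
    return missing
```

To try it on the real files, read each one with `open(path, encoding="utf-8").read()` and pass the results through `parse_cid2code`. An empty list from `missing_mappings(old, new)` would mean every mapping in the old table survives unchanged in the new one; any entries returned are the places where switching files could change behaviour.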
If I replace the contents of cid2code_Adobe_Japan1.txt with those of Adobe-Japan1-7/cid2code.txt, do I need to regenerate the *.pickle.gz files? Please tell me how to create the *.pickle.gz files, because I'd like to try this.
I'm not an expert on this myself, the files were already there when I first ran into pdfminer. But I can give you some directions:
Bug report
pdfminer.six fails to read certain Japanese fonts and returns a cid value (cid:xxx). I think this is caused by pdfminer's CMap not being able to convert the CID code of a particular Japanese font to a character code.
To Reproduce
This bug occurs when the following Japanese PDF file is parsed with the following code.
PDF file https://www.maff.go.jp/j/tokei/kouhyou/naisui_gyosei/attach/pdf/index-15.pdf
Code
Output
Comment
I think the file contents in [1] need to be updated with the file contents in [2], and the CMap then regenerated with tools/conv_cmap.py in the pdfminer.six repository.
[1] https://github.com/pdfminer/pdfminer.six/blob/master/cmaprsrc/cid2code_Adobe_Japan1.txt
[2] https://github.com/adobe-type-tools/cmap-resources/blob/master/Adobe-Japan1-7/cid2code.txt
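One way to judge whether a regenerated CMap actually helps is to count the (cid:N) placeholders left in the extracted text before and after the change. The snippet below is a small sketch that works on any already-extracted string; the helper name `unmapped_cids` is hypothetical and not part of pdfminer.six.

```python
import re
from collections import Counter

# pdfminer.six writes "(cid:N)" where it cannot map a CID to a character.
CID_PLACEHOLDER = re.compile(r"\(cid:(\d+)\)")


def unmapped_cids(text: str) -> Counter:
    """Count how often each unmapped CID placeholder appears in `text`."""
    return Counter(int(number) for number in CID_PLACEHOLDER.findall(text))
```

The input text could come, for example, from `pdfminer.high_level.extract_text("index-15.pdf")` run on a local copy of the PDF above. If the count drops to zero after updating the CMap, the affected CIDs are now being converted; any CIDs that remain can be looked up in the cid2code table to see whether they have a mapping at all.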