texjporg / cjk-gs-support

Scripts to ease the use of CJK fonts with Ghostscript

Distinguish uppercase and lowercase #9

Closed: aminophen closed this issue 6 years ago

aminophen commented 8 years ago

The current cjk-gs-integrate.pl does not distinguish between uppercase and lowercase file names.

I think kpathsea on Unix-like systems distinguishes uppercase from lowercase. On Mac OS X:

$ kpsewhich batang.ttf
/usr/local/texlive/2016dev/texmf-dist/fonts/truetype/public/baekmuk/batang.ttf
$ kpsewhich Batang.ttf
/Library/Fonts/Microsoft/Batang.ttf

However, cjk-gs-integrate.pl doesn't distinguish these two.

Here is a test case on Mac OS X with Microsoft Office installed. Consider the following example:

Prepare "database.txt" as follows:

# baekmuk package (free)
Name: Baekmuk-Batang
Class: Korea
Provides(41): HYSMyeongJo-Medium
Filename(20): batang.ttf
Filename(10): Baekmuk-Batang.ttf
# Microsoft Mac Office fonts
Name: Batang
Class: Korea
Provides(40): HYSMyeongJo-Medium
Filename(50): Batang.ttf

and run

$ sudo perl cjk-gs-integrate.pl --debug --debug --force --link-texmf --fontdef=/path/to/database.txt 2>database-test.log

The debug log says

cjk-gs-integrate [DEBUG]: filename: batang.ttf
cjk-gs-integrate [DEBUG]: type: ttf
cjk-gs-integrate [DEBUG]: filename: Baekmuk-Batang.ttf
cjk-gs-integrate [DEBUG]: type: ttf
cjk-gs-integrate [DEBUG]: Dumping fontfiles for Baekmuk-Batang: $VAR1 = {
  'batang.ttf' => {
    'priority' => '20',
    'type' => 'TTF'
  },
  'Baekmuk-Batang.ttf' => {
    'type' => 'TTF',
    'priority' => '10'
  }
};
cjk-gs-integrate [DEBUG]: filename: Batang.ttf
cjk-gs-integrate [DEBUG]: type: ttf
cjk-gs-integrate [DEBUG]: Dumping fontfiles for Batang: $VAR1 = {
  'Batang.ttf' => {
    'type' => 'TTF',
    'priority' => '50'
  }
};
cjk-gs-integrate [DEBUG]: checking for kpsewhich  "batang.ttf"  "Baekmuk-Batang.ttf"  "Batang.ttf" 
cjk-gs-integrate [DEBUG]: Found files /Library/Fonts/Microsoft/batang.ttf /Library/Fonts/Microsoft/Batang.ttf
cjk-gs-integrate [DEBUG]: dumping font database before file check:
cjk-gs-integrate [DEBUG]: $VAR1 = {
  'Baekmuk-Batang' => {
    'class' => 'Korea',
    'provides' => {
      'HYSMyeongJo-Medium' => '41'
    },
    'ttfname' => 'Baekmuk-Batang.ttf',
    'files' => {
      'batang.ttf' => {
        'priority' => '20',
        'type' => 'TTF'
      },
      'Baekmuk-Batang.ttf' => {
        'type' => 'TTF',
        'priority' => '10'
      }
    }
  },
  'Batang' => {
    'ttfname' => 'Batang.ttf',
    'provides' => {
      'HYSMyeongJo-Medium' => '40'
    },
    'class' => 'Korea',
    'files' => {
      'Batang.ttf' => {
        'type' => 'TTF',
        'priority' => '50'
      }
    }
  }
};
cjk-gs-integrate [DEBUG]: dumping basename to filename list:
cjk-gs-integrate [DEBUG]: $VAR1 = {
  'Batang.ttf' => '/Library/Fonts/Microsoft/Batang.ttf',
  'batang.ttf' => '/Library/Fonts/Microsoft/batang.ttf'
};
cjk-gs-integrate [DEBUG]: dumping font database:
cjk-gs-integrate [DEBUG]: $VAR1 = {
  'Baekmuk-Batang' => {
    'files' => {
      'batang.ttf' => {
        'type' => 'TTF',
        'target' => '/Library/Fonts/Microsoft/batang.ttf',
        'priority' => 20
      }
    },
    'type' => 'TTF',
    'available' => 1,
    'subfont' => 0,
    'target' => '/Library/Fonts/Microsoft/batang.ttf',
    'class' => 'Korea',
    'provides' => {
      'HYSMyeongJo-Medium' => '41'
    },
    'ttfname' => 'Baekmuk-Batang.ttf'
  },
  'Batang' => {
    'type' => 'TTF',
    'files' => {
      'Batang.ttf' => {
        'priority' => 50,
        'type' => 'TTF',
        'target' => '/Library/Fonts/Microsoft/Batang.ttf'
      }
    },
    'target' => '/Library/Fonts/Microsoft/Batang.ttf',
    'provides' => {
      'HYSMyeongJo-Medium' => '40'
    },
    'class' => 'Korea',
    'ttfname' => 'Batang.ttf',
    'available' => 1,
    'subfont' => 0
  }
};
cjk-gs-integrate [DEBUG]: dumping aliases:
cjk-gs-integrate [DEBUG]: $VAR1 = {
  'HYSMyeongJo-Medium' => {
    '40' => 'Batang',
    '41' => 'Baekmuk-Batang'
  }
};

Therefore, both symlinks "batang.ttf" and "Baekmuk-Batang.ttf" point to /Library/Fonts/Microsoft/Batang.ttf (the path /Library/Fonts/Microsoft/batang.ttf found by kpsewhich is the same file), which is an unexpected result.
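To make the structure of database.txt concrete, here is a minimal Python sketch that parses the entries above into a dictionary resembling the debug dump. The real cjk-gs-integrate.pl is written in Perl. The function name parse_database is hypothetical, and the sketch assumes that every Name: line starts a new entry and that comment lines begin with #.

```python
import re

# "Key: value" or "Key(priority): value", e.g. "Filename(20): batang.ttf"
LINE_RE = re.compile(r"^(\w+)(?:\((\d+)\))?:\s*(.+?)\s*$")


def parse_database(text):
    """Parse font database entries into {Name: {...}} (illustrative sketch only)."""
    fonts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        key, prio, value = m.groups()
        prio = int(prio) if prio is not None else None
        if key == "Name":
            # Assumption: each Name: line opens a new font entry.
            current = {"class": None, "provides": {}, "files": {}}
            fonts[value] = current
        elif current is None:
            raise ValueError(f"{key}: line appears before any Name: line")
        elif key == "Class":
            current["class"] = value
        elif key == "Provides":
            current["provides"][value] = prio
        elif key == "Filename":
            # File names are kept case-sensitively: batang.ttf != Batang.ttf.
            current["files"][value] = prio
        # Any other keys are ignored in this sketch.
    return fonts
```

Because the keys are ordinary case-sensitive strings, batang.ttf and Batang.ttf remain distinct in the parsed database. The clash appears only later, when these file names are looked up on a case-insensitive file system.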

aminophen commented 8 years ago

I noticed an interesting behavior of kpathsea.

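I have two font files: /usr/local/texlive/2016dev/texmf-dist/fonts/truetype/public/baekmuk/batang.ttf and /Library/Fonts/Microsoft/Batang.ttf.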

First, I set OSFONTDIR:

$ export OSFONTDIR=/Library/Fonts:/Library/Fonts/Microsoft:/System/Library/Fonts

When TTFONTS is empty, uppercase and lowercase can be distinguished:

$ export TTFONTS=
$ kpsewhich -var-value=TTFONTS
.:{/Users/Hironobu/.texlive2016/texmf-config,/Users/Hironobu/.texlive2016/texmf-var,/Users/Hironobu/texmf,!!/usr/local/texlive/2016dev/texmf-config,!!/usr/local/texlive/2016dev/texmf-var,!!/usr/local/texlive/texmf-local,!!/usr/local/texlive/2016dev/texmf-dist}/fonts/{truetype,opentype}//:/Library/Fonts:/Library/Fonts/Microsoft:/System/Library/Fonts//
$ kpsewhich Batang.ttf
/Library/Fonts/Microsoft/Batang.ttf
$ kpsewhich batang.ttf
/usr/local/texlive/2016dev/texmf-dist/fonts/truetype/public/baekmuk/batang.ttf
$ kpsewhich -all Batang.ttf
/Library/Fonts/Microsoft/Batang.ttf
$ kpsewhich -all batang.ttf
/usr/local/texlive/2016dev/texmf-dist/fonts/truetype/public/baekmuk/batang.ttf
/Library/Fonts/Microsoft/batang.ttf

When TTFONTS is non-empty, kpathsea cannot find the lowercase one:

$ export TTFONTS=/Library/Fonts:/Library/Fonts/Microsoft:/System/Library/Fonts:`kpsewhich -var-value=TTFONTS`
$ kpsewhich -var-value=TTFONTS
/Library/Fonts:/Library/Fonts/Microsoft:/System/Library/Fonts:.:{/Users/Hironobu/.texlive2016/texmf-config,/Users/Hironobu/.texlive2016/texmf-var,/Users/Hironobu/texmf,!!/usr/local/texlive/2016dev/texmf-config,!!/usr/local/texlive/2016dev/texmf-var,!!/usr/local/texlive/texmf-local,!!/usr/local/texlive/2016dev/texmf-dist}/fonts/{truetype,opentype}//:/Library/Fonts:/Library/Fonts/Microsoft:/System/Library/Fonts//
$ kpsewhich Batang.ttf
/Library/Fonts/Microsoft/Batang.ttf
$ kpsewhich batang.ttf
/Library/Fonts/Microsoft/batang.ttf   <= ???
$ kpsewhich -all Batang.ttf
/Library/Fonts/Microsoft/Batang.ttf
$ kpsewhich -all batang.ttf
/usr/local/texlive/2016dev/texmf-dist/fonts/truetype/public/baekmuk/batang.ttf
/Library/Fonts/Microsoft/batang.ttf

When OPENTYPEFONTS/TTFONTS are set explicitly, kpathsea apparently cannot distinguish uppercase from lowercase letters. I propose a workaround: do not set these two variables explicitly (see aminophen@7d0f926). With this change, kpathsea will more often find links created in TEXMFLOCAL before the actual font files, but this does no harm, because

    my $realf = abs_path($f);

checks whether the found file is a mere symlink or a real font.
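In Perl, Cwd's abs_path follows symlinks to the real file. The Python sketch below uses os.path.realpath, which does the same, to illustrate why extra hits from TEXMFLOCAL links are harmless: a link and the font it points to collapse to one canonical path. The helper name unique_font_targets is hypothetical and is not part of cjk-gs-integrate.pl.

```python
import os


def unique_font_targets(found_paths):
    """Collapse kpsewhich hits that resolve to the same real font file.

    os.path.realpath follows every symlink (like Perl's Cwd::abs_path), so a
    link in TEXMFLOCAL and the font it points to yield the same key.
    The order of first occurrence is preserved.
    """
    seen = {}
    for path in found_paths:
        real = os.path.realpath(path)
        seen.setdefault(real, path)  # keep the first hit for each real file
    return list(seen)  # canonical (symlink-free) paths
```

Feeding it a link followed by its target returns a single path, so catching the link first does not change which font is finally used.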

norbusan commented 7 years ago

Coming back to this issue since I was reading through your code. I don't see this behaviour concerning upper and lower case here on Unix.

The explanation is simple, as far as I remember: OS X's default file system (HFS) is case-insensitive (though case-preserving), so upper and lower case are ignored. Thus, file name comparisons are also case-insensitive.

kpsewhich is doing the right thing; it is just that Macs use a case-insensitive file system (like Windows), and that is the default.

Maybe we should think about a better solution.
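Whether a given font directory is affected can be probed at runtime. The Python sketch below shows one common approach. It is an assumption, not something the script does: create a temporary file with a lowercase name and check whether its uppercase spelling also "exists". On a default macOS volume it typically does; on a typical Linux file system it does not. The function name is_case_insensitive is hypothetical, and the probe needs write access to the directory.

```python
import os
import tempfile


def is_case_insensitive(directory):
    """Probe whether `directory` lives on a case-insensitive file system."""
    # The prefix guarantees lowercase letters, so .upper() changes the name.
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=directory) as tmp:
        head, tail = os.path.split(tmp.name)
        # If the uppercase spelling resolves to a file, case is ignored here.
        return os.path.exists(os.path.join(head, tail.upper()))
```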

aminophen commented 7 years ago

kpsewhich is doing the right thing, only that on Macs with case-insensitive (like Windows, and that is the default).

OK, thanks. Currently I don't see any better solution, so I'll let you know when I come up with one.

aminophen commented 7 years ago

How about invoking otfinfo (shipped with TeX Live), but only when "we know" that a file name can clash with another font whose name differs only in case?

I don't know why otfinfo can get the PSName from a TTF file (though it fails on TTC/OTC), but it seems to work as expected, e.g.:

$ otfinfo -p /usr/local/texlive/2016/texmf-dist/fonts/truetype/public/baekmuk/batang.ttf
Baekmuk-Batang
$ otfinfo -p /Library/Fonts/Microsoft/Batang.ttf
Batang

Currently the only such "known" files are batang.ttf and gulim.ttf, so the cost will be small. If otfinfo returns the same PSName as the Name: entry, we take the file; otherwise we discard it and test another file.
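Here is a minimal Python sketch of this check. It assumes otfinfo is on PATH and prints the PostScript name on a single line, as in the examples above. Both function names are hypothetical. A failed run, for example on a TTC/OTC file, is treated as "no PSName".

```python
import shutil
import subprocess


def otfinfo_psname(path):
    """Return the PSName printed by `otfinfo -p PATH`, or None on failure."""
    exe = shutil.which("otfinfo")
    if exe is None:
        return None  # otfinfo not installed
    proc = subprocess.run([exe, "-p", path], capture_output=True, text=True)
    if proc.returncode != 0:
        return None  # e.g. unsupported font format
    return proc.stdout.strip() or None


def psname_matches(path, expected_name, get_psname=otfinfo_psname):
    """Accept `path` only if its PSName equals the database Name: entry."""
    return get_psname(path) == expected_name
```

Passing get_psname as a parameter keeps the comparison testable without otfinfo installed.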

aminophen commented 7 years ago

invoking otfinfo

My first attempt: d16db33 ('otfinfo' branch at texjporg/cjk-gs-support)

The code assumes otfinfo -p is available, so there may be some problems on systems which do not provide otfinfo.

norbusan commented 7 years ago

I am not sure how this would fix the problem if both files have the same, correct PSName. Isn't there a chance that this could happen?

aminophen commented 7 years ago

On a case-insensitive system, kpsewhich may return

$ kpsewhich -all batang.ttf 
/Library/Fonts/Microsoft/batang.ttf
/usr/local/texlive/2016/texmf-dist/fonts/truetype/public/baekmuk/batang.ttf

but "/Library/Fonts/Microsoft/batang.ttf" is not what we expect (it is actually the uppercase Batang.ttf). This is a wrong match, but otfinfo can detect it:

$ otfinfo -p /Library/Fonts/Microsoft/batang.ttf
Batang

We know from the database that batang.ttf (the free Baekmuk font) should have the PSName "Baekmuk-Batang", so we can discard "/Library/Fonts/Microsoft/batang.ttf" and avoid choosing it as batang.ttf.
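Applied to the whole kpsewhich -all result, the idea amounts to keeping the first candidate whose PSName matches the Name: from the database. In this Python sketch the function name choose_font is hypothetical. The PSName lookup is passed in as a callable, so it can wrap otfinfo -p or, in a test, a plain dictionary.

```python
def choose_font(candidates, expected_name, get_psname):
    """Return the first kpsewhich hit whose PSName matches the database entry.

    candidates:    paths in `kpsewhich -all` order
    expected_name: the Name: recorded in the database, e.g. "Baekmuk-Batang"
    get_psname:    callable returning a file's PSName (e.g. via otfinfo -p)
    Returns None when no candidate is the font we are looking for.
    """
    for path in candidates:
        if get_psname(path) == expected_name:
            return path  # correct font, not a case-insensitive false hit
    return None
```

With the two hits above, asking for "Baekmuk-Batang" skips the Microsoft file (PSName "Batang") and returns the TeX Live copy.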

norbusan commented 7 years ago

Yes, for Batang I see this. But what happens if there is another file Foobar.ttf that appears two or more times, and all of them have the same otfinfo -p name? (This is not the case now, but who knows?)

aminophen commented 7 years ago

If two fonts share both the file name and the otfinfo -p name, we can assume that they are identical. We don't need to distinguish them; only one is needed, as Ghostscript will never use both at the same time.

If two or more TTF fonts "have the same otfinfo -p name", they should be registered under the same Name: entry with different priority numbers. So the case of "Foobar.ttf that appears two or more times, and all of them have the same otfinfo -p name" will never happen.

aminophen commented 6 years ago

By default, Kpathsea 6.3.0 falls back to a case-insensitive search if the case-sensitive search finds no match, so we have to implement the "PSName checker" discussed here for safety. From the Kpathsea manual:

5.4 Casefolding search

In Kpathsea version 6.3.0 (released with TeX Live 2018), a new fallback search was implemented on Unix-like systems, including Macs: for each path element in turn, if no match is found by the normal search, and the path element allows for checking the filesystem, a second check is made for a case-insensitive match.

This is enabled at compile-time on Unix systems, and enabled at runtime by setting the configuration variable 'texmf_casefold_search', to a true value, e.g., '1'; this is done by default in TeX Live.
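The fallback described in the manual can be modeled in a few lines of Python. This is a simplified sketch, not Kpathsea's actual C code. The function name casefold_lookup is hypothetical, listdir stands in for reading a directory, and an element prefixed with !! is treated as ls-R only, so it gets no casefolding fallback because it does not allow checking the filesystem.

```python
def casefold_lookup(name, path_elements, listdir):
    """Return the first match for `name`, trying each path element in order."""
    folded = name.casefold()
    for element in path_elements:
        ls_r_only = element.startswith("!!")  # "!!": search the ls-R database only
        directory = element[2:] if ls_r_only else element
        entries = listdir(directory)
        if name in entries:
            return f"{directory}/{name}"  # normal, case-sensitive match
        if ls_r_only:
            continue  # no filesystem check allowed, so no casefold fallback
        for entry in entries:
            if entry.casefold() == folded:
                return f"{directory}/{entry}"  # case-insensitive fallback match
    return None
```

Because the fallback runs per path element, a case-insensitive hit in an earlier element wins over an exact match in a later one. With the Mac font directory listed first, a lookup of batang.ttf therefore lands on Microsoft's Batang.ttf, which is the kind of wrong match the PSName checker has to catch.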

aminophen commented 6 years ago

My idea:

  1. First, check that the otfinfo command is available.
    • If not, disable all of the following steps and throw a warning like "The program 'otfinfo' not found in PATH. Sorry, we can't be safe enough to distinguish uppercase / lowercase file names."
  2. Set a Casefold: true flag in the database for Name: entries for which we already know that casefolding matters.
  3. Call otfinfo -p only when both of the following conditions are met: [1] the file type is OTF or TTF, and [2] the Casefold: true flag is set.
  4. Disregard the kpathsea result when the PSName returned by otfinfo -p differs from the one recorded in the database.

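The four steps can be sketched in Python as follows. The entry keys (name, type, casefold) and all function names are hypothetical stand-ins for the database fields. get_psname would wrap otfinfo -p, and the warning text is taken from step 1.

```python
import shutil
import warnings

PSNAME_CHECK_TYPES = {"OTF", "TTF"}  # step 3 [1]: only these types are checked


def otfinfo_available():
    """Step 1: warn and disable the checker if otfinfo is missing from PATH."""
    if shutil.which("otfinfo") is None:
        warnings.warn("The program 'otfinfo' not found in PATH. Sorry, we can't be "
                      "safe enough to distinguish uppercase / lowercase file names.")
        return False
    return True


def accept_hit(path, entry, have_otfinfo, get_psname):
    """Decide whether a kpathsea hit for `entry` should be used.

    entry: {"name": ..., "type": ..., "casefold": bool} (hypothetical keys)
    """
    needs_check = (have_otfinfo
                   and entry.get("casefold", False)          # steps 2 and 3 [2]
                   and entry["type"] in PSNAME_CHECK_TYPES)  # step 3 [1]
    if not needs_check:
        return True  # trust kpathsea as before
    # Step 4: reject the hit if its PSName differs from the database Name:.
    return get_psname(path) == entry["name"]
```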
aminophen commented 6 years ago

Done in ca1bba1e74...9e630e3047

norbusan commented 6 years ago

Looks fine, including the patches. I'm not overly happy to have otfinfo as a semi-necessity, but I guess that is the price to pay for an automated system ;-) Thanks for the commits.