taviso / wpunix

WordPerfect for UNIX Character Terminals

Greek characters mostly working, apart from two #29

Closed krackout closed 1 year ago

krackout commented 1 year ago

Contrary to an issue regarding Cyrillic, Greek characters can be typed and are shown on the console, in preview (sixel) and in print (Ghostscript). All but two characters, that is: "Α" (Greek Capital Letter Alpha U+0391) and "ς" (Greek Small Letter Final Sigma U+03C2).

By the way, amazed by your research and work on 1-2-3, UNIX & MS-DOS and WP!

taviso commented 1 year ago

Very interesting! I can confirm the problem, but the cause isn't obvious; the character definitions seem to be correct at first look... I'll step through the code and see where it's going wrong.

taviso commented 1 year ago

Just some notes to myself:

It seems like mapkey() does not correctly handle extended characters when the least significant octet of the scancode happens to be zero, and the scancode for U+0391 just randomly happens to be 0xf800.

If I manually skip that octet check at mapkey+12 with a debugger, it does seem to work correctly! I think I will need to re-implement mapkey(), which doesn't seem too complicated.... (famous last words).
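As a toy illustration only (this is not the real mapkey(), whose code is not shown in this thread), the condition described above can be modelled like this. The function name `low_octet_is_zero` and the neighbouring value 0xF801 are made up for the example; only 0xF800 comes from the notes above.

```python
def low_octet_is_zero(code: int) -> bool:
    """Return True when the least significant octet of a 16-bit code is zero."""
    return (code & 0xFF) == 0


ALPHA_CODE = 0xF800      # value the notes above give for U+0391
NEIGHBOUR_CODE = 0xF801  # hypothetical adjacent code, included only for contrast

for code in (ALPHA_CODE, NEIGHBOUR_CODE):
    print(f"{code:#06x}: low octet zero = {low_octet_is_zero(code)}")
```

Any check of this shape singles out 0xF800 while leaving nearby codes alone, which would fit a bug that hits a single letter.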

taviso commented 1 year ago

It seems like the terminal sigma issue is different from the capital alpha issue: that one is just a plain wrong character we inherited from libwpd! I think it's "stigma", which is a ligature for sigma+tau that just kinda looks like terminal sigma. That one is an easy fix.

Is the uppercase terminal sigma correct? It looks like this: Ϲ

emendelson commented 1 year ago

This is amazing detective work. Am I right in thinking that we're talking about WP character 8,39? If so, then it seems that libwpd has that character right (and also 8,38) in the WP6 character set, which begins like this:

0x0391, 0x03b1, 0x0392, 0x03b2, 0x0392, 0x03d0, 0x0393, 0x03b3,
0x0394, 0x03b4, 0x0395, 0x03b5, 0x0396, 0x03b6, 0x0397, 0x03b7,
0x0398, 0x03b8, 0x0399, 0x03b9, 0x039a, 0x03ba, 0x039b, 0x03bb,
0x039c, 0x03bc, 0x039d, 0x03bd, 0x039e, 0x03be, 0x039f, 0x03bf,
0x03a0, 0x03c0, 0x03a1, 0x03c1, 0x03a3, 0x03c3, 0x03a3, 0x03c2,

The WP5 character set seems to have the wrong characters in the last two positions on the fifth line:

0x0391, 0x03b1, 0x0392, 0x03b2, 0x0392, 0x03d0, 0x0393, 0x03b3,
0x0394, 0x03b4, 0x0395, 0x03b5, 0x0396, 0x03b6, 0x0397, 0x03b7,
0x0398, 0x03b8, 0x0399, 0x03b9, 0x039a, 0x03ba, 0x039b, 0x03bb,
0x039c, 0x03bc, 0x039d, 0x03bd, 0x039e, 0x03be, 0x039f, 0x03bf,
0x03a0, 0x03c0, 0x03a1, 0x03c1, 0x03a3, 0x03c3, 0x03f9, 0x03db,

Am I reading this correctly, or am I totally confused?
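A quick way to check that reading is to diff the two excerpts mechanically. This is only a sketch: the lists are copied from the excerpts above (the first 40 entries of each Greek table), and the names `WP6_GREEK`, `WP5_GREEK` and `table_differences` are made up for the example.

```python
import unicodedata

# First 40 entries of the Greek set, as quoted above.
WP6_GREEK = [
    0x0391, 0x03B1, 0x0392, 0x03B2, 0x0392, 0x03D0, 0x0393, 0x03B3,
    0x0394, 0x03B4, 0x0395, 0x03B5, 0x0396, 0x03B6, 0x0397, 0x03B7,
    0x0398, 0x03B8, 0x0399, 0x03B9, 0x039A, 0x03BA, 0x039B, 0x03BB,
    0x039C, 0x03BC, 0x039D, 0x03BD, 0x039E, 0x03BE, 0x039F, 0x03BF,
    0x03A0, 0x03C0, 0x03A1, 0x03C1, 0x03A3, 0x03C3, 0x03A3, 0x03C2,
]
# The WP5 excerpt is identical except for its last two entries.
WP5_GREEK = WP6_GREEK[:38] + [0x03F9, 0x03DB]


def table_differences(a, b):
    """Return (index, value_a, value_b) for every position where the tables differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for index, wp6, wp5 in table_differences(WP6_GREEK, WP5_GREEK):
    print(f"8,{index}: WP6 U+{wp6:04X} {unicodedata.name(chr(wp6))}"
          f" | WP5 U+{wp5:04X} {unicodedata.name(chr(wp5))}")
```

Only positions 38 and 39 differ, matching WP characters 8,38 and 8,39.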

taviso commented 1 year ago

Yep, it is 8,39 and you're reading that correctly! If you look at the characters it's easy to confuse them, so it's an understandable error: ϛς

emendelson commented 1 year ago

I've filed a ticket at the libwpd SourceForge page, and will alert the author of vDos.exe which has special features for exchanging data with the Windows clipboard.

Shouldn't 8,38 also be changed? The character is defined in CHARACTR.DOC as SIGMA (terminal), but there does not seem to be an uppercase SIGMA used at the end of a word. WP5 has the obviously wrong Ϲ "Greek Capital Lunate Sigma" while WP6 has Σ, the same uppercase Sigma at 8,36, which seems to be the correct way to do this.

taviso commented 1 year ago

That's what I meant above, I wasn't sure!

Now that I look into it, the uppercase form of stigma is this lunate form, so whoever prepared the table probably just looked up the uppercase form of the wrong character. It seems like Σ is correct.

emendelson commented 1 year ago

> That's what I meant above, I wasn't sure!

Ah - I was replying too quickly to remember that you mentioned that character!

taviso commented 1 year ago

I went ahead and made that change. I'll try to rewrite the mapkey() function, I think this bug might prevent directly entering the first character of some sets from the keyboard - do you know if that works in DOS?

For example, without using compose and just using a localized keyboard layout, I think these characters are impossible to enter in UNIX:

Hebrew Aleph (א) Greek Alpha (Α) Russian A (А) Japanese phonetic a (ぁ)

emendelson commented 1 year ago

I think DOS handles the keyboard very differently. Under DOS, the keyboard layout currently loaded with the KEYB command (US, GK for Greek, etc.) translates the physical key that you type into a number that corresponds to one of the characters in the current 256-character code page. WordPerfect then switches to its own internal table of characters, which matches the number from 1 to 256 that it gets from the keyboard to a WP character. (And in order to display the Greek alphabet, you need to load a .CPI file that switches the text display so that Greek letters appear instead of the Roman alphabet.) For example:

https://hwiegman.home.xs4all.nl/msdos/113841.htm

I haven't done all of this for years, but I just now tried (using DOSBox-X) the Greek keyboard and code page and had no trouble typing any character. Unix makes everything a lot less complicated...!
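The DOS path described above (key, then code page byte, then WP character) can be sketched roughly as follows. This is a guess at the shape of the lookup, not WordPerfect's actual table: it assumes code page 737 (one of the DOS Greek code pages; the thread does not say which one was loaded) and reuses the WP6 Greek excerpt quoted earlier as a stand-in for WP's internal table. `wp_char_for_byte` is a made-up name.

```python
# WP6 Greek excerpt quoted earlier in the thread (set 8, positions 0..39).
WP_GREEK_SET = 8
WP_GREEK = [
    0x0391, 0x03B1, 0x0392, 0x03B2, 0x0392, 0x03D0, 0x0393, 0x03B3,
    0x0394, 0x03B4, 0x0395, 0x03B5, 0x0396, 0x03B6, 0x0397, 0x03B7,
    0x0398, 0x03B8, 0x0399, 0x03B9, 0x039A, 0x03BA, 0x039B, 0x03BB,
    0x039C, 0x03BC, 0x039D, 0x03BD, 0x039E, 0x03BE, 0x039F, 0x03BF,
    0x03A0, 0x03C0, 0x03A1, 0x03C1, 0x03A3, 0x03C3, 0x03A3, 0x03C2,
]


def wp_char_for_byte(byte, codepage="cp737"):
    """Map one code page byte to a (set, index) pair, or None if not in the excerpt."""
    codepoint = ord(bytes([byte]).decode(codepage))
    if codepoint not in WP_GREEK:
        return None
    # list.index returns the first position holding this code point.
    return (WP_GREEK_SET, WP_GREEK.index(codepoint))


for letter in ("Α", "ς"):
    byte = letter.encode("cp737")[0]
    print(f"{letter}: byte {byte:#04x} -> WP character {wp_char_for_byte(byte)}")
```

In this model every key arrives as a single byte, so there is no multi-octet code whose low octet could happen to be zero.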

krackout commented 1 year ago

You seem to be on the right path, I'll nevertheless clarify on ς-Σ-σ: "ς" (Greek Small Letter Final Sigma U+03C2)'s capital letter is "Σ" (Greek Capital Letter Sigma U+03A3). There is no distinct Greek Capital Letter Final Sigma. So essentially both "σ" (Greek Small Letter Sigma U+03C3) and "ς" (U+03C2) have "Σ" (U+03A3) as their capital letter.
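The case mapping described above can be checked against Unicode's own case rules, for instance with Python's built-in string methods. This is just an illustration; the constant names are made up.

```python
import unicodedata

SIGMA = "\u03c3"          # σ GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"    # ς GREEK SMALL LETTER FINAL SIGMA
CAPITAL_SIGMA = "\u03a3"  # Σ GREEK CAPITAL LETTER SIGMA

# Both lowercase forms share the same capital letter.
for ch in (SIGMA, FINAL_SIGMA):
    print(unicodedata.name(ch), "->", unicodedata.name(ch.upper()))

# Going the other way depends on position: CPython's str.lower() picks the
# final form when the sigma ends a word.
print("ΟΔΟΣ".lower())
```

There is no separate capital code point for the final form, so a table slot for "uppercase terminal sigma" can only sensibly hold U+03A3.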

In practice, if you press either shift+s or shift+w on a Greek keyboard you get "Σ". Without shift, if you press s you get "σ", if you press w you get "ς".

"ϛ" (Greek Small Letter Stigma U+03DB) is not used in modern Greek, not even in classical ancient Greek if I recall correctly; only in archaic forms and for numbering. Actually I just found that I can type Stigma in wpunix using the AltGr+w combination! But it's not printed.

krackout commented 1 year ago

I copied xterm.trs from a893dec to /opt/wp80/shlib10/, it works great! pressing w gives "ς" (Greek Small Letter Final Sigma U+03C2), shift+w gives "Σ" (Greek Capital Letter Sigma U+03A3).

Regarding printing (ghostscript): Although "ς" is shown on preview, it seems not to be printed; it is not shown in the PS output. Yet, reading the PS source I can see sigma1, which seems to be equivalent to sigmafinal. Even more perplexing: after converting the PS to PDF, "ς" is still not shown, but after converting the PDF to plain text, "ς" appeared!

I suppose libwpd is used in LibreOffice to open WP files, because in the file I saved, the correct U+03C2 is replaced by U+03DB when it is opened in LibreOffice.

taviso commented 1 year ago

Interesting, thanks for the information!

> Regarding printing (ghostscript): Although "ς" is shown on preview, it seems to not be printed - not shown on PS output. Yet, reading PS source I can see sigma1 which seems to be equivalent to sigmafinal.

Hmm - we can switch to sigmafinal instead if that works better? Can you try manually using /sigmafinal in the postscript and see if it looks better?

krackout commented 1 year ago

Good suggestion, unfortunately /sigmafinal in the PS file didn't produce any change. After searching a bit in Adobe's Glyph List Specification, I think it's probably better to keep sigma1, since it's in both lists, AGL (glyphlist.txt) and AGLFN (aglfn.txt), while sigmafinal is mentioned only in the former.
Both sigma1 & sigmafinal map to 03C2.

I'll try to dig a bit further on it.

emendelson commented 1 year ago

This seems to be a Ghostscript problem. When I open one of the Ghostscript fonts in a font editor, the terminal sigma is not named either sigma1 or sigmafinal but

uni03C2

If you manually edit the PS code to use this (with the correct case for all letters) the character prints correctly. So this requires a minor fix in the WP driver, and all should be well.
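For reference, all three names discussed here denote the same code point under Adobe's glyph naming conventions. The sketch below shows only what the names mean, not how Ghostscript looks glyphs up in a font: `AGL_SUBSET` is a three-entry extract rather than the full glyphlist.txt, only the simple single-group `uniXXXX` form is handled, and `glyph_name_to_codepoint` is a made-up helper.

```python
import re

# Tiny extract of Adobe's glyphlist.txt; the real list has thousands of entries.
AGL_SUBSET = {"sigma": 0x03C3, "sigma1": 0x03C2, "sigmafinal": 0x03C2}

# "uni" followed by exactly four uppercase hex digits (case matters).
_UNI_NAME = re.compile(r"uni([0-9A-F]{4})")


def glyph_name_to_codepoint(name):
    """Return the code point a glyph name denotes, or None if it is unknown here."""
    match = _UNI_NAME.fullmatch(name)
    if match:
        return int(match.group(1), 16)
    return AGL_SUBSET.get(name)


for name in ("sigma1", "sigmafinal", "uni03C2", "uni03c2"):
    print(name, "->", glyph_name_to_codepoint(name))
```

The lowercase spelling uni03c2 is rejected, which is why the case of every letter matters when editing the PS by hand.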

EDIT: It may make life easier to use the same character map for all the Ghostscript fonts except Symbol. Multiple fonts can use the same character map, and then you only need to fix one map to update all the fonts. This is what I do in my vDosWP system.

EDIT2: I've modified the current gscript.all file to use a single character map for all fonts (with the corrected PS name for the terminal sigma), and have also added the Zapf Dingbats font which is included in the original WP PostScript drivers. I've made a start at assigning the PostScript names to the Zapf Dingbat characters but haven't completed the job; this doesn't affect printing at all; it just makes things a bit more elegant. Here's the updated .all file:

https://www.dropbox.com/s/d9qzhgxrtyudbff/gscript.all?dl=0

krackout commented 1 year ago

You are right, changing sigma1 to uni03C2 presents "ς" in PS file!

Yet, shouldn't it be sigma1? I wonder if it should be reported to be fixed in ghostscript.
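For anyone who wants to repeat the manual edit on a generated file, the substitution can be scripted. This is a rough sketch that assumes the glyph appears in the PostScript as the name literal `/sigma1`; the exact layout of WP's PostScript output is not shown in this thread, and `use_uni_name` and the sample string are invented for the example.

```python
import re

# Match the name literal /sigma1 only when it is a whole name, so that a
# longer name such as /sigma10 is left untouched.
_SIGMA1 = re.compile(r"/sigma1(?![A-Za-z0-9_.])")


def use_uni_name(ps_text):
    """Replace every /sigma1 name literal with /uni03C2."""
    return _SIGMA1.sub("/uni03C2", ps_text)


sample = "/rho /sigma /sigma1 /tau"  # made-up fragment, not real WP output
print(use_uni_name(sample))
```

Patching the output by hand like this is only a workaround for testing; the lasting fix belongs in the printer driver's character map.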

emendelson commented 1 year ago

You might want to consider reporting it to Ghostscript, but I would guess that it might cause trouble to other applications to change it now, so they might not want to do it.

krackout commented 1 year ago

Yes, you are right.

emendelson commented 1 year ago

But I looked again at the font file, and saw that the character also has the alternate names sigma1 and sigmafinal listed in the definition of the glyph. I thought that Ghostscript should be able to find the character under any of those names, and possibly modern programs can access the characters under any of these names - while an ancient program like WordPerfect sends data in a way that requires only one specific glyph name. I'm only guessing, of course!

krackout commented 1 year ago

I printed a document with all the supplied fonts, including all characters of the Latin and Greek alphabets. The result is that "ς" (Greek Small Letter Final Sigma U+03C2) fails to appear only in the Courier font (which needs the change from sigma1 to uni03C2), even though Courier has the full Greek character set; the other fonts made obvious substitutions for the Greek letters they lack. So it seems to be a font problem only, not a general WordPerfect or Ghostscript one.

emendelson commented 1 year ago

@krackout - Did you try my updated .all file?

krackout commented 1 year ago

@emendelson - Yes, I have. I actually had tried it as soon as you posted it. But since you mentioned that it doesn't affect printing at all, I didn't expect anything to change.

emendelson commented 1 year ago

@krackout - the .all file ONLY affects printing (and print preview) - if I said anything else, I certainly didn't mean to do so.

After installing the .all file in /opt/wp80/shlib10, it's necessary to use Shift-F7, Select, Update; the information message that appears will include the date 27 November 2022 at the end.

krackout commented 1 year ago

@emendelson - Sorry, missed that step. After updating, "ς" appears in Courier also now. Better fallback also on missing greek characters, for the other fonts.

emendelson commented 1 year ago

@krackout - Thank you for confirming!

emendelson commented 1 year ago

Actually, to clarify: the .all file also affects formatting (line breaks etc.) because the printer driver that you select includes font information that WP uses when determining the width of words and letters. So that means that when you change printer drivers, the formatting of the document may also change. This generally won't affect text in monospace fonts like Courier (but it could - if different printer drivers use different line heights for their version of Courier), but it may affect text in proportional fonts.

taviso commented 1 year ago

I think I've fixed this in the latest commit, but the code is complicated and I'm not 100% sure I've got it right - I'm going to use it for a bit to make sure everything is working, and then I'll make a new release.

Sorry this took so long to get to!

taviso commented 1 year ago

Alright, it seems to be working okay for me - I made a new release!

https://github.com/taviso/wpunix/releases/tag/v0.12

Let's call this fixed for now, but please re-open or open a new issue for any problems or if it doesn't work properly!

emendelson commented 1 year ago

Could we reopen this? There turns out to be one more mistaken character left over from a bad commit in libwpd many years ago. WP character 8,65 got changed to 1fe5 but it should be 03f1. I've been working on patching libwpd to restore the correct characters (and have added details to my ticket about this), and I'm fairly sure that this is the one character that is wrong in the current set.

The screenshot below shows the editing screen on the left, which gets the character wrong, and WP's print preview of the same file on the right, which gets it right. To repeat: WP character 8,65 should be 0x03f1.
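For readers without the screenshot, the two code points can be identified by their Unicode names. This is a small illustrative check; the variable names are made up.

```python
import unicodedata

CURRENT_8_65 = 0x1FE5   # value currently in the table, per the comment above
EXPECTED_8_65 = 0x03F1  # value the comment above says it should be

for label, codepoint in (("current", CURRENT_8_65), ("expected", EXPECTED_8_65)):
    char = chr(codepoint)
    print(f"8,65 {label}: U+{codepoint:04X} {char} {unicodedata.name(char)}")
```

The first is a rho carrying a breathing mark and the second is the rho symbol, so the two are easy to mistake for each other.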

Apologies for not sorting this out the first time we talked about this.

Greek