krackout closed this issue 1 year ago
Very interesting! I can confirm the problem, but the cause isn't obvious; the character definitions seem to be correct at first glance. I'll step through the code and see where it's going wrong.
Just some notes to myself:
It seems like mapkey() does not correctly handle extended characters when the least significant octet of the scancode happens to be zero, and U+0391 just randomly happens to be 0xf800. If I manually skip that octet check at mapkey+12 with a debugger, it does seem to work correctly! I think I will need to re-implement mapkey(), which doesn't seem too complicated... (famous last words).
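To make the suspected failure mode concrete, here is a minimal Python sketch. It is not the real mapkey(): it assumes, hypothetically, that a key table stores 0 to mean "no extended character" and that the buggy check looks only at the low octet of the stored value. The only value taken from the thread is 0xf800 for U+0391; the KEYMAP table, the 0xF801 entry, and both function names are invented for illustration.

```python
from typing import Optional

# Hypothetical table: keyboard input name -> internal character code.
# 0xF800 is the value reported in the thread for U+0391 (Greek Capital Alpha);
# the 0xF801 entry is invented purely for contrast.
KEYMAP = {
    "Greek_ALPHA": 0xF800,
    "Greek_alpha": 0xF801,
}


def buggy_mapkey(key: str) -> Optional[int]:
    """Mimics the suspected bug: a zero low octet is read as 'no mapping'."""
    code = KEYMAP.get(key, 0)
    if (code & 0xFF) == 0:  # rejects 0x0000, but also 0xF800
        return None
    return code


def fixed_mapkey(key: str) -> Optional[int]:
    """Treats only a missing entry (code 0) as 'no mapping'."""
    code = KEYMAP.get(key, 0)
    if code == 0:
        return None
    return code


print(buggy_mapkey("Greek_ALPHA"))       # None: Alpha cannot be entered
print(hex(fixed_mapkey("Greek_ALPHA")))  # 0xf800
print(hex(buggy_mapkey("Greek_alpha")))  # 0xf801: unaffected by the bug
```

In this model, any character whose code is a multiple of 0x100 is silently dropped, which would fit the observation that the first character of some character sets cannot be entered.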
It seems like the terminal sigma issue is different from the capital alpha issue: that one is just a plain wrong character we inherited from libwpd! I think it's "stigma", a ligature of sigma+tau that just kinda looks like terminal sigma. That one is an easy fix.
Is the uppercase terminal sigma correct? It looks like this: Ϲ
This is amazing detective work. Am I right in thinking that we're talking about WP character 8,39? If so, then it seems that libwpd has that character right (and also 8,38) in the WP6 character set, which begins like this:
0x0391, 0x03b1, 0x0392, 0x03b2, 0x0392, 0x03d0, 0x0393, 0x03b3,
0x0394, 0x03b4, 0x0395, 0x03b5, 0x0396, 0x03b6, 0x0397, 0x03b7,
0x0398, 0x03b8, 0x0399, 0x03b9, 0x039a, 0x03ba, 0x039b, 0x03bb,
0x039c, 0x03bc, 0x039d, 0x03bd, 0x039e, 0x03be, 0x039f, 0x03bf,
0x03a0, 0x03c0, 0x03a1, 0x03c1, 0x03a3, 0x03c3, 0x03a3, 0x03c2,
The WP5 character set seems to have the wrong characters in the last two positions on the fifth line:
0x0391, 0x03b1, 0x0392, 0x03b2, 0x0392, 0x03d0, 0x0393, 0x03b3,
0x0394, 0x03b4, 0x0395, 0x03b5, 0x0396, 0x03b6, 0x0397, 0x03b7,
0x0398, 0x03b8, 0x0399, 0x03b9, 0x039a, 0x03ba, 0x039b, 0x03bb,
0x039c, 0x03bc, 0x039d, 0x03bd, 0x039e, 0x03be, 0x039f, 0x03bf,
0x03a0, 0x03c0, 0x03a1, 0x03c1, 0x03a3, 0x03c3, 0x03f9, 0x03db,
Am I reading this correctly, or am I totally confused?
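The difference between the two excerpts can also be found mechanically. A minimal Python sketch, assuming (consistent with the thread, where 8,36 is Σ) that the n-th entry of each table is WP character 8,n; diff_tables is a hypothetical helper, and the data is copied from the two excerpts above.

```python
import unicodedata

# First 40 entries of WP character set 8, copied from the two libwpd tables above.
WP6_SET8 = [
    0x0391, 0x03b1, 0x0392, 0x03b2, 0x0392, 0x03d0, 0x0393, 0x03b3,
    0x0394, 0x03b4, 0x0395, 0x03b5, 0x0396, 0x03b6, 0x0397, 0x03b7,
    0x0398, 0x03b8, 0x0399, 0x03b9, 0x039a, 0x03ba, 0x039b, 0x03bb,
    0x039c, 0x03bc, 0x039d, 0x03bd, 0x039e, 0x03be, 0x039f, 0x03bf,
    0x03a0, 0x03c0, 0x03a1, 0x03c1, 0x03a3, 0x03c3, 0x03a3, 0x03c2,
]
WP5_SET8 = [
    0x0391, 0x03b1, 0x0392, 0x03b2, 0x0392, 0x03d0, 0x0393, 0x03b3,
    0x0394, 0x03b4, 0x0395, 0x03b5, 0x0396, 0x03b6, 0x0397, 0x03b7,
    0x0398, 0x03b8, 0x0399, 0x03b9, 0x039a, 0x03ba, 0x039b, 0x03bb,
    0x039c, 0x03bc, 0x039d, 0x03bd, 0x039e, 0x03be, 0x039f, 0x03bf,
    0x03a0, 0x03c0, 0x03a1, 0x03c1, 0x03a3, 0x03c3, 0x03f9, 0x03db,
]


def diff_tables(a, b, charset=8):
    """Yield (WP character label, entry in a, entry in b) wherever the tables disagree."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            yield f"{charset},{index}", x, y


for label, wp5, wp6 in diff_tables(WP5_SET8, WP6_SET8):
    print(f"{label}: WP5 U+{wp5:04X} {unicodedata.name(chr(wp5))}"
          f" -> WP6 U+{wp6:04X} {unicodedata.name(chr(wp6))}")
# 8,38: WP5 U+03F9 GREEK CAPITAL LUNATE SIGMA SYMBOL -> WP6 U+03A3 GREEK CAPITAL LETTER SIGMA
# 8,39: WP5 U+03DB GREEK SMALL LETTER STIGMA -> WP6 U+03C2 GREEK SMALL LETTER FINAL SIGMA
```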
Yep, it is 8,39, and you're reading that correctly! The two characters are easy to confuse when you look at them side by side, so it's an understandable error: ϛ ς
I've filed a ticket at the libwpd SourceForge page and will alert the author of vDos.exe, which has special features for exchanging data with the Windows clipboard.
Shouldn't 8,38 also be changed? The character is defined in CHARACTR.DOC as SIGMA (terminal), but there does not seem to be an uppercase SIGMA used at the end of a word. WP5 has the obviously wrong Ϲ "Greek Capital Lunate Sigma" while WP6 has Σ, the same uppercase Sigma at 8,36, which seems to be the correct way to do this.
That's what I meant above, I wasn't sure!
Now that I look into it, the uppercase form of stigma is this lunate form, so whoever prepared the table probably just looked up the uppercase form of the wrong character. It seems like Σ is correct.
Ah - I was replying too quickly to remember that you mentioned that character!
I went ahead and made that change. I'll try to rewrite the mapkey() function; I think this bug might prevent directly entering the first character of some character sets from the keyboard. Do you know if that works in DOS?
For example, without using compose and just using a localized keyboard layout, I think these characters are impossible to enter in UNIX:
Hebrew Aleph (א), Greek Alpha (Α), Russian A (А), Japanese phonetic a (ぁ)
I think DOS handles the keyboard very differently. Under DOS, the keyboard layout loaded with the KEYB command (US, GK for Greek, etc.) translates the physical key you press into a number that corresponds to one of the characters in the current 256-character code page. WordPerfect then uses its own internal table to map that code-page number to a WP character. (And to display the Greek alphabet, you need to load a .CPI file that switches the text display so that Greek letters appear instead of the Roman alphabet.) For example:
https://hwiegman.home.xs4all.nl/msdos/113841.htm
I haven't done any of this in years, but I just tried the Greek keyboard and code page (using DOSBox-X) and had no trouble typing any character. Unix makes everything a lot less complicated...!
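The DOS input path just described can be modelled in a few lines. This Python sketch is not WordPerfect's code: it assumes Greek code page 737 (via Python's built-in cp737 codec; the thread doesn't say which Greek code page was loaded), and UNICODE_TO_WP is a hypothetical excerpt built from the WP6 set-8 table quoted earlier. In this model, Greek capital Alpha arrives as an ordinary code-page byte, so there is no scancode-octet check to trip over.

```python
# Sketch of the DOS input path:
#   key --(KEYB layout)--> code-page byte --(WP table)--> WP character
# Assumption: Greek code page 737, decoded with Python's "cp737" codec.

# Hypothetical excerpt of WP character set 8 (from the libwpd table quoted
# earlier), as Unicode code point -> (set, index).
UNICODE_TO_WP = {
    0x0391: (8, 0),   # Greek Capital Alpha
    0x03A3: (8, 36),  # Greek Capital Sigma
    0x03C3: (8, 37),  # Greek Small Sigma
    0x03C2: (8, 39),  # Greek Small Final Sigma
}


def byte_to_wp(byte: int):
    """Decode one code-page-737 byte and look up its WP character (or None)."""
    char = bytes([byte]).decode("cp737")
    return UNICODE_TO_WP.get(ord(char))


for cp in (0x0391, 0x03A3, 0x03C3, 0x03C2):
    byte = chr(cp).encode("cp737")[0]  # the byte KEYB would deliver for this letter
    print(f"U+{cp:04X}: byte 0x{byte:02X} -> WP {byte_to_wp(byte)}")
```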
You seem to be on the right track; nevertheless, I'll clarify ς/Σ/σ: the capital letter of "ς" (Greek Small Letter Final Sigma, U+03C2) is "Σ" (Greek Capital Letter Sigma, U+03A3). There is no distinct Greek Capital Letter Final Sigma, so both "σ" (Greek Small Letter Sigma, U+03C3) and "ς" (U+03C2) have "Σ" (U+03A3) as their capital letter.
In practice, on a Greek keyboard you get "Σ" if you press either shift+s or shift+w. Without shift, pressing s gives "σ" and pressing w gives "ς".
"ϛ" (Greek Small Letter Stigma, U+03DB) is not used in modern Greek, and not even in classical ancient Greek if I recall correctly; it appears only in archaic forms and for numbering. Actually, I just found that I can type stigma in wpunix using the AltGr+w combination! But it isn't printed.
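The sigma case relationships described above can be checked directly against Unicode data. A minimal Python sketch; GREEK_KEYS is a hypothetical table written from the key behaviour described in this comment (with shift+s and shift+w written as "S" and "W"), not an actual keyboard layout file.

```python
import unicodedata

# Greek layout behaviour as described above (shift+s / shift+w written as "S" / "W").
GREEK_KEYS = {"s": "σ", "S": "Σ", "w": "ς", "W": "Σ"}

# Both lowercase sigmas have the same capital, U+03A3 ...
for key in ("s", "w"):
    assert GREEK_KEYS[key].upper() == GREEK_KEYS[key.upper()] == "\u03a3"

# ... and Unicode defines no separate capital final sigma.
try:
    unicodedata.lookup("GREEK CAPITAL LETTER FINAL SIGMA")
except KeyError:
    print("No GREEK CAPITAL LETTER FINAL SIGMA in Unicode")
```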
I copied xterm.trs from a893dec to /opt/wp80/shlib10/ and it works great! Pressing w gives "ς" (Greek Small Letter Final Sigma, U+03C2), and shift+w gives "Σ" (Greek Capital Letter Sigma, U+03A3).
Regarding printing (Ghostscript): although "ς" is shown in the preview, it doesn't seem to be printed; it does not appear in the PS output. Yet, reading the PS source, I can see sigma1, which seems to be equivalent to sigmafinal. To make things more perplexing, after converting the PS to PDF "ς" still isn't shown, but after converting the PDF to plain text, "ς" appears!
I suppose LibreOffice uses libwpd to open WP files, because in the file I saved with the correct U+03C2, the character is replaced by U+03DB when the file is opened in LibreOffice.
Interesting, thanks for the information!
Hmm, we could switch to sigmafinal instead if that works better. Can you try manually using /sigmafinal in the PostScript and see if it looks better?
Good suggestion; unfortunately, using /sigmafinal in the PS file didn't produce any change. After searching a bit in Adobe's Glyph List Specification, it's probably better to keep sigma1, since it's in both lists, AGL (glyphlist.txt) and AGLFN (aglfn.txt), while sigmafinal is listed only in the former.
Both sigma1 and sigmafinal map to 03C2.
I'll try to dig into it a bit further.
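Both names resolve to U+03C2 under Adobe's glyph-naming rules, and a tool that turns glyph names back into text (such as a PDF-to-text converter) can recover "ς" from the name alone even when no glyph was actually drawn; that may be why the plain-text conversion showed the character. Below is a minimal Python sketch of that name-to-Unicode mapping, loosely following the AGL specification; AGL_EXCERPT holds only a handful of real glyphlist.txt entries, and glyph_name_to_text is a hypothetical helper. Note that the spec requires uppercase hex digits, one reason letter case matters in a name like uni03C2.

```python
import re

# A few entries from Adobe's glyphlist.txt (AGL); the real list is far longer.
AGL_EXCERPT = {
    "Alpha": 0x0391, "alpha": 0x03B1,
    "Sigma": 0x03A3, "sigma": 0x03C3,
    "sigma1": 0x03C2, "sigmafinal": 0x03C2,
}


def glyph_name_to_text(name: str) -> str:
    """Map a glyph name to Unicode text, loosely following the AGL specification.

    Drops any '.suffix', splits ligature components on '_', and maps each
    component via the list, the 'uniXXXX' form, or the 'uXXXX'..'uXXXXXX'
    form. Hex digits must be uppercase. Surrogate-range checks are omitted.
    """
    base = name.split(".", 1)[0]
    text = []
    for part in base.split("_"):
        if part in AGL_EXCERPT:
            text.append(chr(AGL_EXCERPT[part]))
        elif re.fullmatch(r"uni(?:[0-9A-F]{4})+", part):
            digits = part[3:]
            text.extend(chr(int(digits[i:i + 4], 16)) for i in range(0, len(digits), 4))
        elif re.fullmatch(r"u[0-9A-F]{4,6}", part) and int(part[1:], 16) <= 0x10FFFF:
            text.append(chr(int(part[1:], 16)))
        # Any other component maps to nothing.
    return "".join(text)


for n in ("sigma1", "sigmafinal", "uni03C2", "uni03c2"):
    print(n, [f"U+{ord(c):04X}" for c in glyph_name_to_text(n)])
```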
This seems to be a Ghostscript problem. When I open one of the Ghostscript fonts in a font editor, the terminal sigma is named neither sigma1 nor sigmafinal but uni03C2. If you manually edit the PS code to use this name (with the correct case for all letters), the character prints correctly. So this requires a minor fix in the WP driver, and all should be well.
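As a stopgap for testing, the manual edit just described can be scripted. A minimal Python sketch, assuming the glyph name appears in the PS output as the name literal /sigma1 (for instance inside an encoding vector); rename_glyph and the sample string are hypothetical, and the proper fix is still the driver change. The lookahead uses PostScript's token rules (whitespace and the delimiters ( ) < > [ ] { } / % end a name), so a longer name such as /sigma10 is left alone.

```python
import re

# Whitespace, a PostScript delimiter, or end of text ends a name token.
_NAME_END = r"(?=[\s()<>\[\]{}/%]|$)"


def rename_glyph(ps_source: str, old: str = "sigma1", new: str = "uni03C2") -> str:
    """Rewrite every /old name literal as /new. Names are case-sensitive."""
    pattern = "/" + re.escape(old) + _NAME_END
    return re.sub(pattern, lambda m: "/" + new, ps_source)


sample = "/Encoding [ /sigma /sigma1 /sigma10 ] def"
print(rename_glyph(sample))  # /Encoding [ /sigma /uni03C2 /sigma10 ] def
```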
EDIT: It may make life easier to use the same character map for all the Ghostscript fonts except Symbol. Multiple fonts can use the same character map, and then you only need to fix one map to update all the fonts. This is what I do in my vDosWP system.
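The idea of one shared character map can be illustrated with plain object sharing. This Python sketch is a hypothetical model, not the .all driver syntax: the font names and map contents are invented, and the only point is that fonts referring to the same map all pick up a single fix.

```python
# Illustrative model only; the .all driver format itself is not shown here.
# Every font except Symbol points at ONE shared dict, so a fix made once
# is seen by all the fonts that use it.
shared_map = {0x03C3: "sigma", 0x03C2: "sigma1"}
symbol_map = {0x03C3: "sigma"}  # Symbol keeps its own map

fonts = {name: shared_map for name in ("Courier", "Times-Roman", "Helvetica")}
fonts["Symbol"] = symbol_map

shared_map[0x03C2] = "uni03C2"  # corrected terminal-sigma name, applied once

for name, cmap in fonts.items():
    print(name, cmap.get(0x03C2))
```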
EDIT2: I've modified the current gscript.all file to use a single character map for all fonts (with the corrected PS name for the terminal sigma), and have also added the Zapf Dingbats font which is included in the original WP PostScript drivers. I've made a start at assigning the PostScript names to the Zapf Dingbat characters but haven't completed the job; this doesn't affect printing at all; it just makes things a bit more elegant. Here's the updated .all file:
You are right: changing sigma1 to uni03C2 makes "ς" appear in the PS output! Yet shouldn't it be sigma1? I wonder if it should be reported to Ghostscript to be fixed.
You might want to consider reporting it to Ghostscript, but I would guess that changing it now might cause trouble for other applications, so they might not want to do it.
Yes, you are right.
But I looked again at the font file and saw that the glyph definition also lists the alternate names sigma1 and sigmafinal. I thought that Ghostscript should be able to find the character under any of those names, and possibly modern programs can access it under any of them, while an ancient program like WordPerfect sends data in a way that requires one specific glyph name. I'm only guessing, of course!
I printed a document with all the supplied fonts, including all characters of the Latin and Greek alphabets. The result is that "ς" (Greek Small Letter Final Sigma, U+03C2) is missing only in the Courier font (which needs sigma1 changed to uni03C2), even though Courier has the full Greek character set; the other fonts made obvious substitutions for the Greek letters they lack. So it seems to be a font problem only, not a general WordPerfect or Ghostscript one.
@krackout - Did you try my updated .all file?
@emendelson - Yes, I have. I actually had tried it as soon as you posted it. But since you mentioned that it doesn't affect printing at all, I didn't expect anything to change.
@krackout - the .all file ONLY affects printing (and print preview) - if I said anything else, I certainly didn't mean to do so.
After installing the .all file in /opt/wp80/shlib10, it's necessary to use Shift-F7, Select, Update; the information message that appears will include the date 27 November 2022 at the end.
@emendelson - Sorry, I missed that step. After updating, "ς" now appears in Courier too, and the fallback for missing Greek characters in the other fonts is better as well.
@krackout - Thank you for confirming!
Actually, to clarify: the .all file also affects formatting (line breaks etc.) because the printer driver that you select includes font information that WP uses when determining the width of words and letters. So that means that when you change printer drivers, the formatting of the document may also change. This generally won't affect text in monospace fonts like Courier (but it could - if different printer drivers use different line heights for their version of Courier), but it may affect text in proportional fonts.
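A toy illustration of why a driver's font metrics can change line breaks. The width tables (in hypothetical 1/1000 em units) and the greedy algorithm are simplifications invented for this sketch; WP's real formatter also deals with hyphenation, justification, and much more. The point is only that identical words break differently when two drivers report different widths for the same proportional font.

```python
# Hypothetical widths (1/1000 em) for the same proportional font as reported
# by two different printer drivers.
DRIVER_A = {"a": 500, "b": 500, " ": 250}
DRIVER_B = {"a": 600, "b": 600, " ": 250}


def break_lines(words, widths, line_width):
    """Greedy line breaking: put words on a line until the next one would overflow."""
    lines, current, used = [], [], 0
    for word in words:
        word_width = sum(widths[c] for c in word)
        needed = word_width if not current else widths[" "] + word_width
        if current and used + needed > line_width:
            lines.append(" ".join(current))
            current, used = [word], word_width
        else:
            current.append(word)
            used += needed
    if current:
        lines.append(" ".join(current))
    return lines


words = ["ab", "ab", "ab"]
print(break_lines(words, DRIVER_A, 2500))  # ['ab ab', 'ab']
print(break_lines(words, DRIVER_B, 2500))  # ['ab', 'ab', 'ab']
```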
I think I've fixed this in the latest commit, but the code is complicated and I'm not 100% sure I've got it right - I'm going to use it for a bit to make sure everything is working, and then I'll make a new release.
Sorry this took so long to get to!
Alright, it seems to be working okay for me - I made a new release!
https://github.com/taviso/wpunix/releases/tag/v0.12
Let's call this fixed for now, but please re-open or open a new issue for any problems or if it doesn't work properly!
Could we reopen this? There turns out to be one more mistaken character left over from a bad commit in libwpd many years ago. WP character 8,65 got changed to 1fe5, but it should be 03f1. I've been working on patching libwpd to restore the correct characters (and have added details about this to my ticket), and I'm fairly sure this is the one character that is wrong in the current set.
The screenshot below shows the wrong character on the editing screen on the left and the correct character on the right, which is a screenshot of WP's print preview of the same file: the print preview gets it right, and the editing screen gets it wrong. WP character 8,65 should be 0x03f1.
Apologies for not sorting this out the first time we talked about this.
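A correction like this can be applied defensively, by checking that the entry still holds the known-bad value before replacing it. A minimal Python sketch; the (set, index) dictionary and apply_corrections are hypothetical and do not reflect libwpd's actual data structures. The Unicode names make the mix-up visible: U+1FE5 is a rho with a rough-breathing mark, while U+03F1 is the rho symbol ϱ.

```python
import unicodedata

# Hypothetical stand-in for one entry of a libwpd-style table,
# keyed by WP (set, index); libwpd's real data layout is not shown here.
table = {(8, 65): 0x1FE5}

# Correction proposed above: (wrong value, right value) for WP character 8,65.
CORRECTIONS = {(8, 65): (0x1FE5, 0x03F1)}


def apply_corrections(table, corrections):
    """Return a patched copy; refuse to patch an entry that isn't the expected wrong value."""
    fixed = dict(table)
    for key, (wrong, right) in corrections.items():
        if fixed.get(key) != wrong:
            raise ValueError(f"{key}: expected U+{wrong:04X}, found {fixed.get(key)!r}")
        fixed[key] = right
    return fixed


fixed = apply_corrections(table, CORRECTIONS)
for (charset, index), (wrong, _) in CORRECTIONS.items():
    new = fixed[(charset, index)]
    print(f"{charset},{index}: {unicodedata.name(chr(wrong))} -> {unicodedata.name(chr(new))}")
# 8,65: GREEK SMALL LETTER RHO WITH DASIA -> GREEK RHO SYMBOL
```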
Unlike the issue regarding Cyrillic, Greek characters can be typed and shown on the console, in the preview (sixel), and in print (Ghostscript), with the exception of two characters: "Α" (Greek Capital Letter Alpha, U+0391) and "ς" (Greek Small Letter Final Sigma, U+03C2).
By the way, I'm amazed by your research and work on 1-2-3, UNIX & MS-DOS, and WP!