joncampbell123 / dosbox-x

DOSBox-X fork of the DOSBox project
GNU General Public License v2.0

New codepage needed: MIK-Bulgaria #2622

Open AngelToshkov opened 3 years ago


AngelToshkov commented 3 years ago

Hello, DOSBox-X currently supports code pages 437, 808, 850, 852, 853, 855, 857, 858, 860, 861, 862, 863, 864, 865, 866, 869, 872 and 874. Can a new one be added: MIK (Bulgarian)? FreeDOS knows this code page as code page 3021. I have some very important programs that work without problems in DOSBox MB6. There, the Cyrillic alphabet is loaded with a separate .exe program, but in DOSBox-X this does not work: the programs themselves run correctly, but the text on the display is unreadable because this code page is missing. Is it possible to load it externally in some way, or should a code page (for example 359, which does not exist in practice) be added to implement MIK-Bulgaria? Here is a link describing the code page structure: https://en.wikipedia.org/wiki/MIK_(character_set)

Wengier commented 3 years ago

@AngelToshkov Sure, a new DOS code page can always be added, as long as we can find a code page mapping file for it. For example, we can find the mapping files for code page 437, 850, 852 etc from this page:

https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/

Mapping files for the DBCS code pages (932, 936, 949, 950), which DOSBox-X also supports, are on this page:

https://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/

Do you know where the mapping file for the said code page 3021 can be found?

Meanwhile, you said some Bulgarian programs work fine in DOSBox MB6 with the help of a separate EXE program. If so, they should work in DOSBox-X too with the same EXE program, as long as you use a standard (non-TrueType) output such as Direct3D or OpenGL. Try setting output=direct3d (or output=opengl) in the DOSBox-X config file (dosbox-x.conf), load that EXE program for Cyrillic characters, then run your Bulgarian programs and see how they work. Hope this helps.

AngelToshkov commented 3 years ago

Thank you very much for the quick response. I tried output=direct3d and output=opengl. The program that switches the display and keyboard to Bulgarian Cyrillic works without problems.

If I find a code page mapping file for 3021, or prepare one myself, would you add it?

I can use the 437 (US) and 866 (Russian) PC code page mapping files as a basis to create 442 (the country number of Bulgaria, see https://www.computerhope.com/chcphlp.htm).

Wengier commented 3 years ago

@AngelToshkov Glad to hear that the said output modes work for you.

And sure, since 3021 is a valid code page used by DOS, I will add it if it is available. Thanks for your support.

AngelToshkov commented 3 years ago

Thanks. When I am ready I will write.

Torinde commented 2 years ago

Here is a CP442.txt mapping file I created: 00 to 7F and D9 to FF from cp437_DOSLatinUS, 80 to AF from cp866_DOSCyrillicRussian, and B0 to D8 filled in manually.
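The composition described above can be sketched in Python with its built-in cp437 and cp866 codecs. This is a minimal sketch, not the script actually used to make CP442.txt: `compose_mapping` and `to_mapping_text` are hypothetical names, the output line format imitates the unicode.org MICSFT files, and the hand-made B0 to D8 range is passed in rather than reproduced (only B0 to B2 = "рст", which come up later in this thread, are shown).

```python
import unicodedata

# Byte ranges as described above for CP442.txt:
#   0x00-0x7F and 0xD9-0xFF come from code page 437,
#   0x80-0xAF comes from code page 866,
#   0xB0-0xD8 was filled in by hand and is passed in as `manual`.
FROM_437 = list(range(0x00, 0x80)) + list(range(0xD9, 0x100))
FROM_866 = list(range(0x80, 0xB0))


def compose_mapping(manual):
    """Return a {byte: unicode_char} table built from cp437, cp866 and `manual`."""
    table = {}
    for b in FROM_437:
        table[b] = bytes([b]).decode("cp437")
    for b in FROM_866:
        table[b] = bytes([b]).decode("cp866")
    table.update(manual)  # hand-made entries for 0xB0-0xD8
    return table


def to_mapping_text(table):
    """Format the table like the unicode.org MICSFT files: byte, code point, #name."""
    lines = []
    for b in sorted(table):
        ch = table[b]
        name = unicodedata.name(ch, "<control>")  # control codes have no name
        lines.append(f"0x{b:02x}\t0x{ord(ch):04x}\t#{name}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Only B0-B2 are filled in here; B3-D8 still have to be added by hand.
    partial = {0xB0: "\u0440", 0xB1: "\u0441", 0xB2: "\u0442"}
    print(to_mapping_text(compose_mapping(partial)))
```

Keeping the hand-made range as a separate argument makes it easy to see which bytes were taken from existing code pages and which were typed in manually.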

@Wengier, I assume that once it is added as a code page in DOSBox-X, loading that code page will make the Cyrillic symbols appear in the proper places. But how can I make my normal QWERTY keyboard type those symbols in DOSBox-X?

rderooy commented 2 years ago

You can type characters not on your keyboard using the left-alt and the numeric keypad. https://en.wikipedia.org/wiki/Alt_code

For instance if you keep the left ALT pressed and type a number on the numeric keypad between 0 and 255 you will get various characters (depending on the codepage loaded).
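Put differently, the number typed with Alt is a byte value that gets looked up in the active code page, so the same number yields different characters under different code pages. A small Python illustration, assuming that simple lookup model (`alt_code_char` is a hypothetical helper; Python ships cp437 and cp866 codecs but none for MIK/3021):

```python
def alt_code_char(number, codepage):
    """Character shown for LeftAlt + `number` on the keypad, assuming the number
    is simply looked up as a byte in the active DOS code page."""
    if not 1 <= number <= 255:
        raise ValueError("expected a value from 1 to 255")
    # For 1-31 Python's codecs return control characters, whereas DOS shows
    # glyphs there, so this model is only meaningful for 32-255.
    return bytes([number]).decode(codepage)


if __name__ == "__main__":
    for cp in ("cp437", "cp866"):
        print(cp, [alt_code_char(n, cp) for n in (130, 160, 224)])
```

This prints `cp437 ['é', 'á', 'α']` and `cp866 ['В', 'а', 'р']`: the same keypad numbers give Latin and Greek characters under 437 but Cyrillic under 866.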

Wengier commented 2 years ago

@Torinde A Windows DOSBox-X build that contains the code page 3021 (MIK-Bulgaria) is now available from:

P.S. 442 is the country number, whereas 3021 is the code page number. For typing symbols, you can also copy from the clipboard apart from other methods.

Torinde commented 2 years ago

Great, thank you! I'll try it, but good if @AngelToshkov tests with his application.

OK about Alt codes and copy-pasting, but isn't there a native DOS way (without custom EXE programs) of using the "extra" keyboard layout? For example, with KEYB de I switch to German, and then the [;' keys give me üöä (u with two dots, o with two dots, etc.). I tried KEYB ru, which switches to the Russian layout and code page (both should have Cyrillic), but all keys give me normal Latin characters.

Is there some key combination to switch between Latin and Cyrillic? Or something about COUNTRY.SYS, the keyboard driver, etc. in CONFIG.SYS/AUTOEXEC.BAT?

Wengier commented 2 years ago

Since 3021 is a valid DOS code page used by e.g. FreeDOS, as mentioned earlier, if you copy the relevant keyboard file(s) from real DOS and load them, you may be able to enter the Cyrillic characters directly. But I do not have a Bulgarian keyboard to test this myself.

maron2000 commented 2 years ago

In MS-DOS 6.22, ALT+SHIFT (left or right) is used to switch input. https://data.phys.ucalgary.ca/sort_by_instrument/other/poca/eu9366/001/DOS/COUNTRY.TXT

Torinde commented 2 years ago

It works! And I have some observations, questions, suggestions, buts...

I found out how to type Cyrillic (and Greek, and maybe other) letters with a normal QWERTY physical keyboard and without any EXE programs: run "keyb bg" and then press LeftAlt+RightShift; after that, Cyrillic letters appear when you type. LeftAlt+LeftShift switches back to Latin typing. (And while I was typing this, maron2000 replied with the same :))

Clear about the 442/3021. Here is a CP3021.TXT mapping file. In its General notes comments I mentioned FreeDOS 3021 and this issue #2622, together with notes about the historic origin of the character set and how the mapping file was created now.

Here is a table with the 255 characters that I made for testing: test2.txt. The screenshots below were made with an earlier variant in which I forgot to delete some control characters in the first rows, but that doesn't matter, as 3021 is the same as 437 there.

Issue I see: B0, B1, B2 are wrong. The three "smooth looking" shading blocks from TTF 437 are kept there instead of the Cyrillic letters "рст". In 3021 the shading blocks are moved to D0-D2, and strangely "not smooth looking" shading blocks appear there (as if from Direct3D 437 instead of TTF 437). As you can see in the 3rd picture, the "рст" letters do exist in the TTF font (though of course at other places in 872).
[Screenshots: TTF 0.83.22 3021, TTF 0.83.22 437, TTF 0.83.22 872, MIK target look]

Potential issue: chcp/keyb behave somewhat inconsistently between TTF and non-TTF modes when changing code pages on the fly (without a restart). Sometimes it claims "Changing to this code page is only supported for the TrueType font output.", yet it actually seems to work in non-TTF mode. Then, even if I switch back to 437 (via "keyb us") and get an acknowledgement message (and status output), non-TTF mode actually uses 872 (the one invoked by "keyb bg" much earlier). chcp declines to switch to 872 in non-TTF mode, but if I had 3021 in TTF mode, then change the output to Direct3D and use keyb to change to 872, the code page is not changed in Direct3D, even though keyb claims to have changed it.

Hmm, now that I do further tests, it seems chcp works "per mode": changes I make in one mode apply to that mode only (or predominantly?), regardless of what the status message claims for "Active code page: ". Should that be reflected somehow?

Keyboard typing layout issue: although keyb reports "Codepage 3021 has been loaded for layout bg", only Latin letters appear (kind of expected, because the 442/241 layouts are made for another code page, e.g. 872?).

maron2000's link explains that KEYBRD2.SYS has one extra keyboard layout for Bulgaria: 442. Google searches also mention KEYBOARD.SYS, KEYBRD3.SYS and KEYBRD4.SYS. I'll try to find a layout matching code page 3021 in those and in FreeDOS, but in case there isn't one, I can create a simple reference TXT file. Could that be added? Then "keyb bg3021" would enable the 3021 code page (almost ready) plus the required keyboard layout (and bg3021 could even be the default for "keyb bg").

Where do you suggest that I read more about KEYBOARD.SYS, KEYBRD3.SYS, KEYBRD4.SYS ?

I will now test also with FreeDOS the codepage itself.

maron2000 commented 2 years ago

The built-in keyb command and/or KEYBOARD.SYS might not support code page 3021. You can try the FreeDOS version. According to the following source, FreeDOS keyb should support cp3021. https://gitlab.com/FreeDOS/base/keyb_lay/-/blob/master/SOURCE/KEYB/LAYOUTS/BG241.KEY

Torinde commented 2 years ago

OK, so BG103.KEY should be the one! I'm installing FreeDOS now and will confirm afterwards.
Ideally, as default: keyb bg = BG103.KEY + CP3021.TXT
A useful variant is also: keyb bg442 = BG.KEY + CP3021.TXT

What's needed for BG103 to be added to DOSbox-X? Should I compile it to .KL or the .KEY is sufficient?

As for the original DOS keyboard drivers - documentation I found so far:

maron2000 commented 2 years ago

Windows at present doesn't support code page 3021, so I don't think any MS-DOS/PC DOS version supported it. https://en.m.wikipedia.org/wiki/Windows_code_page When trying the FreeDOS KEYB command, you can use either a .kl file or FreeDOS KEYBRD2.SYS (in your case).

Torinde commented 2 years ago

I checked KEYBRD3.SYS and KEYBRD4.SYS from MS-DOS 8 with a hex editor. A potentially non-exhaustive list of codes is:

Installed FreeDOS 1.3 RC5 in the 0.83.22 package above - 3021 doesn't work via FreeDOS's own keyb.

MIK.zip

DOSBox-X BG442 also has a mistake for Shift+key16: F2 (wrong) instead of F1 (correct) for cp872 (and I assume for the other code pages as well). Here the mistake is smaller: at least the letter is the same, but it is capital Cyrillic yeru instead of small.

Testing of 0.83.22 cp3021:

in TTF mode, without referring to EGA16.CPX:

DOSbox keyb BG...

First keyb BG... (872), then chcp 3021, then keyb (to check status):
  - BG103PHO: "Codepage 3021 has been loaded for layout BG103PHO"; code page OK, but the Cyrillic keyboard layout is not active
  - BG442BDS: "Codepage 3021 has been loaded for layout BG103BDS"; code page OK, but the Cyrillic keyboard layout is not active

First chcp 3021, then keyb BG...

Issues:

Questions:

Wengier commented 2 years ago

@Torinde The wrong characters with B0, B1, B2 in TTF code page 3021 are fixed in this build:

TTF and non-TTF modes use very different fonts, and for the latter you may be able to customize the code pages by customizing the KEYB command and keyboard files themselves. Modify the .KEY file(s), and then convert them to .KL files via KC.EXE for use with KEYB command. On the other hand, custom code pages for TTF mode can be done more easily, either via customcodepage config option (in [dos] section) or via the optional second parameter for CHCP command, e.g. CHCP 400 CP400.TXT for standard Unicode mapping files. So you can test your self-made Unicode mapping files immediately with such a config option or command.
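For checking a self-made mapping file before loading it, here is a rough Python sketch of how a file in the unicode.org format (data lines such as `0x80<TAB>0x0410<TAB>#CYRILLIC CAPITAL LETTER A`) can be read into a byte-to-character table. It only illustrates the file format; it is not DOSBox-X's own parser, and `parse_mapping` and `decode` are hypothetical names.

```python
import sys


def parse_mapping(text):
    """Read a unicode.org-style mapping file into a {byte: unicode_char} table.

    Each data line is '0xBB<TAB>0xUUUU<TAB>#NAME'; text after '#' is a comment.
    Lines without a Unicode value (undefined bytes) are skipped.
    If a byte appears twice, the later line replaces the earlier one.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, pure comment, or undefined byte
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


def decode(data, table):
    """Decode DOS bytes with the table; unmapped bytes become U+FFFD."""
    return "".join(table.get(b, "\ufffd") for b in data)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python checkmap.py CP3021.TXT  -> lists bytes the file leaves unmapped
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        table = parse_mapping(f.read())
    missing = [f"0x{b:02x}" for b in range(256) if b not in table]
    print(f"{len(table)} bytes mapped; unmapped: {missing}")
```

Listing the unmapped bytes is a quick way to spot a typo or a forgotten row in a hand-edited file.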

maron2000 commented 2 years ago

Built-in KEYB works in non-TTF mode if you explicitly specify the code page (3021) and the corresponding CPX file (ega16.cpx). I don't know why, but it only works for bg103 (101-key phonetic), not for bg (101-key).

maron2000 commented 2 years ago

So as a workaround, we can use the built-in KEYB to change the code page, and FreeDOS KEYB (fd_keyb in the screenshot) to change the keyboard layout.

Since there is a bug with the Shift+Q key in FreeDOS KEYBRD2.SYS for layout bg (=bg442) + CP3021, I amended BG.KEY as attached. I don't know how yet, but there should be a way to amend KEYBRD2.SYS as well. BG.zip

Torinde commented 2 years ago

@maron2000 with the .KL files from MIK.ZIP and CP3021.TXT (in the comments above), both the 103 and 442 layouts work with the DOSBox-X built-in KEYB, even in 0.83.21.

@Wengier thanks for the quick update! I'll test it now. Didn't know about the CHCP capability to load mapping files, great!

DOSBox-X KEYB BG (defaults: layout 442, code page 872) also has a mistake for Shift+key16 (Q): instead of lowercase yeru (ы) it types capital (Ы). In MIK.ZIP above I included a corrected .KL file (but I got the original from FreeDOS, not DOSBox-X, so I don't know whether there are differences).

I'll do more testing of DOS<->Windows shared clipboard copy-pasting, but so far I see strange behaviour for some characters even with the 0.83.21 default keyb bg (442, 872) when copying from Windows (UTF-8?) to DOS (872).

Torinde commented 2 years ago

Latest version by @Wengier works better, yes!

Shaded blocks B0, B1, B2 in Wengier's new version (which has рст appearing properly): in 872/437 they appear smooth, but in 3021 they appear pixelated (similar to non-TTF). Is 3021 using a different font character? Or what's going on?

Copy-pasting is better, especially with 3021 (although I need to test more situations: § doesn't appear when pasting in 872, but it doesn't crash either). I tested printing in 0.83.21 and Cyrillic wasn't working (both text file and PNG output); next I will test with Wengier's version.

Some characters from the Windows keyboard layout are missing in MIK (e.g. ѝЍ„“–І€). The Euro sign works with the .conf workaround, but for the others I wanted to define "doubles", i.e. two Unicode characters mapped to the same MIK character, so that e.g. йѝ appears as йй when copy-pasting. However, it seems chcp takes the last line mentioning a given character code and ignores the previous ones, so I can't define doubles via a modified CP3021.TXT. Any ideas?
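One way to see the problem: going from DOS to Unicode, each byte can map to only one Unicode character, so a table keyed by byte keeps whichever line comes last. Going the other way (pasting into DOS), several Unicode characters can share one byte. The Python sketch below keeps the two directions separate and adds "doubles" through a second alias table. This is a hypothetical design, not an existing DOSBox-X option; `build_encoder` and `encode` are made-up names, the aliases (ѝ→й, Ѝ→Й) are just the example from the comment above, and the demo relies on MIK placing А..я contiguously at 0x80-0xBF.

```python
def build_encoder(forward, aliases):
    """Unicode -> byte table for pasting into DOS.

    forward: {byte: char} as read from a mapping file (one char per byte).
    aliases: {extra_char: existing_char}; the extra char is encoded with the
             byte of the char it stands in for (the "doubles").
    """
    reverse = {ch: b for b, ch in forward.items()}
    for extra, target in aliases.items():
        if target in reverse:
            reverse[extra] = reverse[target]
    return reverse


def encode(text, reverse, fallback=b"?"):
    """Encode text for DOS; characters with no byte become `fallback`."""
    out = bytearray()
    for ch in text:
        b = reverse.get(ch)
        out += bytes([b]) if b is not None else fallback
    return bytes(out)


if __name__ == "__main__":
    # MIK letters only: 0x80-0xBF hold U+0410..U+044F (А..я) in order.
    mik_letters = {0x80 + i: chr(0x0410 + i) for i in range(64)}
    enc = build_encoder(mik_letters, {"\u045d": "\u0439", "\u040d": "\u0419"})
    print(encode("\u045d\u0439", enc).hex())
```

The demo prints `a9a9`: both ѝ and й end up as the MIK byte for й, while the DOS-to-Unicode direction stays unambiguous.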

I also wanted to add an extra keyboard layout for another code page to the BG103PHO/BG442BDS .KL files, but that resulted in too many SUBMAPPINGS. I saw that the submappings list has "850 -" for the key layout. Does that mean these KEY/KL files don't work with 850? When I delete the "850 -" line from the submappings list, the file seems to work, but am I breaking something else?

Finally, the blinking line at the DOS prompt (thin or thick, depending on the Insert key): I don't see those characters in 437, so where do they come from? Edit: it's not a character.

maron2000 commented 2 years ago

"850 -" means "no table to look up", so it just falls back to the first row of the "Submapping" section. If you delete that line, the file will no longer match code page 850, so I think you can't use the .kl file for CP 850 anymore. Anyway, you can delete that line and then change the code page to 850 to check for any side effects. You may want to consult the documentation included with kc.exe for details about the .kl file format.
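The lookup described above can be modelled in a few lines of Python. This is a simplified model of the behaviour as explained in this thread, not the real .KL binary format or KEYB's code: `select_table` and the (codepage, table) row layout are assumptions, `None` stands for "-", and the error text copies the "No layout in ... for codepage ..." message reported in this thread.

```python
def select_table(layout_name, submappings, codepage):
    """Pick the key table for `codepage` from (codepage, table) rows in file order.

    A row whose table is None stands for '-' ("no table to look up") and
    falls back to the table of the first row. A code page with no row at
    all is rejected.
    """
    for cp, table in submappings:
        if cp == codepage:
            return table if table is not None else submappings[0][1]
    raise LookupError(f"No layout in {layout_name} for codepage {codepage}")


if __name__ == "__main__":
    rows = [(3021, "cp3021 table"), (850, None)]
    print(select_table("phow2", rows, 850))  # "850 -" falls back to row 1
    try:
        select_table("phow2", rows[:1], 850)  # "850 -" line deleted
    except LookupError as err:
        print(err)
```

In this model, deleting the "850 -" row does not break the other code pages; it only makes 850 unmatched, which is the trade-off described above.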

Torinde commented 2 years ago

Thanks, maron2000 - I tested and got "No layout in phow2 for codepage 850", makes sense - I deleted the 850 line, thus can't use that .KL file for it.

gmaslarov commented 3 months ago

Hello, I found this page while searching for DOSBox-X support for the non-standard MIK code page, and it was great! I was able to run an old accounting program with Cyrillic letters. Now I am struggling to print the program's output text file to PDF. So far I have tried MIK2UTF8.EXE in the Windows 10 cmd to convert the file, and it shows all Cyrillic letters as expected. Is there a way to do this automatically from DOSBox-X, i.e. print file.txt to a PDF file with Cyrillic letters? Regards, George

Torinde commented 3 months ago

Did you try?

  1. Printing in DOSbox-X wiki
  2. DOSBox-X with printing and other additional features
gmaslarov commented 3 months ago

Hi, I checked the DOSBox-X wiki and can get file output. The second link is new to me, but looks promising. I'll try it and get back with the result :-)

Torinde commented 3 months ago

@gmaslarov, do you say you already are able to get file output following the wiki procedure? Do you get TXT or PDF or which of the options? If so - are all the symbols shown properly or not? Maybe you can share some screenshots or files.

gmaslarov commented 3 months ago

Hi there, I didn't have time to test DOSBox-X-App, but with the DOSBox-X wiki method, yes, I get TXT and PDF. When I'm in the DOSBox-X console with keyb bg 3021, I can type text on screen with all Cyrillic letters. When I open the TXT file in Windows 10 Notepad it is mumbo-jumbo, but when I use the MIK2UTF8 executable to convert it to UTF-8 it is OK: all letters are readable. When I try to automate the process with PDFCreator or CutePDF Writer, I get a PDF file with unreadable letters. I'll attach some screenshots and files later. My best guess is that I may have to convert the output file from the app I use in DOSBox-X to a Unicode format. Regards
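MIK2UTF8.EXE does this conversion on the host. To automate it before handing the file to a PDF printer, a Python equivalent could look like the sketch below. It assumes the layout discussed in this thread: ASCII at 0x00-0x7F, Cyrillic А..я in order at 0x80-0xBF, and code page 437 characters at 0xD9-0xFF. The 0xC0-0xD8 range (which includes the shading blocks at D0-D2) is not hard-coded here and must be supplied, for example from CP3021.TXT. `decode_mik` and `convert_file` are hypothetical names, not part of DOSBox-X or MIK2UTF8.

```python
import sys


def decode_mik(data, extra=None):
    """Decode MIK (code page 3021) bytes to str.

    0x00-0x7F: ASCII.
    0x80-0xBF: Cyrillic U+0410..U+044F (А..я) in order.
    0xD9-0xFF: same as code page 437.
    0xC0-0xD8: looked up in `extra` ({byte: char}), otherwise U+FFFD.
    """
    extra = extra or {}
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        elif b < 0xC0:
            out.append(chr(0x0410 + b - 0x80))
        elif b >= 0xD9:
            out.append(bytes([b]).decode("cp437"))
        else:
            out.append(extra.get(b, "\ufffd"))
    return "".join(out)


def convert_file(src, dst, extra=None):
    """Write a UTF-8 copy of a MIK text file, keeping DOS CR/LF line ends."""
    with open(src, "rb") as f:
        text = decode_mik(f.read(), extra)
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python mik2utf8.py OUTPUT.TXT output-utf8.txt
    convert_file(sys.argv[1], sys.argv[2])
```

The resulting UTF-8 file is what a Windows PDF printer or text editor expects, which matches the observation that the MIK2UTF8 output is readable while the raw print file is not.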

Torinde commented 3 months ago

Yes, it seems the issue is with the encoding. It would be great to have the option for the print output TXT file to have either the host encoding (Unicode ...) or the DOS codepage (like it is now).

Does the PDF contain an image/graphics or selectable text? I assume text, otherwise it would have looked correct.

What happens when you print to an image (PNG/BMP)?

What happens when you use the clipboard function to copy-paste text between DOSbox and your host OS?