itszor / gcc-6502

A port of GCC to the 6502 processor family.
GNU General Public License v2.0

Character conversion / mapping #3

Closed daybyter closed 8 years ago

daybyter commented 8 years ago

Hi!

While coding on a host machine (Linux in my case), you enter literals in the encoding of that host machine (UTF-8, Latin-1 or whatever). So the 'hello world!' in the hello program is a string literal in that encoding. When displaying such a string on a c64, the encoding is wrong, since the c64 uses PETSCII as its main character encoding (screen codes are somewhat different from it). So the string literals have to be translated. cc65 has a pragma for character mappings, but it seems the 'official' gcc way to handle this would be to set the execution character set to PETSCII (-fexec-charset switch). gcc just uses the iconv call to convert character sets, and it seems that iconv is missing important character sets, like PETSCII ^^ . Maybe adding a module for this would be possible:

While reading

https://forums.suse.com/showthread.php?4698-iconv-and-custom-charactersets&s=92917d7df5c0ee23d6c886e7eb829c63

I found:

http://www.gnu.org/software/libc/manual/html_node/glibc-iconv-Implementation.html#glibc-iconv-Implementation

An intermediate hack would be a small C function in the 6502-specific libs that converts strings from the host encoding to PETSCII at run time.
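To make the run-time idea concrete, here is a minimal sketch of such a conversion. The names `ascii_to_petscii` and `ascii_to_petscii_str` are hypothetical and not part of gcc-6502 or its libraries. It assumes plain ASCII input and that the C64's lowercase/uppercase character set is active, and it follows the letter mapping cc65 uses by default for the C64, as far as I know.

```c
/* Hypothetical helper: not part of gcc-6502 or its libraries.
   Assumes ASCII input and the C64 lowercase/uppercase character set.
   Hex constants are used on purpose, so the function does not depend
   on the execution character set it is compiled with. */
unsigned char ascii_to_petscii(unsigned char c)
{
    if (c >= 0x61 && c <= 0x7A)          /* ASCII a-z -> PETSCII 0x41-0x5A */
        return (unsigned char)(c - 0x20);
    if (c >= 0x41 && c <= 0x5A)          /* ASCII A-Z -> PETSCII 0xC1-0xDA */
        return (unsigned char)(c + 0x80);
    if (c == 0x0A)                       /* line feed -> PETSCII RETURN */
        return 0x0D;
    /* Digits, space and most punctuation share their codes. Codes with
       no PETSCII equivalent (backslash, braces, ...) pass through
       unchanged and will show a different glyph on screen. */
    return c;
}

/* Convert a NUL-terminated string in place. */
void ascii_to_petscii_str(char *s)
{
    for (; *s != 0; ++s)
        *s = (char)ascii_to_petscii((unsigned char)*s);
}
```

The string has to live in writable memory, so a literal would need to be copied into a buffer first. That extra copy and the per-call cost are why this is only a stopgap compared with translating at compile time.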

itszor commented 8 years ago

This is kind of a known problem -- on the BBC computers, it can be mostly ignored, because they use ASCII anyway. If you take a look at gcc/config/6502/6502.c, m65x_output_ascii has a comment "This doesn't pay much attention to character encoding issues" :-). At least for a hack, that's probably a good place to start.

What is the "ideal" solution? It might be fun to allow UTF-8-encoded Unicode code points representing all the "graphical" PETSCII characters in program source (string literals), and have the compiler map those to their target-machine equivalents. Otherwise I suppose you'd only be able to support the subset of characters that are in both plain ASCII and PETSCII.
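As a rough sketch of what such a mapping could look like, the lookup below takes an already decoded Unicode code point and returns a PETSCII byte, or -1 if there is no equivalent. The names `cp_map`, `petscii_extra` and `unicode_to_petscii` are made up for illustration, and nothing here reflects how gcc-6502 actually does it. The table only lists three glyphs that sit at the same PETSCII code in both C64 character sets, and the letter mapping assumes the lowercase/uppercase set.

```c
#include <stddef.h>

/* Hypothetical table entry: one Unicode code point and its PETSCII byte. */
struct cp_map {
    unsigned long codepoint;
    unsigned char petscii;
};

/* Non-ASCII glyphs that PETSCII provides (illustrative, not exhaustive). */
static const struct cp_map petscii_extra[] = {
    { 0x00A3UL, 0x5C },   /* POUND SIGN      */
    { 0x2191UL, 0x5E },   /* UPWARDS ARROW   */
    { 0x2190UL, 0x5F },   /* LEFTWARDS ARROW */
};

/* Returns the PETSCII byte for a code point, or -1 if it is unmapped. */
int unicode_to_petscii(unsigned long cp)
{
    size_t i;

    if (cp >= 0x20 && cp <= 0x40)        /* space, digits, punctuation, @ */
        return (int)cp;
    if (cp >= 0x61 && cp <= 0x7A)        /* a-z -> 0x41-0x5A */
        return (int)(cp - 0x20);
    if (cp >= 0x41 && cp <= 0x5A)        /* A-Z -> 0xC1-0xDA */
        return (int)(cp + 0x80);
    if (cp == 0x5B || cp == 0x5D)        /* [ and ] share their codes */
        return (int)cp;

    for (i = 0; i < sizeof petscii_extra / sizeof petscii_extra[0]; ++i)
        if (petscii_extra[i].codepoint == cp)
            return petscii_extra[i].petscii;

    return -1;                           /* no PETSCII equivalent */
}
```

Returning -1 lets the caller decide whether an unmappable character should be a hard error or a substitution, which seems like the sort of choice a compiler would want to report as a diagnostic.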

itszor commented 8 years ago

(The Wikipedia PETSCII page shows some required glyphs are missing from Unicode, which is a shame!)

Claus64 commented 8 years ago

It seems to me that support for the common subset between ASCII and PETSCII is enough. Editors would not be able to display anything reasonable for the special PETSCII glyphs and graphical chars anyway.

itszor commented 8 years ago

I think we did this, so closing the issue. Thank you!