ut-utp / core

Mozilla Public License 2.0

Rethink Input/Output peripherals for Unicode support #12

Open rrbutani opened 4 years ago

rrbutani commented 4 years ago

what

Explore whether we want to offer (full or partial) Unicode support and how.

steps

where

branch: feat-unicode

open questions

rrbutani commented 4 years ago

Copying over the notes in isa/src/misc.rs so we can talk about this here (it does look better as a rust doc though):

Formally, the input and output devices that are part of the LC-3 only support ASCII characters. We'd like to support more than just ASCII, ideally Unicode text in general, but therein lies a problem.

The LC-3 has a Word size of 16 bits. Characters encoded in UTF-8 (seemingly the de facto Unicode encoding these days) occupy a variable number of bytes (between 1 and 4, inclusive). This is problematic for our LC-3 because:

Fwiw, this LC-3 assembler crate seems to have support for UTF-8 string literals iiuc; I'm not sure how (if at all) they deal with UTF-8 strings/characters in the userspace.
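To make the size mismatch between 16-bit Words and variable-width UTF-8 concrete, here is a rough Rust sketch of two ways a string could be laid out in LC-3 memory. This is only an illustration: the `Word` alias and both function names are made up for this comment, and neither layout is claimed to be what the assembler crate (or this project) actually does.

```rust
/// Local alias for a 16-bit LC-3 word (hypothetical; defined here for illustration only).
type Word = u16;

/// Layout A (assumption): store each UTF-8 byte in its own word,
/// leaving the upper 8 bits zero. A single character may then span 1 to 4 words.
fn utf8_byte_per_word(s: &str) -> Vec<Word> {
    s.bytes().map(Word::from).collect()
}

/// Layout B (assumption): store UTF-16 code units, one per word.
/// Characters in the Basic Multilingual Plane fit in one word;
/// everything else needs a surrogate pair (two words).
fn utf16_unit_per_word(s: &str) -> Vec<Word> {
    s.encode_utf16().collect()
}
```

With either layout, "one character" no longer means "one Word": `'é'` takes two words under layout A and one under layout B, while `'🦀'` takes four words under layout A and two under layout B. Any character-at-a-time input/output device would have to account for that.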