openstenoproject / plover

Open source stenotype engine
http://opensteno.org/plover
GNU General Public License v2.0

Support stenoram machines #49

Open balshetzer opened 12 years ago

balshetzer commented 12 years ago

Ben Tarkeshian has posted notes on reverse engineering a StenoRAM machine, far enough along that it might be possible to add Plover support.

vim: foldmethod=marker

SR3_REVENG_v0.0.2.txt

CHANGELOG: v0.0.2: no need to clear the stenoRAM beforehand; the init sequence gives the "starting" stroke number for the "QUERY" / "NO STROKE" exchanges

DISCLAIMER: {{{

This I believe is the bare minimum for how to write a "working" driver; there are still many gaps and this is still very much a rough draft.

And the usual:

no guarantees any of this is correct

this document might actually be an ingenious, insidious plot
for convincing people to splurge on modern machines with
simpler protocols...I can neither confirm nor deny that...

"when life hands you lemons, get a receipt"

}}}

misc. notes: {{{

(google these)

}}}

{{{ INIT

PC kicks things off by sending:

1 WRITE:28: 16 03 00 00 01 00 00 FF 09 00 BA DD B5 9A 8A 91 25 A4 03 00 00 FF 00 00 02 01 03 00

and the SR reply is

2 READ_:26: 16 03 00 00 05 00 00 00 08 00 36 9F F4 8A 4C 0C 3C B5 8E 46 0D 01 53 52 33 20
            (the last four bytes, 53 52 33 20, are "SR3 " in ASCII)

the PC then sends:

3 WRITE:28: 16 03 00 00 01 00 00 FF 09 00 99 27 52 26 BD 34 6C F8 04 00 00 FF 00 00 02 01 03 00

and the SR reply is

    (note: "checksum" data below only applies if the
        stenoram's RAM was cleared before this session
    )

4 READ_:21: 16 03 00 00 06 00 00 00 03 00 00 00 00 00 4C A8 18 1A 00 00 00 ^^ ^^

for the SR reply, what matters is the stroke number; the PC should take that stroke number and

    1) set the "PC QUERY FOR STROKE" to the corresponding bytestring
        matching this stroke number

    2) set the "SR NO STROKE" bytestring to match this stroke number
            minus one

}}}
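A minimal Python sketch of the INIT exchange above, assuming `port` is any object with `write(bytes)` and `read(n) -> bytes` (pyserial's `serial.Serial` has both). The two PC frames are copied from the capture; the names `INIT_QUERY_1`, `INIT_QUERY_2`, and `init_handshake` are hypothetical. It checks the "SR3 " tag in the first reply and returns the raw 21-byte second reply, leaving the stroke-number extraction to the caller rather than guessing which bytes the `^^` markers pointed at.

```python
# Frames copied from the INIT capture (messages 1 and 3).
INIT_QUERY_1 = bytes.fromhex(
    "16 03 00 00 01 00 00 FF 09 00 BA DD B5 9A 8A 91 25 A4"
    " 03 00 00 FF 00 00 02 01 03 00")
INIT_QUERY_2 = bytes.fromhex(
    "16 03 00 00 01 00 00 FF 09 00 99 27 52 26 BD 34 6C F8"
    " 04 00 00 FF 00 00 02 01 03 00")


def init_handshake(port) -> bytes:
    """Run the two-step INIT exchange and return the SR's 21-byte reply.

    `port` only needs write(bytes) and read(n); a pyserial Serial works.
    The caller still has to pull the starting stroke number out of the
    returned reply.
    """
    port.write(INIT_QUERY_1)
    reply = port.read(26)
    # Message 2 ends with the model tag "SR3 " (53 52 33 20).
    if len(reply) != 26 or reply[-4:] != b"SR3 ":
        raise IOError("unexpected first INIT reply: " + reply.hex())
    port.write(INIT_QUERY_2)
    reply = port.read(21)
    if len(reply) != 21:
        raise IOError("short second INIT reply: " + reply.hex())
    return reply
```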

{{{ INITIAL "QUERY FOR STROKE" EXCHANGE

immediately after INIT, PC sends:
    WRITE:28:   16 03 00 00 01 00 00 FF
                09 00
                    DE 0A BB FF EA 2E 97 41
                01
                    00 00
                    00 64 00 02 02 03 00

and if there is no stroke, the SR replies
    READ_:24:   16 03 00 00 03 00 00 00
                06 00
                    07 01 FF FB 1E C4 6D 92
                00 00 FF FF FF 00

note: the checksums/stroke numbers for this exchange are only
valid for the first such exchange (that is, if you cleared the
stenoram's RAM before this "run").

see "FLOW OF EXECUTION" to determine how to both:

    1) update the "QUERY FOR STROKE" and "NO STROKE REPLY" strings
        as needed
    2) determine what their initial values should be if the stenoram's
        RAM was NOT cleared before this "run"

}}}
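The 28-byte query can be assembled from the field layout given under "FLOW OF EXECUTION": 10 constant bytes, 8 stroke-number-dependent bytes, `01`, Y0 Y1 (lsB first), then 7 constant bytes. A sketch, assuming the caller supplies the 8 variable bytes from a captured table, since no formula for them is known; `build_query` is a hypothetical name. Rebuilding the first query above serves as a sanity check.

```python
QUERY_PREFIX = bytes.fromhex("16 03 00 00 01 00 00 FF 09 00")
QUERY_SUFFIX = bytes.fromhex("00 64 00 02 02 03 00")


def build_query(stroke_number: int, x_bytes: bytes) -> bytes:
    """Assemble a "PC QUERY FOR STROKE" frame.

    x_bytes: the 8 bytes tabulated for this stroke number (no formula for
    them is known, so they must come from a captured table).
    """
    if len(x_bytes) != 8:
        raise ValueError("expected 8 variable bytes")
    y = (stroke_number & 0xFFFF).to_bytes(2, "little")  # Y0 = lsB, Y1 = msB
    return QUERY_PREFIX + x_bytes + b"\x01" + y + QUERY_SUFFIX


# First query after INIT on a cleared machine (stroke number 00 00).
first = build_query(0, bytes.fromhex("DE 0A BB FF EA 2E 97 41"))
assert first == bytes.fromhex(
    "16 03 00 00 01 00 00 FF 09 00 DE 0A BB FF EA 2E 97 41"
    " 01 00 00 00 64 00 02 02 03 00")
```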

{{{ SHUTDOWN

on shutdown, the PC sends:
    WRITE:28:   16 03 00 00 01 00 00 FF
                09 00
                    0C FD DF 39 FF F8 6F 4E
                01 00 00 00 00 00 02 03 03 00

and the SR replies:
    READ_:24:   16 03 00 00 03 00 00 00
                06 00
                    07 01 FF FB 1E C4 6D 92
                00 00 FF FF FF 00

    NOTE: this is the same as the first "SR no stroke" reply bytestring
        (well, by the "stroke number" it is the last such string...
        but it is sent "first" on a run after RAM is cleared)

}}} ================================================================================
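A short sketch of the shutdown step, under the same assumption that `port` has `write` and `read`; `SHUTDOWN_QUERY`, `SHUTDOWN_REPLY`, and `shutdown` are hypothetical names and the bytes are copied from the capture. Returning a bool instead of raising leaves it to the caller whether an unexpected reply matters when closing the port.

```python
SHUTDOWN_QUERY = bytes.fromhex(
    "16 03 00 00 01 00 00 FF 09 00 0C FD DF 39 FF F8 6F 4E"
    " 01 00 00 00 00 00 02 03 03 00")
# Same bytes as the first "SR no stroke" reply after RAM is cleared.
SHUTDOWN_REPLY = bytes.fromhex(
    "16 03 00 00 03 00 00 00 06 00 07 01 FF FB 1E C4 6D 92"
    " 00 00 FF FF FF 00")


def shutdown(port) -> bool:
    """Send the shutdown frame; return True if the SR answered as captured."""
    port.write(SHUTDOWN_QUERY)
    return port.read(len(SHUTDOWN_REPLY)) == SHUTDOWN_REPLY
```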

{{{ FLOW OF EXECUTION

do the "INIT" sequence; this determines the stroke number at which the initial "PC QUERY FOR STROKE" and "SR NO STROKE REPLY" bytestrings should start

until PC starts the shutdown sequence do

    send the current 28-byte "PC query for stroke" data

    read 24 bytes from the SR

    (inspect the 24 bytes to determine if there was a stroke or not)

    if (no stroke) then

        the 24 bytes the SR sent should match the
        current "SR no stroke" reply bytes

    else
        /*      there was a stroke;

                the SR actually sent 25-33 bytes containing stroke information;

                inspect the 24-byte header to see how many more bytes to read
                from the SR to get the stroke data (see "STROKE DATA") and
                then read that many more bytes
        */

        update the "PC query for stroke" bytestring (lines marked * are what needs updating)
{{{         28 bytes: (
                10 constant bytes:  16 03 00 00 01 00 00 FF
                                    09 00
{{{         *   8 variable (well...hardcoded to stroke number...) bytes:
                    DE 0A BB FF EA 2E 97 41     # 01 00 00
                    90 C8 55 A6 C8 D2 1C E6     # 01 01 00
                    4C FD 6C 00 28 C2 EB 3A     # 01 02 00
                    5B F0 AD 04 B7 E2 F1 F9     # 01 03 00
                    D4 18 93 2A B4 94 BF 07     # 01 04 00
                    8C 0B 98 46 64 20 5B 4B     # 01 05 00
                    74 12 94 B0 59 82 8C 89     # 01 06 00
                    D8 BD 8F DA B7 CD FE DD     # 01 07 00
                    4F 34 E2 7C 39 EA 04 F3     # 01 08 00
                    03 77 9D 85 C4 38 F9 D3     # 01 09 00
                    00 44 28 CB E8 EA 62 C6     # 01 0A 00
                    ...
}}}
                1 constant byte:    01
            *   2 variable bytes:   Y0 Y1 (stroke number)
                7 constant bytes:   00 64 00 02 02 03 00
}}}         )

        update the "SR no stroke" reply bytestring (lines marked * are what needs updating)
{{{         24 bytes: (
                8 constant bytes:   16 03 00 00 03 00 00 00
                2 constant bytes:   06 00
{{{         *   8 variable (well...hardcoded to stroke number...) bytes:
                    07 01 FF FB 1E C4 6D 92     # 00 00 FF FF FF 00
                    00 00 00 00 EC 49 41 B3     # 00 00 00 00 00 00
                    07 05 08 0E 91 4F 4E C5     # 00 00 01 00 00 00
                    0E 0A 10 1C A5 6C 11 1B     # 00 00 02 00 00 00
                    79 17 2C 4A 7F E8 C4 29     # 00 00 03 00 00 00
                    1C 14 20 38 38 40 24 E2     # 00 00 04 00 00 00
                    63 19 28 46 0D D0 CE 55     # 00 00 05 00 00 00
                    F2 2E 58 94 08 54 6D 95     # 00 00 06 00 00 00
                    5D 6B 9C E2 AD 78 81 DD     # 00 00 07 00 00 00
                    38 28 40 70 37 38 56 AE     # 00 00 08 00 00 00
                    3F 2D 48 7E 74 12 39 98     # 00 00 09 00 00 00
                    C6 32 50 8C 26 39 F5 F8     # 00 00 0A 00 00 00
                    ...
}}}
                2 constant bytes:   00 00
            *   2 variable bytes:   Y0 Y1 (stroke number)
                2 constant bytes:   00 00
}}}         )
    end

done

}}}
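The loop above might look like this in Python. It is a sketch under several assumptions: `make_query(n)` returns the 28-byte query for stroke number n (for example from a table of the captured variable bytes); "no stroke" is recognized by L0 == 06 in the 9th byte, which is what every captured no-stroke reply contains; header offsets follow the "STROKE DATA" layout (L0 at byte 8, Y0 Y1 at bytes 20-21, counting from 0); and the next query asks for the last reported stroke number plus one, which is inferred from the INIT notes (query = n, no-stroke reply = n minus one) rather than confirmed. `poll_strokes` and `max_polls` are hypothetical names.

```python
from typing import Callable, Iterator, Tuple

HEADER_LEN = 24
NO_STROKE_L0 = 0x06      # 9th byte of every captured "SR no stroke" reply


def poll_strokes(port, make_query: Callable[[int], bytes], first_stroke: int,
                 max_polls: int) -> Iterator[Tuple[int, bytes]]:
    """Poll the SR and yield (last_stroke_number, raw_stroke_data) per packet."""
    n = first_stroke
    for _ in range(max_polls):
        port.write(make_query(n))
        header = port.read(HEADER_LEN)
        if len(header) != HEADER_LEN:
            raise IOError("short header read; the SR may have hung")
        l0 = header[8]
        if l0 == NO_STROKE_L0:
            # Nothing new. A stricter driver would compare `header` with the
            # expected "SR no stroke" bytes for stroke n - 1 here.
            continue
        # L0 counts the 6 header bytes after X0-X7 plus the stroke data.
        data = port.read(l0 - 6)
        if len(data) != l0 - 6:
            raise IOError("short stroke-data read")
        last = int.from_bytes(header[20:22], "little")  # Y0 = lsB, Y1 = msB
        yield last, data
        n = (last + 1) & 0xFFFF  # assumption: next query asks for stroke last + 1
```

Driving the loop with `max_polls` keeps the sketch testable; a real driver would loop until the user asks to shut down and then run the shutdown exchange.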

{{{ STROKE DATA

{{{         the SR sends 25-33 bytes containing stroke information (

{{{         determining the size of the stroke info. "packet" (
                if one stroke is in this transmission, SR sends
                    24 + (1, 2, or 3) bytes
                if two strokes are in this transmission, SR sends
                    24 + (1, 2, or 3) + (1, 2, or 3) bytes
                if three strokes are in this transmission, SR sends
                    24 + (1, 2, or 3) + (1, 2, or 3) + (1, 2, or 3) bytes

                the 9th byte in the 24-byte "header" (L0, below) can be used
                to determine how many bytes of stroke data there are:
                L0 counts the 6 header bytes after X0-X7 plus the stroke
                data, so the stroke data is L0 - 6 bytes long

}}}         )

{{{         24 byte "header" (
                constant 8 bytes:   16 03 00 00 03 00 00 00
{{{             variable 1 byte:    L0
                    how many bytes follow the X variables
                    if 1 stroke in this transmission:   07-09
                    for 2 strokes:                      08-0C
                    for 3 strokes:                      09-0F
                        (to replicate 0F: try hitting -Z quickly,
}}}                      since a single -Z is sent in 3 bytes)
                constant 1 byte:    00
{{{             variable 8 bytes:   X0 X1 X2 X3 X4 X5 X6 X7
}}}                 checksum of some sort, perhaps
{{{             constant 2 bytes:
                    01 00 indicates: "1 stroke in this packet"
                    02 00 indicates: "2 strokes in this packet"
}}}                 03 00 indicates: "3 strokes in this packet"
{{{             2 variable bytes:   Y0 Y1
                    stroke number, increments after every stroke;

                    Y0 is the lsB, Y1 is the msB

                    for first transmission, SR sends 00 00

                    for 2 strokes in a single transmission, the SR
                    increments this twice

                    for 3 strokes in a single transmission, the SR
                    increments this thrice

                    if first transmission has 2 strokes, SR starts this
                    at 01 00, so presumably it initially is FF FF

                    an example "roll over" (for a single-stroke transmission):
                        FF 0E   (Y0 Y1 before the stroke)
}}}                     00 0F   (Y0 Y1 after the stroke)
                constant 2 bytes:   00 00
}}}         )

{{{         1-9 bytes of stroke data (
{{{         EITHER (
                variable 1 byte:    Z0

                    =========
                    Z0 values
                    =========
                    TODO: 80, 84, AD

                    # XXX: 0x00-0x7F are used for 3-byte strokes
                        TODO    80
                    STPH-FPLT   81
                        ER      82
                        RE      83
                        TODO    84
                        RAOE    85
                        WAU     86
                        KWRES   87  # XXX: "yes" ?
                        THR-    88  # XXX: "there" ?
                        EUFPLT  89
                        OF      8A  # XXX: "of" ?
                        AOEU    8B  # XXX: "I" ?
                        AS      8C  # XXX: "as" ?
                        TPHOT   8D  # XXX: "not" ?
                        WHA     8E
                        -B      8F
                        STPHAO  90
                        WA      91
                        EUPB    92  # XXX: "in" ?
                        TK-     93
                        EUS     94  # XXX: "is" ?
                        AOE     95
                        HREU    96
                        -FT     97
                        OPB     98  # XXX: "on" ?
                        S-      99
                        PHR-    9A
                    EUFRPBLGTS  9B
                        TPHO    9C  # XXX: "no" ?
                        TPHEU   9D
                        E       9E
                        -S      9F
                    STKPWHRAO   A0
                        TPOR    A1  # XXX: "for" ?
                        TPHOE   A2  # XXX: "no" ?
                        STPH-   A3  # XXX: "?" ?
                        W-      A4  # XXX: "with" ?
                        UR      A5  # XXX: "your" ?
                        O       A6
                        SKP-    A7  # XXX: "and" ?
                        KWRE    A8
                        APBD    A9  # XXX: "and" ?
                        OE      AA
                        TH-     AB  # XXX: "the" ?
                        OR      AC  # XXX: "or" ?
                        TODO    AD
                        -D      AE
                        APB     AF  # XXX: "an" ?
                        T-      B0
                    -FRPBLGTS   B1
                    EUT         B2  # XXX: "it" ?
                    STKPWHR-    B3
                        -G      B4
                        -T      B5  
                        *       B6
                        TPH-    B7
                        U       B8
                        -F      B9
                        EU      BA
                        THA     BB
                        -RBGS   BC  # XXX: "," ?
                        TO      BD  # XXX: "to" ?
                        -FPLT   BE  # XXX: "." ?
                        A       BF  # XXX: "a" ?

                    # XXX: 0xC0-0xFF are used for 2-byte strokes
            }}} )
{{{         OR ( variable 2 bytes:  Z0 Z1

                    ===================
                    Z0 masks (optional)
                    ===================
                        A   01

                    ======================================
                    Z0 values (mandatory; FA == "nothing")
                    ======================================
                    # XXX: 0x00-0x7F are used for 3-byte strokes
                    # XXX: 0x80-0xBF are used for 1-byte strokes

                        KW-     C0
                        KHR-    C2
                        KR-     C4
                        SK-     C6
                    STKPWHR-    C8
                        TR-     CA
                        PR-     CC
                        ST-     CE
                        SR-     D0
                        TPR-    D2
                        THR-    D4
                        SH-     D6
                        SKWR-   D8
                        P-      DA
                        TKPW-   DC
                        PHR-    DE
                        H-      E0
                        KWR-    E2
                        HR-     E4
                        TP-     E6
                        PW      E8
                        R-      EA
                        TK-     EC
                        K-      EE
                        PH-     F0
                        TH-     F2
                        TPH-    F4
                        T-      F6
                        W       F8
                        -       FA
                        S-      FC
                        STPH-   FE

                    ===================
                    Z1 masks (optional)
                    ===================
                        U   20
                        E   40
                        O   80

                    ======================================
                    Z1 values (mandatory; 1F == "nothing")
                    ======================================
                        -FP     00
                        -RB     01
                        -BGT    02
                        -LS     03
                        -PLT    04
                        -RBG    05
                        -RPB    06
                        -FR     07
                        -PBG    08
                    -FRPBLGTS   09
                        -GS     0A
                        -FT     0B
                        -TS     0C
                        -PBT    0D
                        -BGS    0E
                        -RS     0F
                        -RBGS   10
                        -PBS    11
                        -FPLT   12
                        -P      13
                        -RT     14
                        -B      15
                        -PL     16
                        -G      17
                        -BG     18
                        -F      19
                        -L      1A
                        -PB     1B
                        -S      1C
                        -R      1D
                        -T      1E
                        -       1F
            ) }}}
{{{         OR ( variable 3 bytes: Z0 Z1 Z2

                    MARKER RIGHT    59 03 30    XXX: SKPR-RPGT
                    MARKER LEFT     66 04 C8    XXX: STWH-FBLS

                ========
                Z0 masks
                ========
                    01  R-
                    02  H-
                    04  W-
                    08  P-
                    10  K-
                    20  T-
                    40  S-
                    80  unused
                        # XXX: 0x80-0xBF are used for 1-byte strokes
                        # XXX: 0xC0-0xFF are used for 2-byte strokes

                ========
                Z1 masks
                ========
                    01  -P
                    02  -R
                    04  -F
                    08  U
                    10  E
                    20  *
                    40  O
                    80  A

                ========
                Z2 masks
                ========
                    01  #
                    02  -Z
                    04  -D
                    08  -S
                    10  -T
                    20  -G
                    40  -L
                    80  -B
            ) }}}
{{{         [ if transmission has 2 or 3 strokes in it
                if this transmission has 2 or 3 strokes in it, they are here
                (following the first stroke) in the same format

                vvvvvvvvvvvvvv      vvvvvvvvvvvvvv      vvvvvvvvvvvvvv
                <Z0> [Z1] [Z2]  [   <Z0> [Z1] [Z2]  [   <Z0> [Z1] [Z2]  ]   ]
}}}         ]

}}} )

}}} )

}}}
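A sketch of a decoder for the stroke-data bytes, built only from the tables above. The lead byte selects the format (00-7F: 3 bytes, 80-BF: 1 byte, C0-FF: 2 bytes). In the 3-byte form every key is one bit, listed in steno order from the high bit down; 7 + 8 + 8 = 23 key bits, which is where a space of 2^23 strokes comes from. Output uses the notes' own notation (e.g. `SKPR-RPGT`); mapping it onto Plover's key names is left open, and the three one-byte TODO codes come back as `None`. All names here are hypothetical.

```python
from typing import List, Optional

# 1-byte strokes (lead byte 0x80-0xBF), indexed by byte - 0x80.
# None = code listed as TODO in the notes (80, 84, AD).
ONE_BYTE = [
    None, "STPH-FPLT", "ER", "RE", None, "RAOE", "WAU", "KWRES",
    "THR-", "EUFPLT", "OF", "AOEU", "AS", "TPHOT", "WHA", "-B",
    "STPHAO", "WA", "EUPB", "TK-", "EUS", "AOE", "HREU", "-FT",
    "OPB", "S-", "PHR-", "EUFRPBLGTS", "TPHO", "TPHEU", "E", "-S",
    "STKPWHRAO", "TPOR", "TPHOE", "STPH-", "W-", "UR", "O", "SKP-",
    "KWRE", "APBD", "OE", "TH-", "OR", None, "-D", "APB",
    "T-", "-FRPBLGTS", "EUT", "STKPWHR-", "-G", "-T", "*", "TPH-",
    "U", "-F", "EU", "THA", "-RBGS", "TO", "-FPLT", "A",
]
# 2-byte strokes: Z0 (0xC0-0xFE, even) = left keys; bit 0x01 adds A.
TWO_LEFT = [
    "KW-", "KHR-", "KR-", "SK-", "STKPWHR-", "TR-", "PR-", "ST-",
    "SR-", "TPR-", "THR-", "SH-", "SKWR-", "P-", "TKPW-", "PHR-",
    "H-", "KWR-", "HR-", "TP-", "PW", "R-", "TK-", "K-",
    "PH-", "TH-", "TPH-", "T-", "W", "-", "S-", "STPH-",
]
# Z1 low 5 bits = right keys; bits 0x80/0x40/0x20 add O/E/U.
TWO_RIGHT = [
    "-FP", "-RB", "-BGT", "-LS", "-PLT", "-RBG", "-RPB", "-FR",
    "-PBG", "-FRPBLGTS", "-GS", "-FT", "-TS", "-PBT", "-BGS", "-RS",
    "-RBGS", "-PBS", "-FPLT", "-P", "-RT", "-B", "-PL", "-G",
    "-BG", "-F", "-L", "-PB", "-S", "-R", "-T", "-",
]
# 3-byte strokes: one bit per key, steno order from the high bit down.
LEFT3 = [(0x40, "S"), (0x20, "T"), (0x10, "K"), (0x08, "P"),
         (0x04, "W"), (0x02, "H"), (0x01, "R")]
VOWEL3 = [(0x80, "A"), (0x40, "O"), (0x20, "*"), (0x10, "E"), (0x08, "U")]
RIGHT3_Z1 = [(0x04, "F"), (0x02, "R"), (0x01, "P")]
RIGHT3_Z2 = [(0x80, "B"), (0x40, "L"), (0x20, "G"), (0x10, "T"),
             (0x08, "S"), (0x04, "D"), (0x02, "Z")]


def _keys(byte: int, table) -> str:
    return "".join(name for mask, name in table if byte & mask)


def _join(number: bool, left: str, vowels: str, right: str) -> str:
    # Notes' notation: a hyphen separates left and right keys when there
    # are no vowels, and trails a left-only stroke (e.g. "TK-").
    if vowels:
        core = left + vowels + right
    elif right:
        core = left + "-" + right
    else:
        core = left + "-" if left else ""
    return ("#" if number else "") + core


def decode_strokes(data: bytes) -> List[Optional[str]]:
    strokes, i = [], 0
    while i < len(data):
        lead = data[i]
        size = 3 if lead < 0x80 else (1 if lead < 0xC0 else 2)
        if i + size > len(data):
            raise ValueError(f"truncated stroke at offset {i}")
        if size == 1:
            strokes.append(ONE_BYTE[lead - 0x80])
        elif size == 2:
            z1 = data[i + 1]
            strokes.append(_join(
                False,
                TWO_LEFT[(lead - 0xC0) >> 1].replace("-", ""),
                ("A" if lead & 0x01 else "")
                + _keys(z1, [(0x80, "O"), (0x40, "E"), (0x20, "U")]),
                TWO_RIGHT[z1 & 0x1F].replace("-", "")))
        else:
            z1, z2 = data[i + 1], data[i + 2]
            strokes.append(_join(
                bool(z2 & 0x01), _keys(lead, LEFT3), _keys(z1, VOWEL3),
                _keys(z1, RIGHT3_Z1) + _keys(z2, RIGHT3_Z2)))
        i += size
    return strokes
```

The two MARKER strokes make a quick check: `decode_strokes(bytes.fromhex("59 03 30 66 04 C8"))` gives `SKPR-RPGT` and `STWH-FBLS`, matching the notes.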

tarkb commented 9 years ago

https://oeis.org/wiki/Email_Servers_and_Superseeker
https://oeis.org/ol_source.txt

Another possible resource for finding the "checksum algorithm(s)", since the checksums seem to be "constant" and vary only with the "stroke number." The "checksums" can be viewed as an "integer sequence", so if there is some rhyme or reason to them, tools for identifying other "integer sequences" may hint at the algorithm/operations involved.

I do have a (very old) Mathematica license, so I may look at ol_source.txt some day. Too many other projects, so little time :)

"superproc8.f / superproc9.f "

Fortran is perversely enticing ... and you thought it was obsolete and no one used it anymore :)

I have not really found any generic "algorithm brute force" program for arbitrary data, but such a thing should not be hard to write...the tricky parts are:

1) time to run

2) operations to support between "steps" in the sequence, to go from:

 value A -> value B -> value C

  this could be a number of operations (add 1st and 2nd bytes, shift 3rd byte left 3, complement 
  8th byte, etc. -- lots of possible operations and also operands, some possibly involving 
  constant operands not within the data itself..."add" itself has variants "ignore overflow" or
  "store high unit to xth byte" so even that is not so simple...god forbid any of the data is "signed" 
  and yet more possibilities...

  so while a "brute force" in theory would work eventually, it may not be 
  computationally feasible

3) values may not be on even "byte boundaries", would need to guess the bit and byte endian and also "bit size" (e.g. some values might be 4 bits long, others might be 16, etc.)

This makes a generic "brute force" search for the algorithm that goes from one "checksum" to the next likely too slow to be feasible without further information. That said, a "typical" search might be feasible, e.g. assume all values are each one byte long, and maybe clamp the number of operations per transition to 10.
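A toy version of such a "typical" search, as a hedged sketch: it assumes each checksum byte is an independent one-byte value and only tries three single-step operations (add a constant mod 256, XOR with a constant, rotate left), checking whether one operation with one constant carries each byte position from one stroke number's row to the next. The rows are the 8-byte "PC query for stroke" variable sequences from the notes; `step_ops` and `rotl8` are hypothetical names, and an empty result only rules out these few operations.

```python
from typing import Dict, List, Sequence


def rotl8(value: int, k: int) -> int:
    """Rotate an 8-bit value left by k bits."""
    return ((value << k) | (value >> (8 - k))) & 0xFF


def step_ops(rows: Sequence[bytes]) -> Dict[int, List[str]]:
    """Map byte position -> ops that turn rows[i][pos] into rows[i+1][pos] for all i."""
    if len(rows) < 2:
        raise ValueError("need at least two consecutive rows")
    found: Dict[int, List[str]] = {}
    for pos in range(len(rows[0])):
        pairs = [(a[pos], b[pos]) for a, b in zip(rows, rows[1:])]
        ops = []
        for c in range(256):
            if all((x + c) & 0xFF == y for x, y in pairs):
                ops.append(f"add {c:#04x}")
            if all(x ^ c == y for x, y in pairs):
                ops.append(f"xor {c:#04x}")
        for k in range(1, 8):
            if all(rotl8(x, k) == y for x, y in pairs):
                ops.append(f"rotl {k}")
        if ops:
            found[pos] = ops
    return found


# Usage: the "PC query for stroke" variable bytes for stroke numbers 00-0A.
rows = [bytes.fromhex(r) for r in (
    "DE 0A BB FF EA 2E 97 41", "90 C8 55 A6 C8 D2 1C E6",
    "4C FD 6C 00 28 C2 EB 3A", "5B F0 AD 04 B7 E2 F1 F9",
    "D4 18 93 2A B4 94 BF 07", "8C 0B 98 46 64 20 5B 4B",
    "74 12 94 B0 59 82 8C 89", "D8 BD 8F DA B7 CD FE DD",
    "4F 34 E2 7C 39 EA 04 F3", "03 77 9D 85 C4 38 F9 D3",
    "00 44 28 CB E8 EA 62 C6",
)]
print(step_ops(rows))
```

Widening the search (multi-byte values, chains of operations, other bit widths) grows the cost quickly, which is the feasibility problem described above.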

an interesting tool for visual comparison of arbitrary data:

http://phreakocious.net/PI/

"Protocol Informatics - Tools for Binary Protocol Analysis"

most things I have seen about reverse engineering say "have lots of time & patience, be prepared to write your own tools"...not sure there are too many "generic" tools out there for data of unknown format...even finding the format of one particular field would narrow things down a lot.

The smarter/quicker approach of course is to debug/decompile steno programs/drivers whose license does not forbid this. Or someone has/makes a contact who knows the details :)

The software on the StenoRam itself could likely be "dumped" and decompiled too...no idea of the hardware needed or the legality of doing that :) AFAIK in the U.S. reverse engineering is generally allowed by default (IANAL...thank god!)...which is why so much software makes you agree not to reverse engineer, decompile, or disassemble it in order to use it :)

tarkb commented 9 years ago

mashing keys / guessing "common phrases" should produce the TODO bytes.

mashing all 2^23 possible strokes should show whether they are unused or not.

"normal use" might stumble upon them.

steno repair shops apparently have "auto-keyers" to test all keys function....software-controlled this would be quicker than human-mashing :)

i consider myself lucky to have stumbled upon the above byte values for strokes...nothing but guesswork.

this would be the "million monkeys" approach; alternatively, disassemble existing drivers/programs and see what they do...IANAL.

also, an update to the above "protocol": with an intentionally "long delay" between reads/writes to the serial port, I have seen up to 4F strokes per "packet". On my "modern" PC I normally only ever saw 3, but it seems this can vary a lot / there is a large buffer for slower PCs, so the number of strokes per packet can be much more than 3. I use a baud rate of 19200, but have no idea whether that is required.

0x4F == 79 decimal; assuming the largest case of 79 3-byte strokes, that is up to 237 bytes of stroke data per "packet". Completely guessing, that might mean the stenoram 3 has an internal buffer of 256 bytes of "pending strokes" for the PC to read at a time.

various software speaks to the SR3 differently, so the machine may support variant or completely different protocols as well (perhaps backwards-compatible with other Xscribe machines?).

my driver based on the above mostly works, but still has occasional hangs: the "serial port activity" icon on my machine's LCD disappears. Restarting from the top of the protocol seems to work, which is bad; it might be possible to detect this from the PC side and recover, but that probably loses some strokes.

changing the delay between reads/writes to the serial port may or may not "fix" such hangs. My driver is not asynchronous yet, so it uses arbitrary sleeps to avoid needlessly pegging the CPU.
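One way to detect the hang from the PC side and recover, sketched under assumptions: the port is opened with a read timeout, e.g. pyserial's `serial.Serial("/dev/ttyS0", 19200, timeout=0.5)` (device name and timeout are placeholders), so `read(n)` returns fewer than n bytes instead of blocking forever; a short read is treated as a hang; recovery flushes the input buffer and restarts from INIT. A blocking read with a timeout also waits without spinning, so it could stand in for the arbitrary sleeps. `SRHang`, `read_exact`, and `run_with_recovery` are hypothetical names, and strokes typed during a hang may still be lost.

```python
class SRHang(Exception):
    """The StenoRAM stopped answering within the port's read timeout."""


def read_exact(port, n: int) -> bytes:
    # With pyserial, read(n) returns early (fewer bytes) once `timeout` expires.
    data = port.read(n)
    if len(data) != n:
        raise SRHang(f"expected {n} bytes, got {len(data)}")
    return data


def run_with_recovery(port, init, session, max_restarts: int = 3):
    """Run init(port) then session(port); on SRHang, flush input and restart."""
    for attempt in range(max_restarts + 1):
        try:
            init(port)
            return session(port)
        except SRHang:
            if attempt == max_restarts:
                raise
            port.reset_input_buffer()  # pyserial 3.x: discard partial replies
```

The session function would use `read_exact` for every read, so any stall surfaces as `SRHang` instead of a silent freeze.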

tarkb commented 9 years ago

there is also still remaining:

1) "modem transfer" (I believe there was an Xscribe external modem, sold separately, for sending data in the stenoram's RAM over the phone to a PC at another location; I would be surprised if this did not work with other modems, but it is probably not worth the effort)

2) "COM:SND from RAM" (send saved strokes from SR3 RAM -> PC over the serial port): it should be relatively easy to find the protocol just by looking at the serial data transferred by existing steno programs that implement this. globalcat supports this, I believe.

https://web.archive.org/web/19990125085923/http://globalcat32.com/

I should also add: the globalcat documentation (text files installed with the program) hints there was a "batch mode" that just output untranslated strokes to a file or similar. So theoretically, for machines plover does not yet support, you could run globalcat32 inside a virtual machine and route the stroke data to plover / whatever on your "host" machine. This feature is presumably not in the "student edition" of globalcat, though. The documentation hints it was activated with a command-line option, but the windows globalcat at the above link does not respond to command-line options from what I tried. Presumably it was a "professional edition" feature only, and/or it was removed in later versions and that documentation was outdated. Maybe it was in the DOS version only, which does not seem to be included in the above download.

3) data the stenoram3 saves on floppy

These may or may not be similar to above protocol. Hopefully they at least use similar stroke format(s), I have not looked at that.

The old Xscribe PC I believe ran some form of CP/M, so "filesystem format" or "directory structure" of files on the floppy might be some variant of that. no idea what software/OS the stenoram itself uses, but presumably that would ease compatibility......

I had the Xscribe PC at some point...

It had PIP on it:

http://en.wikipedia.org/wiki/Peripheral_Interchange_Program

so pretty sure it was CP/M of some kind, with various Xscribe programs on it. "system inside the keyboard" + a nice monochrome monitor too. still worked when I had it :)

IIRC the "tape over floppy hole" trick lets "modern" 1.44 MB floppy drives read the (720k double-density?) floppies the SR3 uses...

IIRC windows (after 95/98/ME?) is stupid and will trash the floppy filesystem if you do not write-protect it...trash it and then say it is "unrecognized"...

something like:

http://atari.8bitchip.info/FloppyMistery.php

Windows XP simple can not work well with floppy what is not strictly DOS standard formatted. Worst in all is that despite of it, it opens and works with non-standard (for him) floppies. Check of disk format is very superficial - if there is 0xE9 or 0xEB at start and more-less standard BPB, such floppy will be opened.... and messed up.
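The superficial check described in that quote can be approximated on a raw copy of a floppy's first sector (for example an image made with `dd` on a non-Windows machine). This is a sketch under loud assumptions: it only tests the x86 jump opcode at offset 0 and two classic FAT BIOS Parameter Block fields (bytes per sector at offset 11, sectors per cluster at offset 13); Windows' actual logic is not documented here, `looks_like_fat_boot_sector` is a hypothetical name, and the file name in the usage comment is a placeholder.

```python
def looks_like_fat_boot_sector(sector: bytes) -> bool:
    """Approximate the quoted heuristic: x86 jump at byte 0 plus a plausible BPB."""
    if len(sector) < 512 or sector[0] not in (0xE9, 0xEB):
        return False
    bytes_per_sector = int.from_bytes(sector[11:13], "little")
    sectors_per_cluster = sector[13]
    return (bytes_per_sector in (512, 1024, 2048, 4096)
            and sectors_per_cluster in (1, 2, 4, 8, 16, 32, 64, 128))


# Usage (placeholder file name); True means the sector passes this rough check.
# with open("sr3_floppy.img", "rb") as f:
#     print(looks_like_fat_boot_sector(f.read(512)))
```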

I don't recall finding a way to format a floppy on the PC side and have the stenoram recognize it...but the stenoram has built-in "format floppy" menu item, and I believe I can read the disk properly from a PC on a "typical" 1.44 MB floppy drive (or the newfangled USB enclosures for them), just have not looked at the data yet.....

If not, there is:

http://en.wikipedia.org/wiki/Individual_Computers_Catweasel
http://www.kryoflux.com/

Should not be needed though... IIRC I just put a "standard PC 1.44 MB" floppy drive I had lying around in my SR3 because the one that came with it was flaky...and it works fine...so I believe any trouble reading StenoRAM 3 floppies on the PC side is likely a software and not hardware limitation + windows being stupid.