uxmal / reko

Reko is a binary decompiler.
https://uxmal.github.io/reko
GNU General Public License v2.0

PDP-10 36-bit file storage #1144

Open larsbrinkhoff opened 2 years ago

larsbrinkhoff commented 2 years ago

FYI,

Perhaps the three most important methods to store PDP-10 36-bit files in 8-bit files today are:

Binary image is the simplest; two 36-bit words are stored as 9 octets. This is typically used by FTP "binary image" mode.

ANSI ASCII has the nice property that PDP-10 ASCII text files come out readable in octet format. The bit mapping can be described by this picture. Unused bits are zero.

ITS evacuate format is complicated. It's described here. In addition to making text files readable, like ANSI ASCII, it also performs conversions for CR LF line endings and Lisp machine characters. Long story.

larsbrinkhoff commented 2 years ago

The PDP-10 also comes with several "archive" file formats: half-inch magtapes of several kinds, DECtapes, ITS archives, etc.

uxmal commented 2 years ago

How comfortable are you with reading C#? It shouldn't be too much of a leap from C, especially given that the code for unpacking bytes into words is going to involve low-level bit twiddling.

My question is: is there a way to identify the encoding, or does the user have to provide it manually? I.e. is it sufficient to look at the .bin extension to know that the data in the file is to be processed in a certain way? The current BinLoader class reads 5 octets, then chops off the last 4 bits, yielding a 36-bit word (it does no permutations). In your text above, however, you state a different algorithm:

Binary image is the simplest; two 36-bit words are stored as 9 octets

Maybe you could guide me to the simplest file to work with.
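For reference, the 5-octet scheme described above for BinLoader could be sketched roughly as follows. This is only an illustration in Python, not Reko's actual C# code: the function name `unpack_five_octet` is hypothetical, and it assumes the octets are read most significant first and that "the last 4 bits" means the low four bits of the fifth octet.

```python
from typing import List


def unpack_five_octet(data: bytes) -> List[int]:
    """Read one 36-bit word from every 5 octets, discarding 4 bits.

    Assumption: big-endian octet order, and the discarded bits are the
    low four bits of the fifth octet. Trailing octets that do not form
    a complete group of five are ignored.
    """
    words = []
    usable = len(data) - len(data) % 5
    for offset in range(0, usable, 5):
        group = int.from_bytes(data[offset:offset + 5], "big")  # 40 bits
        words.append(group >> 4)  # keep the upper 36 bits
    return words
```

Under these assumptions every word costs 40 bits of storage, so two words take 10 octets rather than the 9 octets of the binary image format.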

larsbrinkhoff commented 2 years ago

I think most of your questions were resolved on Gitter.

Your BinLoader is not what I mean by "binary image". FTP image mode will just transfer bits with no padding between them. So the first 36-bit word will occupy 4½ octets. The next 36-bit word will fill out the other half octet and four more. Obviously a file with an odd number of words will be padded out with four unused bits in the last octet.
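The dense packing described above might look something like the sketch below. It is written in Python for brevity, and the names `pack_image` and `unpack_image` are made up for illustration. It assumes the bits are stored most significant first, so the high eight bits of the first word land in the first octet, and that the four padding bits are zero.

```python
from typing import List

MASK36 = (1 << 36) - 1


def pack_image(words: List[int]) -> bytes:
    """Pack 36-bit words back to back with no padding between them.

    Assumption: most significant bit first. An odd number of words
    leaves four zero padding bits at the end of the last octet.
    """
    value = 0
    for word in words:
        value = (value << 36) | (word & MASK36)
    nbits = 36 * len(words)
    pad = -nbits % 8  # 0 for an even word count, 4 for an odd one
    return (value << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_image(data: bytes) -> List[int]:
    """Inverse of pack_image: split the bit stream into 36-bit words."""
    total_bits = len(data) * 8
    nwords = total_bits // 36
    pad = total_bits - 36 * nwords  # leftover bits after the last word
    value = int.from_bytes(data, "big") >> pad
    return [(value >> (36 * (nwords - 1 - i))) & MASK36
            for i in range(nwords)]
```

With this layout two words take exactly 9 octets and a single word takes 5, with the low four bits of the fifth octet unused.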

However, this format is pretty much only used by FTP and I have no samples. Hmm no, it's also used by https://github.com/mikpe/pdp10-tools