larsbrinkhoff / pdp10-its-disassembler

Disassembler and other tools for files in ITS formats
GNU General Public License v2.0

Classify tape images #150

Open larsbrinkhoff opened 2 years ago

larsbrinkhoff commented 2 years ago

Add a tool to classify tape images, especially those formats in use at MIT.

CC @eswenson1 @ams

larsbrinkhoff commented 2 years ago

For PDP-10 tapes, we'll also have to consider whether they are 9 or 7 track, little or big endian, etc.
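
As a rough illustration of why the track count matters: a 36-bit PDP-10 word has to be split across tape frames, and the split depends on the drive. The sketch below assumes the commonly described "core dump" packing for 9-track tapes (five frames per word, the last one carrying only four bits) and six 6-bit frames per word for 7-track tapes, most significant bits first. The function names are hypothetical and not taken from this repository. Other packings exist, so a classifier would probably have to try several.

```python
def unpack_9track_core_dump(frames):
    """Unpack 36-bit words from 9-track frames.

    Assumed layout: four frames of 8 bits each, then a fifth frame
    whose low 4 bits hold the last 4 bits of the word.
    """
    words = []
    usable = len(frames) - len(frames) % 5  # ignore a trailing partial word
    for i in range(0, usable, 5):
        b = frames[i:i + 5]
        word = (b[0] << 28) | (b[1] << 20) | (b[2] << 12) | (b[3] << 4) | (b[4] & 0x0F)
        words.append(word)
    return words


def unpack_7track(frames):
    """Unpack 36-bit words from 7-track frames.

    Assumed layout: six frames per word, 6 data bits per frame,
    most significant frame first. Parity is assumed already stripped.
    """
    words = []
    usable = len(frames) - len(frames) % 6  # ignore a trailing partial word
    for i in range(0, usable, 6):
        word = 0
        for frame in frames[i:i + 6]:
            word = (word << 6) | (frame & 0x3F)
        words.append(word)
    return words
```

Both functions return plain Python integers below 2**36, so the rest of a classifier could inspect words without caring which kind of drive wrote the tape.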

ams commented 2 years ago

Add a tool to classify tape images, especially those formats in use at MIT.

  • ITS DUMP format.
  • TOPS-20 DUMPER.
  • Skip optional ANSI label.
  • Unix tar.
  • Lispm tapes.

FWIW, there are multiple formats for LispM tapes; Symbolics and MIT.

larsbrinkhoff commented 2 years ago

there are multiple formats for LispM tapes; Symbolics and MIT.

I can see that. I see something beginning with some ASCII text strings like

PRELUDE
VERSION 3
TAPE-SYSTEM-VERSION 429
LMFS-VERSION 428.1 RELEASED

Is that the (or a) Symbolics format? I think those are tapes from REAGAN.

And then there's something more binary looking.
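
A header like the one above could be recognized and parsed with something along these lines. This is only a sketch: the function name `parse_prelude` is hypothetical, and the line separator is an assumption (it accepts LF, CR, or the Lisp Machine Return character 0x8D, since I have not verified which one the tapes use). Parsing stops at the first line that does not look like `KEY value`, which is where the binary-looking data is presumed to start.

```python
import re


def parse_prelude(record):
    """Return a dict of header fields if the record starts with PRELUDE, else None."""
    if not record.startswith(b"PRELUDE"):
        return None
    fields = {}
    # Assumption: lines end in LF, CR, or Lisp Machine Return (0x8D).
    lines = re.split(rb"[\r\n\x8d]+", record)
    for line in lines[1:]:
        match = re.match(rb"([A-Z][A-Z0-9-]*) (.+)", line)
        if match is None:
            break  # presumably the start of the binary part
        key = match.group(1).decode("ascii")
        value = match.group(2).decode("ascii", "replace")
        fields[key] = value
    return fields
```

For the strings quoted above, this would yield the keys `VERSION`, `TAPE-SYSTEM-VERSION`, and `LMFS-VERSION`.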

larsbrinkhoff commented 2 years ago

I also see VMS BACKUP tapes. Hoping this will help: https://github.com/kkaempf/vmsbackup

larsbrinkhoff commented 2 years ago

I have some code up on a branch called lars/classify-tape. So far I try to detect:

I see some unclassified tapes that look like Unix or Lispm. The vast majority of tapes are ITS or TOPS-20, but quite a few Unix, Lispm, and VMS tapes are as yet totally unexplored.

larsbrinkhoff commented 2 years ago

Detection schemes range from heuristic to hacky. I mainly look at the first record, skipping any ANSI label if present.
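
To make the "first record" approach concrete, here is a minimal sketch. It is not the code on the branch. It assumes the tape image has already been split into a list of records, with `None` standing for a tape mark, and it only covers formats with well-known signatures: the tar header checksum, the cpio magic number 070707, and the PRELUDE text seen on some Lispm tapes. All function names and the returned labels are hypothetical. ITS DUMP and TOPS-20 DUMPER detection is left out, because that needs 36-bit word unpacking first.

```python
ANSI_LABEL_PREFIXES = (b"VOL", b"HDR", b"UHL")


def skip_ansi_label(records):
    """Drop leading 80-byte ANSI label records and the tape mark after them."""
    i = 0
    while (i < len(records) and records[i] is not None
           and len(records[i]) == 80
           and records[i].startswith(ANSI_LABEL_PREFIXES)):
        i += 1
    if 0 < i < len(records) and records[i] is None:
        i += 1  # tape mark that terminates the label group
    return records[i:]


def tar_checksum_ok(header):
    """Check the tar header checksum, which also works for pre-ustar archives."""
    if len(header) < 512:
        return False
    field = header[148:156].strip(b" \0")
    try:
        stored = int(field, 8)
    except ValueError:
        return False
    # The checksum field itself counts as eight spaces.
    computed = sum(header[:148]) + 8 * 0x20 + sum(header[156:512])
    return stored == computed


def classify(records):
    """Guess the tape format from the first record after any ANSI label."""
    data = skip_ansi_label(records)
    labeled = len(data) != len(records)
    if not data or data[0] is None:
        return "Unknown"
    first = data[0]
    if tar_checksum_ok(first):
        kind = "Unix tar"
    elif first.startswith(b"070707") or first[:2] in (b"\xc7\x71", b"\x71\xc7"):
        kind = "Unix cpio"  # ASCII magic, or binary magic in either byte order
    elif first.startswith(b"PRELUDE"):
        kind = "Lispm (PRELUDE header)"
    else:
        return "Unknown"
    return kind + " (with label)" if labeled else kind
```

The tar check uses the checksum rather than the `ustar` magic string on the assumption that many tapes of this age predate POSIX tar.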

larsbrinkhoff commented 2 years ago

Hello @romkey,

I'm working on classifying tape image files from MIT's "Tapes o' Tech Square" collection of backups. Some of the files seem to be MS-DOS floppy images, or at least some kind of FAT file system. I was surprised to see that type of data on MIT backup media, especially on what should be magnetic tapes.

When I think about PCs at MIT in the 80s, I of course think of the author of PC/IP. So, do you have any idea how this MS-DOS data would appear on magtapes from MIT?

Thanks!

larsbrinkhoff commented 2 years ago

Classification results so far, on my limited corpus.

   2136 ITS DUMP
   1410 TOPS-20 DUMPER
    683 Unix dump
    261 Symbolics LMFS dump
    236 Unknown
    154 ITS DUMP (with label)
    106 Unix tar
    105 Error reading tape image
     78 VMS BACKUP
     15 TOPS-20 install
      8 FAT file system
      4 Unix cpio
      4 TOPS-10 BACKUP
      1 TOPS-10 FAILSAFE
      1 MIT/LMI dump

Breakdown of platforms:

  Tapes  Percentage  Platform
   5202         100  All
   3720          72  PDP-10
    793          15  Unix
    341           7  Unknown/bad
    262           5  Lispm
     78           1  VMS
      8         0.1  PC
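
The platform breakdown follows from the per-format counts by summing each group. A small sketch of that arithmetic is below. The grouping in `PLATFORM` is my reading of the two tables (for example, unreadable images are counted together with unknown ones), and the names `COUNTS`, `PLATFORM`, and `platform_breakdown` are made up for this example.

```python
from collections import Counter

# Per-format counts as listed above.
COUNTS = {
    "ITS DUMP": 2136,
    "TOPS-20 DUMPER": 1410,
    "Unix dump": 683,
    "Symbolics LMFS dump": 261,
    "Unknown": 236,
    "ITS DUMP (with label)": 154,
    "Unix tar": 106,
    "Error reading tape image": 105,
    "VMS BACKUP": 78,
    "TOPS-20 install": 15,
    "FAT file system": 8,
    "Unix cpio": 4,
    "TOPS-10 BACKUP": 4,
    "TOPS-10 FAILSAFE": 1,
    "MIT/LMI dump": 1,
}

# Assumed mapping from format to platform.
PLATFORM = {
    "ITS DUMP": "PDP-10",
    "ITS DUMP (with label)": "PDP-10",
    "TOPS-20 DUMPER": "PDP-10",
    "TOPS-20 install": "PDP-10",
    "TOPS-10 BACKUP": "PDP-10",
    "TOPS-10 FAILSAFE": "PDP-10",
    "Unix dump": "Unix",
    "Unix tar": "Unix",
    "Unix cpio": "Unix",
    "Symbolics LMFS dump": "Lispm",
    "MIT/LMI dump": "Lispm",
    "VMS BACKUP": "VMS",
    "FAT file system": "PC",
    "Unknown": "Unknown/bad",
    "Error reading tape image": "Unknown/bad",
}


def platform_breakdown(counts):
    """Sum the per-format counts into per-platform totals."""
    totals = Counter()
    for kind, n in counts.items():
        totals[PLATFORM[kind]] += n
    return totals


if __name__ == "__main__":
    totals = platform_breakdown(COUNTS)
    all_tapes = sum(totals.values())
    for platform, n in totals.most_common():
        print(f"{n:7d} {100 * n / all_tapes:6.1f} {platform}")
```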