Roll opens with roll #, series #, a description of what it’s about (“General Correspondence”), and the date range in YYYY Mmm DD format. This data is also in the finding aid.
Second page is a restating of the date, and a more detailed description of the contents, just a few lines long, each topic separated by semicolons. This data is also in the finding aid.
Some pages have a typed strip at the top, with a message like “Original in Library, State University College, Oswego, New York”. On others it’s handwritten, or at the bottom. In most cases it’s visually separated from the page content, but not always!
Handwritten, no letterhead.
Pencilled-in notes are made, presumably by curators, providing things like dates. How in the world we’re going to separate this out from the original content, I have no idea.
The back of the page appears to function as the envelope.
Pages are sometimes torn and reassembled.
Dude has a hell of a signature.
Mail is stamped. Outgoing (?) mail is stamped “FREE.” Franking privileges?
Some of this text is very faint.
Paper appears to be of non-uniform size. Sometimes it even appears to have been roughly cut, although perhaps that’s degradation over time.
I found a printed page! A list of publications to which one can subscribe, apparently a magazine subscription circular. “Gentleman’s Magazine,” “Silliman’s Journal of Science,” and “Penny Magazine” are all available.
I found letterhead! Oddly, nothing else is on the page. “Babcock’s Patent Iron-Frame Piano Forte,” in Baltimore.
When a new year starts, the entire page is black, and there are huge, white letters reading “1835.” It’s very distinct.
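Since these divider pages are so distinct, they might be the easiest thing on the reel to detect. A rough sketch, assuming each frame has already been loaded as rows of grayscale values (0 = black, 255 = white). The function name and both thresholds are made up for illustration and would need tuning against real frames. It only flags "mostly black" frames; reading the year itself would be a separate step.

```python
def looks_like_year_divider(pixels, dark_below=64, min_dark_fraction=0.8):
    """Guess whether a frame is a black divider page with white lettering.

    pixels: list of rows, each a list of grayscale ints in 0..255.
    dark_below and min_dark_fraction are illustrative guesses, not measured values.
    """
    total = 0
    dark = 0
    for row in pixels:
        for value in row:
            total += 1
            if value < dark_below:
                dark += 1
    if total == 0:
        return False
    # A divider page is almost entirely dark; an ordinary letter is mostly light.
    return dark / total >= min_dark_fraction


# Toy frames: a black page with a few white "letter" pixels, and a pale letter page.
divider = [[0] * 10 for _ in range(10)]
for x in range(10):
    divider[5][x] = 255
letter = [[230] * 10 for _ in range(10)]

print(looks_like_year_divider(divider))
print(looks_like_year_divider(letter))
```

A badly underexposed letter would also trip this, so it is probably only a first filter.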
Hey, here’s a letter with an illustration. I think it’s a patent illustration for a steam engine.
A lot of this handwriting is just impenetrable. I can’t read it. I don’t know how software is going to.
Interesting—this appears to be a handwritten, single-page finding aid, indicating that the letter on the following page is dated January 18, 1835. At least, I assume it’s the following page (because there is a letter of that date on the next page). I can’t see what else that might indicate.
This letter has text written in the margin, rotated 90°. There’s an OCR adventure.
The text is not necessarily straight. That is, in the manner of handwriting, it often slopes across the page. We’ll need a straightening algorithm that can deal with text that bends, as opposed to simply being un-straight.
Here’s a weird one. It’s a land contract, printed in some ridiculous gothic typeface and a script typeface. It looks like an early form—it has huge blank areas, where text has been handwritten in. Ah, yeah, there are a bunch of these.
Opens with pages of typewritten material explaining the process of gathering and photographing the contents of the reel, a timeline of Monroe's life, and some brief biographical notes.
These letters are sideways. We're going to need to ID sideways pages and rotate them. (How we'll figure out in which direction to rotate them, I have no idea.)
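One possible way to ID sideways pages, sketched under heavy assumptions: on an upright page, lines of writing make the amount of ink vary a lot from row to row, while column totals stay fairly even. If the columns vary more than the rows, the page is probably rotated. This assumes the frame has already been binarized into 1 (ink) and 0 (paper); `looks_sideways` is a hypothetical name. It does not answer the direction question, only whether a 90° rotation is needed at all.

```python
def _variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def looks_sideways(ink):
    """Guess whether text lines run vertically instead of horizontally.

    ink: list of rows, each a list of 0/1 values (1 = ink). Assumes all rows
    have the same length.
    """
    height = len(ink)
    width = len(ink[0]) if height else 0
    if height == 0 or width == 0:
        return False
    # Fraction of inked pixels in each row and in each column.
    row_fill = [sum(row) / width for row in ink]
    col_fill = [sum(ink[y][x] for y in range(height)) / height for x in range(width)]
    # Text lines show up as strong variation along the axis they are stacked on.
    return _variance(col_fill) > _variance(row_fill)


# Toy page: alternating inked and blank rows stand in for lines of writing.
upright = [[1] * 6 if y % 2 == 0 else [0] * 6 for y in range(6)]
rotated = [list(column) for column in zip(*upright)]

print(looks_sideways(upright))
print(looks_sideways(rotated))
```

Marginal text written at 90° to the main letter would muddy this, so it is a page-level guess at best.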
Many of the pages are splotched with dark text, like clouds over the material, making big portions impossible to read.
Some letter-writers hyphenate words when they reach the end of a line. (I've never seen that in handwritten materials.)
Monroe ends a page with the same word that he starts the next page with. When the pages are put together, the text appears to say "I have received a copy of the report of the joint joint committee of both houses." This, as I recall, was a practice used in the 1700s by Benjamin Franklin, among others. Anyhow, we'll probably need to look for this and deal with it in transcription. Whether "dealing with it" means removing the duplicate or retaining it, I leave to others to determine. In the following example, the word "trust" is repeated.
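For the transcription step, here is a rough sketch of how the repeated word could be handled when page texts are joined, using the "joint joint" sentence as input. It assumes each page is already a plain-text string. `join_pages` and its `drop_catchwords` flag are hypothetical names; the flag exists because the remove-or-retain decision is still open.

```python
import string


def _norm(word):
    """Lowercase a word and strip surrounding punctuation for comparison."""
    return word.strip(string.punctuation).lower()


def join_pages(pages, drop_catchwords=True):
    """Join page transcriptions, optionally dropping a word repeated across a page break.

    pages: list of strings, one per page, in reading order.
    """
    words = []
    for page in pages:
        page_words = page.split()
        if (drop_catchwords and words and page_words
                and _norm(words[-1]) != ""
                and _norm(words[-1]) == _norm(page_words[0])):
            # The new page opens with the word that closed the previous page.
            page_words = page_words[1:]
        words.extend(page_words)
    return " ".join(words)


pages = [
    "I have received a copy of the report of the joint",
    "joint committee of both houses.",
]
print(join_pages(pages))
print(join_pages(pages, drop_catchwords=False))
```

A genuine repetition that happens to straddle a page break ("had had") would be collapsed too, so flagging the break for a human may be safer than silently deleting.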
Monroe's communications all run together, with one letter ending and another starting on the same page. What's going on here?
"Common Place Bks., Law Notebks., Campaign Notebks."
He’s practicing writing his name over and over and over again. These aren’t just “notebooks”—this has to be from when he was just in grade school.
Page sizes are small, apparently from a journal.
Some of these are photographed two-up—both the left and right pages are photographed as a single image, side-by-side.
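Two-up images will presumably need to be split before any transcription. A minimal sketch, assuming a binarized frame (1 = ink, 0 = paper) and assuming the gutter is the least-inked column somewhere in the middle third of the image. Both function names are hypothetical. On a bound journal the gutter may photograph as a dark shadow instead, in which case the test would need to be inverted.

```python
def find_gutter(ink):
    """Return the column index, within the middle third, that contains the least ink.

    ink: list of rows, each a list of 0/1 values (1 = ink). Assumes equal-length rows.
    """
    height = len(ink)
    width = len(ink[0]) if height else 0
    if width < 3:
        raise ValueError("image too narrow to split")
    lo = width // 3
    hi = width - width // 3

    def column_ink(x):
        return sum(row[x] for row in ink)

    # On ties, min() keeps the leftmost candidate column.
    return min(range(lo, hi), key=column_ink)


def split_two_up(ink):
    """Split a two-up frame into (left_page, right_page) at the guessed gutter."""
    gutter = find_gutter(ink)
    left = [row[:gutter] for row in ink]
    right = [row[gutter:] for row in ink]
    return left, right


# Toy frame: nine columns wide, with a blank column in the middle.
frame = [[1, 1, 1, 1, 0, 1, 1, 1, 1] for _ in range(3)]
left, right = split_two_up(frame)
print(find_gutter(frame), len(left[0]), len(right[0]))
```

This says nothing about whether a frame is two-up in the first place; comparing width to height might be a cheap first check.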
These appear to be school assignments.
Page after page of handwritten documents, all in Hayes’ handwriting.
Some of these photographs are of pages of journals that are wider than they are tall, yielding a two-column page that’s easily three times as wide as it is tall.
Some of these pages have been annotated, such as writing “1838” in the upper-right-hand corner in handwriting that really doesn’t look like Hayes’.
Towards the end of the reel, there’s an image of an index card, on which somebody has written, in marker, “<—— ERROR | CORRECTION FOLLOWS ——>”. The pages do appear to repeat on either side of that marker. This is the sort of indicator that is useful to humans, but of no value to software.
Hey, here’s a transcript. Somebody’s actually typed out a transcript of a page of hard-to-read text and included it after said page. Huh.
Whoa, one of these pages is really tall and skinny. It overlays the RB Hayes’ Library’s own footer.
Here’s a three-up page. One piece of paper, turned sideways, divided into three columns.
Six-up! We have a new record. This is a piece of paper that appears to have been folded in half lengthwise, and in thirds widthwise, and then each of those six portions written on as if it was its own page. You’re killing me, R.B.
Idea: The evolution of a president’s signature. Capture an image of it every year for, like, 40 years. Turn it into an anigif.
These are all journal pages, scanned two-up usually, but sometimes just one page at a time.
The binding must be coming loose, because the pages can be seen to be splayed out around the edges of each scan.
This is all handwritten. Some of it is tabular data, such as when he's doing basic accounting.
This page has a pasted-in ad for a Rockwood photography studio:
Whoa, here's a page with a photograph pasted in. It's a circular picture, a portrait of a man who certainly looks like Rutherford B. Hayes. A quick search does not reveal this photo anywhere. It's from 1869. Perhaps related to the Rockwood ad?
This journal opens with a stamped-on message: "For Congress, 2nd District, RUTHERFORD B. HAYES."
He also opens this journal with the height and weight of his children on September 13, 1873. (They range from 5'7 3/8" to 3'4", "in stocking feet," and from 161 pounds to 44 pounds.) The first two pages are given over to repeated measurements through January 1, 1874.
Wow, this journal starts with him reflecting on his life thus far. He's been married 20 years to the day, he's about to step down from his four-year term as governor, and it's been 13 years since he was first elected City Solicitor. He laments that "my salaries have not equalled my expenses." He says that he is "quit[ting] public life," that "the flower of the U.S. Senator has seemed to be within my reach," but that "I do not care enough for it to go into a struggle for it." "I tell my friends to look elsewhere for a candidate."
He's moved to a lined notebook.
This page has the right half taken up by a newspaper clipping. It's entitled "Non-Sectarian Schools," and is a letter from "Ex-Speaker Blaine."
Here's a letter from him to "Hon. L.C. Jones," informing him that Hayes is nominating him for a judicial commission. "A true copy, Webb O. Hayes" is written at the bottom, and it's not in Hayes' handwriting.
Here's another copy of a letter. In fact, they just keep coming. Huh. And in the next journal, too.
Page bleed. Not awful, but problematic.
This is weird. An index card with typed text has been laid down atop the journal and photographed like that. The typed text is at the bottom of the page, the handwritten equivalent is at the top, and they are separated by a paragraph.
Here's a seating diagram for a December 30, 1877 event (when Hayes is president), for which he's got him and his wife at one end of the table, his son Scott at the other end, and all manner of people between them at the table. It looks like maybe 20 people will be seated.
And another seating chart. Another and another. Huh.
Here's a paste of some printed text, surrounded by a decorative border, about serving a meal in honor of "Mrs. President Hayes."
Look through sample microfilm from all three presidents and propose types of metadata to be extracted and how to extract that metadata.