Open waldoj opened 11 years ago
Image density. That is, what percentage of the pixels are white, and what percentage are non-white? I forecast that we'll find that the pages in a given letter tend to have the same density. But I also worry that the range of densities won't be great enough for us to use that information to know where one document stops and another one starts.
It's worth trying this with a histogram, too. Obviously, that's more complicated than a simple black/white calculation.
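As a rough starting point, here is a minimal sketch of both measurements. It assumes each page has already been loaded as rows of 8-bit grayscale values (0 = black, 255 = white); loading the scans themselves is left out. The function names, the "white" cutoff of 250, and the jump threshold are all placeholders I made up for illustration, not anything we've settled on.

```python
# Sketch only: pages are assumed to be lists of rows of grayscale ints (0-255).
# WHITE_THRESHOLD is a guess; real scans will probably need tuning.
WHITE_THRESHOLD = 250


def white_density(pixels, threshold=WHITE_THRESHOLD):
    """Return the fraction of pixels at or above the white threshold."""
    total = 0
    white = 0
    for row in pixels:
        for value in row:
            total += 1
            if value >= threshold:
                white += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return white / total


def gray_histogram(pixels, bins=8):
    """Return the share of pixels falling in each of `bins` equal-width gray bands."""
    width = 256 / bins
    counts = [0] * bins
    for row in pixels:
        for value in row:
            # Clamp so that 255 lands in the last bin even with uneven widths.
            counts[min(int(value // width), bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("image has no pixels")
    return [count / total for count in counts]


def density_jumps(densities, min_jump=0.1):
    """Return indexes of pages whose density differs from the previous page
    by more than min_jump (candidate document boundaries, nothing more)."""
    return [
        i
        for i in range(1, len(densities))
        if abs(densities[i] - densities[i - 1]) > min_jump
    ]
```

Running `white_density` over every page in order and passing the results to `density_jumps` would give a list of candidate boundaries. Whether the differences between letters are actually large enough for that to work is exactly the open question above.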
Overlaid metadata. Some page scans include things like page numbers in the bottom corner, or origin labels at the top. This is important to include, not least because it allows us to match up scans with finding aids.
To get started: