usnationalarchives / opengovplan

NARA's Open Government Plan 2016-2018
https://usnationalarchives.github.io/opengovplan/

Section 3 (Flagship Initiatives), Initiative 10 (Digitization) - Need more visibility into digitization partnerships and easier access to the records they produce #1

Open Asparagirl opened 7 years ago

Asparagirl commented 7 years ago

Hi NARA,

Thank you for putting your Open Government Plan up on GitHub and asking for feedback from the public. It's an encouraging sign that the agency is moving in the right direction.

Right now, your "digitized by partners" web page... http://www.archives.gov/digitization/digitized-by-partners.html ...lists which digitization partner has digitized which of your holdings. It's a simple "who did what" list.

But because these items sent out for digitization are usually not available to the general public and the researcher community while that work is in progress, we really need much more frequent updates on what's going on with them, while we wait for access to be restored.

You should add more fields to the table on that page, including but not limited to:

This newly expanded table should also probably become an XLS or CSV dataset on your data.gov data catalog, which frankly looks pretty sparse at the moment.

Ideally, this page would also list a new "coming soon" section, so the researcher community will know what's coming next.

Even a quick perusal of the public comments left on your own blog, NARAtions, will make it clear why these new fields and this increased communication are sorely needed:

https://narations.blogs.archives.gov/2015/07/31/ancestry-com-partnership-agreement-for-public-comment/ and https://narations.blogs.archives.gov/2015/08/27/thank-you-for-feedback-on-renewal-of-the-ancestry-com-partnership-agreement/

Finally, has there been any thought about making either the digitized images, or the newly-created metadata that goes with them, or both, available as a single bulk-download per item, either through the data.gov data catalog or through another method such as a torrent? Or perhaps for sale on a USB drive? That is, right now when the items are finally added to your catalog, they're searchable in the catalog one by one, image by image, but could there be an easy way for the public to get access to the entire set or to the metadata for an entire set?

Thank you for your time and your public service.

mereastew commented 7 years ago

Hi Brooke (@Asparagirl),

Thank you so much for your thoughtful feedback. I have forwarded your comments and suggestions internally and I hope to get a more detailed response to you soon.

Thanks!
Meredith Stewart
National Archives and Records Administration

clingerman commented 7 years ago

@Asparagirl Thank you for your comments and suggestions. They are definitely on point - we need to do a better job of reporting out to the public what's been digitized by our partners and what's available (i.e. out of embargo).

We are currently in the process of redesigning the page you linked to of records digitized by partners. We're going to add a table at the top that lists which records are now available via the National Archives Catalog with links to those records.

This does not address your ask for some more detailed information. For the longer term, however, we have plans to develop a status tracking tool that shows where each set of records digitized by partners is in the process. This tool should address the majority of your suggestions and we're hoping to have something in late 2017 or 2018.

Bulk download is a feature we're hoping to implement in the National Archives Catalog in 2018. This feature would allow a user to download all objects for a description as a compressed file.

As for accessing the metadata and download URLs for an entire set - that should be available soon. We're currently in the process of enhancing the Catalog API to allow users to retrieve larger and larger datasets and to be able to paginate through even larger datasets than can be downloaded. Stay tuned for our improved API sometime in 2017.
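The pagination described above can be sketched roughly as follows. This is a generic illustration only, not the Catalog API: `paginate`, `fetch_page`, `fake_fetch_page`, the `offset`/`limit` parameters, and the example.org URLs are all hypothetical stand-ins, and the real API's parameters may well differ.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, str]


def paginate(
    fetch_page: Callable[[int, int], List[Record]],
    page_size: int = 100,
) -> Iterator[Record]:
    """Yield every record by requesting successive pages.

    `fetch_page(offset, limit)` is a hypothetical callable that returns
    at most `limit` records starting at position `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A short (or empty) page means there is nothing left to request.
        if len(page) < page_size:
            break
        offset += page_size


# In-memory stand-in for a remote API, so the sketch runs without a network.
# The records and URLs are made up for illustration.
FAKE_RECORDS: List[Record] = [
    {"id": str(i), "url": f"https://example.org/image/{i}"} for i in range(250)
]


def fake_fetch_page(offset: int, limit: int) -> List[Record]:
    """Return one slice of the fake dataset, as a paged API might."""
    return FAKE_RECORDS[offset:offset + limit]


if __name__ == "__main__":
    all_records = list(paginate(fake_fetch_page, page_size=100))
    print(len(all_records))
```

In a real client, `fake_fetch_page` would be replaced by a function that calls the service and returns one page of results; the loop itself would stay the same.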

Asparagirl commented 7 years ago

Thank you for your quick, helpful, and specific comments. Yay, responsive government! 🎉 🗽 🇺🇸

I have a follow-up question along these lines, although to be honest it's not exactly a comment on the Open Gov plan.

For the 1940 U.S. Federal Census, which was released to the public in April 2012, there were at least three digitization partners working with the original microfilms from NARA, and each creating their own metadata/index to the data. I assume that embargo period was for a standard five years, and therefore it should be available to the public pretty soon, in April 2017...?

How soon after the end of the embargo do you think those images will be added back to the NARA Catalog? The turnaround time for the other digitization projects to become available to the public again has been, well, pretty slow, which is even more concerning in this case, given that the amount of data contained in this particular project is so large.

But once everything's finally available in the Catalog, will the public be able to use the forthcoming 2017 API improvements to, let's say, programmatically grab all the digitized 1940 Census images and their associated metadata for a given state or city? Is the public allowed to hotlink to the images in the Catalog from our own websites? What about the metadata that the partners worked on creating, how will the three sets be merged (or not?) and released back to the public? Will this metadata only be integrated tightly with the Catalog or will we be able to grab the file as a single CSV download?

What happens to the 1940 Census mini-site, which is located at a NARA subdomain, http://1940census.archives.gov/ , but apparently run by a for-profit website? Is that mini-site considered another asset created by the digitization project? Based on the URL patterns used on that website, it seems like it is running off its own API; if NARA takes it over, will the public get access to, and documentation for, that API?

Please forgive the highly specific questions, but the 1940 Census is perhaps the highest profile partner digitization project so far, and it's also an enormously large and important data set. How well NARA handles its transition back to public access will be...well, noticed.

It might even be a good choice for you to write a blog post about it, hint hint. 😉

Thank you for your time.

clingerman commented 7 years ago

@Asparagirl Thanks for the additional questions and comments. We are hoping to get the 1940 Census records in the National Archives Catalog in the next year.

Thanks also for your suggestions for a bulk download of the data, which is something we will explore.

Good idea about the blog post too! We will work on providing answers to your questions and provide additional information in a blog post in the near future.

clingerman commented 7 years ago

@Asparagirl We have made some updates to our web page, "Microfilm Publications and Original Records Digitized by Our Digitization Partners," to indicate what is now available through the National Archives Catalog.

Any of the records that have a link in the "Partner" column will have the National Archives Catalog link in the "NARA Microform Publication Title" column.

Please note that this is a short term solution as we are exploring a more robust way of conveying what records are available through us for free, and where others are in the process. Let us know if you have any suggestions of how we could further communicate the availability of records!

Asparagirl commented 7 years ago

Thank you for the update!

Of all the suggestions I had in my original comment, I think the two that need to be prioritized are the "what's in progress right now" and "what's coming soon" lists on the website, just so everybody kind of has an idea what's being done and what the roadmap is. Neither of those exists yet. Obviously, adding exact dates would be great, but even something as simple as "Scheduled for digitization in second half of 2017: records X and Y with partner Z" would really help, as a stopgap measure.