Tool needs update for v4

Here is some similar work completed by Will Jobs in v4: https://github.com/willjobs/regulations-public-comments

With Will's permission, I'm including a bit of our email exchange from 5/25/21. Will wrote: "As I’m sure you know, the annoying thing with the way comments are set up is that they’re associated with a document, and a docket can contain many documents. You may have also gathered from my blog posts that, the way the new API is set up, you can’t query for all comments on a given docket, and if you want all comments on a given document, you have to know the document’s “objectId”, which is different from the public-facing documentId. So the order of operations is: use docket to look up associated documents, then use each document’s objectId to get its comments. There’s an extra step after that, too, because when you query for the comments on a document, you get some metadata (I call it “header” information). To get the actual text of the comment (and more detailed info), you have to then access each comment individually, one at a time.

In addition, there’s an annoying pagination “feature” that the API uses, which gives you up to 250 items per page (request), and up to 20 pages per query. If your query returns more than 250x20 = 5000 items, you have to manually deal with it by first sorting your queries by lastModifiedDate, then after 20 pages, filtering the next query by lastModifiedDate >= max(lastModifiedDate) from the previous query."
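
For anyone picking this up, here is a rough sketch of the order of operations Will describes (docket, then documents, then comment headers, then each comment's detail). It is not taken from Will's code. The endpoint paths, the `filter[docketId]` and `filter[commentOnId]` parameters, the `X-Api-Key` header, and the JSON shape (`data`, `attributes`, `objectId`) reflect my reading of the v4 API and should be checked against the official docs. The names `make_get_json` and `comments_for_docket` are hypothetical, and pagination is deliberately left out here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator

# Assumed base URL for the v4 API; verify against the official documentation.
BASE_URL = "https://api.regulations.gov/v4"

# A function that takes an endpoint path and query parameters, returns parsed JSON.
GetJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def make_get_json(api_key: str, base_url: str = BASE_URL) -> GetJson:
    """Build a simple HTTP fetcher (hypothetical helper, no retries or rate limiting)."""
    def get_json(path: str, params: Dict[str, Any]) -> Dict[str, Any]:
        query = urllib.parse.urlencode(params)
        url = base_url + path + ("?" + query if query else "")
        request = urllib.request.Request(url, headers={"X-Api-Key": api_key})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return get_json


def comments_for_docket(docket_id: str, get_json: GetJson) -> Iterator[Dict[str, Any]]:
    """Yield the detailed record of every comment on every document in a docket.

    Only the first page of each listing is read; see the pagination caveat
    in Will's email for dockets or documents with many items.
    """
    # Step 1: the docket gives us its documents.
    documents = get_json("/documents", {"filter[docketId]": docket_id})
    for document in documents["data"]:
        # Step 2: comments are looked up by the document's objectId,
        # not by its public-facing documentId.
        object_id = document["attributes"]["objectId"]
        headers = get_json("/comments", {"filter[commentOnId]": object_id})
        for header in headers["data"]:
            # Step 3: the listing only carries "header" metadata, so each
            # comment is fetched individually to get its full text.
            detail = get_json("/comments/" + header["id"], {})
            yield detail["data"]
```

Usage would look something like `for comment in comments_for_docket(docket_id, make_get_json(my_key)): ...`, where `docket_id` and `my_key` are your own values. Passing the fetcher in as an argument also makes it easy to swap in a fake for testing without hitting the API.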

See Will's readme text for more information.
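
The pagination workaround Will describes (sort by lastModifiedDate, read up to 20 pages of 250, then restart the query filtered to lastModifiedDate >= the largest value seen) could be sketched roughly as below. This is an illustration under assumptions, not Will's implementation. `fetch_all` and `fetch_page` are hypothetical names. `fetch_page(since, page_number, page_size)` is assumed to return items sorted ascending by `lastModifiedDate`, each with an `id` and a `lastModifiedDate` string that compares correctly as text. The real API may expect the filter value in a different date format or time zone than the one it returns, so check the official docs before wiring this up.

```python
from typing import Any, Callable, Dict, List, Optional

Item = Dict[str, Any]
# fetch_page(since, page_number, page_size) -> items sorted by lastModifiedDate.
# `since` is an inclusive lower bound on lastModifiedDate, or None for no bound.
FetchPage = Callable[[Optional[str], int, int], List[Item]]


def fetch_all(fetch_page: FetchPage, page_size: int = 250, max_pages: int = 20) -> List[Item]:
    """Collect every item despite the cap of max_pages pages per query."""
    seen_ids = set()
    results: List[Item] = []
    since: Optional[str] = None
    while True:
        new_in_window = 0
        window_max = since
        for page_number in range(1, max_pages + 1):
            items = fetch_page(since, page_number, page_size)
            for item in items:
                stamp = item["lastModifiedDate"]
                if window_max is None or stamp > window_max:
                    window_max = stamp
                # The >= filter re-returns items that share the boundary
                # timestamp, so duplicates are skipped by id.
                if item["id"] not in seen_ids:
                    seen_ids.add(item["id"])
                    results.append(item)
                    new_in_window += 1
            if len(items) < page_size:
                # A short (or empty) page means the query is exhausted.
                return results
        if new_in_window == 0:
            # Every page was full but nothing was new: more than
            # page_size * max_pages items share one timestamp. Stop rather
            # than loop forever; this case needs separate handling.
            return results
        # All pages were full, so restart from the largest timestamp seen.
        since = window_max
```

The filter has to be inclusive (>=) rather than strict, otherwise items that share the boundary timestamp but fell outside the previous window would be lost. That is why the sketch tracks ids to drop the overlap.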
