At present, our 20 GB of law enforcement manuals are stored on the pre-assembly server, to be accessioned later when we have time. Even before the manuals are accessioned, we'd like to do some analysis and exploration on the data, and we'd prefer not to do that work on the pre-assembly server. There are many reasons for this, including limiting unnecessary access to that server and continuing the pattern of spinning up VMs for specific uses instead of letting some VMs become catch-all places for activity that lacks a clearly defined home. We may also want to use this VM for more web scraping (which is how we collected the law enforcement manuals we have so far).
Since we've been using Python for the scraping, and since we intend to use it for our indexing experimentation (see https://github.com/sul-dlss-labs/ksr/issues/20), we'd like to have Python 3 installed. We'll likely try using Intake for pushing data into Solr at some point, but I believe that can be treated like any other Python dependency?
I could've sworn there was already a ticket for this, but I couldn't find it. Happy to close one as a duplicate if a similar ticket already exists.