sul-dlss-labs / ksr

SRT Website Test
MIT License

work with ops to provision a VM for data processing #26

Closed jmartin-sul closed 3 years ago

jmartin-sul commented 3 years ago

At present, our 20 GB of law enforcement manuals are stored on the pre-assembly server, awaiting accessioning when we have time. Even before the manuals are accessioned, we'd like to do some analysis and exploration of the data, and we'd prefer not to do that work on the pre-assembly server. There are many reasons for this, including limiting unnecessary access to that server and continuing the pattern of spinning up VMs for specific uses rather than letting some VMs become catch-all places for activity without a clearly defined home. We may also want to use this VM for more web scraping (which is how we collected the law enforcement manuals we have so far).

Since we've been using Python for the scraping, and since we intend to use it for our indexing experimentation (see https://github.com/sul-dlss-labs/ksr/issues/20), we'd like to have Python 3 installed. We'll likely try using Intake for pushing data into Solr at some point, but I believe that can be treated like any other Python dependency?

jmartin-sul commented 3 years ago

I could've sworn there was already a ticket for this, but I couldn't find it. Happy to close one as a duplicate if there is another similar ticket already.

jmartin-sul commented 3 years ago

https://github.com/sul-dlss/operations-tasks/issues/2790

jmartin-sul commented 3 years ago

VM provisioned; connection details provided via Slack.