qjhart closed this issue 5 years ago.
@jrmerz I'm fully committed to getting this operational, so let me know if you want any modifications to the current setup. The current setup draws the library data from CSV files, so there shouldn't be any db issues. I could fairly easily rework the script into a per-file setup, but I don't think that's a slowdown.
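Since the data comes from CSV files, a per-file setup mostly means giving each file its own entry point and keeping the "run everything" loop as a thin wrapper. A minimal sketch in Python (the project's own scripts are R), assuming the inputs are CSV files in one folder; `parse_rows`, `process_file` and `process_all` are hypothetical names, not functions from the repo:

```python
import csv
from pathlib import Path
from typing import Iterable


def parse_rows(lines: Iterable[str], source: str) -> list[dict]:
    """Parse one file's CSV lines into row dicts tagged with their source.

    Hypothetical stand-in for the real per-file parse step.
    """
    rows = [dict(row) for row in csv.DictReader(lines)]
    for row in rows:
        row["source_file"] = source
    return rows


def process_file(path: Path) -> list[dict]:
    """Per-file entry point: depends only on its own input file."""
    with open(path, newline="") as fh:
        return parse_rows(fh, path.name)


def process_all(folder: Path) -> list[dict]:
    """Batch driver (the all-at-once setup), built by looping over process_file."""
    results: list[dict] = []
    for path in sorted(folder.glob("*.csv")):
        results.extend(process_file(path))
    return results
```

With this split, a cloud job can call `process_file` on a single path, while `process_all` keeps the current all-at-once behaviour; the two differ only in the driver loop.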
@jcarlen, can you look this branch over and let me know if there is a better way to do this? I'm trying to produce a more complete per-page CSV output method. This looks like the next step after what Justin has coded in the cloud so far. I'm looking into the final script now.
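One option for per-page CSV output is to derive each page's CSV name from its input file, so independent jobs never write to the same file. A minimal sketch, assuming one input file per page and a hypothetical column set (`FIELDS`); `page_output_path` and `write_page_csv` are illustrative names only:

```python
import csv
from pathlib import Path

# Hypothetical column set; the real parse_items output columns may differ.
FIELDS = ("page", "item", "text")


def page_output_path(page_file: Path, out_dir: Path) -> Path:
    """Derive a per-page CSV name from the input page file, e.g. p0012.png -> p0012.csv."""
    return out_dir / (page_file.stem + ".csv")


def write_page_csv(rows: list[dict], out_path: Path, fields=FIELDS) -> Path:
    """Write one page's parsed items to its own CSV, header included."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Because the output name is deterministic, rerunning a single page just overwrites that page's CSV and leaves the others alone.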
Modified original issue to eliminate absolute path.
The current setup has `run_parse_items.R` running all of the `parse_items` steps and consolidating the results. I think it's probably better to run this as a per-page process so it runs well in the cloud. I've checked in a new branch, `parse_item_cloud`, that adds a library to the Dockerfile. To do the least harm, I run this script like:
where `/io/sloan-ocr` matches the cloud directory. This puts the somewhat strangely named `parse_folder_sample.RDS` file in the directory. There may be better ways to run a single file, but the `parseFolder` function behaves differently.
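If `run_parse_items.R` is split per page, its consolidation step becomes a separate pass over the per-page outputs. A minimal sketch, assuming the per-page CSVs sit directly under one root directory (such as `/io/sloan-ocr`) and may not all share the same columns; `merge_rows` and `consolidate` are hypothetical names:

```python
import csv
from pathlib import Path


def merge_rows(tables):
    """Concatenate per-page row lists, keeping the union of columns in first-seen order."""
    fields: list[str] = []
    merged: list[dict] = []
    for rows in tables:
        for row in rows:
            for key in row:
                if key not in fields:
                    fields.append(key)
            merged.append(row)
    return fields, merged


def consolidate(root: Path, out_path: Path, pattern: str = "*.csv") -> int:
    """Read every per-page CSV under root and write one combined CSV."""
    tables = []
    for path in sorted(root.glob(pattern)):
        if path.resolve() == out_path.resolve():
            continue  # skip the combined output if it lives in the same folder
        with open(path, newline="") as fh:
            tables.append(list(csv.DictReader(fh)))
    fields, merged = merge_rows(tables)
    with open(out_path, "w", newline="") as fh:
        # restval="" fills columns that some pages did not produce
        writer = csv.DictWriter(fh, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

Keeping the merge as its own step means a failed page can be rerun without redoing the others before consolidating again.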