mnylc / islandora_multi_importer

A flexible, Twig-based importer that turns tabular data into Islandora objects of any content model (cmodel), with optional ZeroMQ processing.
GNU General Public License v3.0
16 stars, 15 forks

Can we write a new copy of the CSV file (or, better still, write back into a Google Sheet) with PIDs added? #19

Open McFateM opened 7 years ago

McFateM commented 7 years ago

This would help us modify and repeat an errant ingest of a row and preserve the import history. In the ICG's CSV importer we prepend a special column to the CSV that holds a timestamp indicating when the file was last ingested, along with the PIDs of the objects created during that ingest. All of this information is prefixed with a # so that it appears as a comment if/when the CSV is subsequently re-imported.
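A minimal sketch of that history column, in Python for illustration only (IMI itself is PHP). It assumes, as one reading of the description above, that a re-import skips any row whose first cell begins with `#`, so deleting that cell re-queues an errant row. The header name `import_history`, the timestamp format and the space-separated PID list are all hypothetical, not the ICG importer's actual format.

```python
import csv
import io
from datetime import datetime

# Hypothetical header for the history column. It deliberately does NOT
# start with "#", so the header row itself is never skipped as a comment.
HISTORY_HEADER = "import_history"


def annotate(header, rows, pids_by_row, ingested_at):
    """Prepend a history cell to every row.

    pids_by_row maps a 0-based data-row index to the PIDs created for it.
    Ingested rows get "# <timestamp> <pid> <pid>..."; others get "".
    """
    stamp = ingested_at.strftime("%Y-%m-%dT%H:%M:%S")
    out = [[HISTORY_HEADER] + list(header)]
    for i, row in enumerate(rows):
        pids = pids_by_row.get(i, [])
        cell = "# " + " ".join([stamp] + pids) if pids else ""
        out.append([cell] + list(row))
    return out


def to_csv_text(table):
    """Serialize the annotated table as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()


def pending_rows(csv_text):
    """Re-import side: skip commented rows and drop the history column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    has_history = bool(header) and header[0] == HISTORY_HEADER
    if has_history:
        header = header[1:]
    pending = []
    for row in reader:
        if row and row[0].startswith("#"):
            continue  # already ingested; delete the "#" cell to retry the row
        pending.append(row[1:] if has_history else row)
    return header, pending
```

Keeping the history in its own leading column means the original data columns are untouched, so the annotated file can be fed straight back to the importer.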

DiegoPino commented 7 years ago

We would probably need a config form at admin/islandora/tools to store the credentials. Or do you want to make OAuth part of this?

McFateM commented 7 years ago

Good question. I was working with XLSX sheets in the ICG CSV importer and never got as far as implementing Google Sheets support.

Could we perhaps aim for an easier target? Could IMI accept a Google Sheet (or other spreadsheet) and translate it internally into CSV for ingest, then write the PIDs and history back to that CSV 'copy' and save the resulting file, instead of attempting to write into the Google Doc? This is basically what the ICG CSV Importer does: the CSV file it writes is given a name that includes a timestamp to make it unique, and that same file is in a form that can itself be re-imported.
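A minimal Python sketch (illustrative only, not IMI code) of saving such a timestamped CSV copy. It assumes the spreadsheet has already been read into a list of rows by some other reader, and the `<stem>_<YYYYmmddTHHMMSS>.csv` naming pattern and function names are hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path


def timestamped_copy_path(source, when, out_dir=None):
    """Build a unique name such as 'objects_20170314T101500.csv'.

    The naming pattern is an assumption, not the ICG importer's scheme.
    """
    src = Path(source)
    folder = Path(out_dir) if out_dir else src.parent
    return folder / f"{src.stem}_{when.strftime('%Y%m%dT%H%M%S')}.csv"


def write_csv_copy(rows, source, when, out_dir=None):
    """Write rows (already read from any spreadsheet) as a new CSV copy.

    Mode "x" refuses to overwrite an existing file, so every run keeps
    its own copy and earlier import history is never clobbered.
    """
    path = timestamped_copy_path(source, when, out_dir)
    with path.open("x", newline="", encoding="utf-8") as fh:
        csv.writer(fh).writerows(rows)
    return path
```

Writing to a fresh file rather than editing the source leaves the original spreadsheet as the single controlled copy, while each ingest leaves a dated, re-importable record next to it.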

McFateM commented 7 years ago

Tweaking this issue just a bit...

I really like IMI, but the weak link in my IMI workflow appears to be the CSV file itself. At GC we like to work in a Google Sheets environment so that we can easily collaborate on our data, and Google Sheets makes it easy to maintain a single, controlled copy of that data. But exporting it to CSV for import frequently breaks that control.

So I've been wishing that IMI could read (and maybe write) directly from (to) a Google Sheet. It's not rocket science, but it does introduce some sticky requirements, like being at the mercy of Google and any future API changes it might make. Still, I think it is a feature worth addressing. I've briefly researched the Google Sheets API and found lots of solutions, but none better (so far) than https://www.twilio.com/blog/2017/03/google-spreadsheets-and-php.html. In my opinion it is concise, well-written and recent!
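A rough Python sketch of both directions, for illustration only. Reading assumes the sheet is shared as "anyone with the link", so the commonly used `docs.google.com/.../export?format=csv&gid=...` URL works without credentials; that URL pattern is an assumption, and private sheets would need the Sheets API v4 `spreadsheets.values.get` call plus OAuth. Writing only builds the A1 range and `ValueRange` body that a Sheets API v4 `values.update` call would take; the function names and the one-PID-per-row column layout are hypothetical.

```python
import csv
import io
import urllib.request


def export_csv_url(spreadsheet_id, gid=0):
    """CSV export URL for one tab (assumes link-sharing is enabled)."""
    return (f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}"
            f"/export?format=csv&gid={gid}")


def parse_sheet_csv(text):
    """Turn exported CSV text into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_sheet_rows(spreadsheet_id, gid=0, timeout=30):
    """Download one tab and parse it (needs network access)."""
    url = export_csv_url(spreadsheet_id, gid)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_sheet_csv(resp.read().decode("utf-8"))


def column_letter(index):
    """A1-notation column name: 1 -> 'A', 26 -> 'Z', 27 -> 'AA'."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def pid_update_request(tab, column, pids, first_row=2):
    """Range and body for writing one PID per data row back to a sheet.

    Intended for the Sheets API v4 values.update call, e.g.
    service.spreadsheets().values().update(spreadsheetId=..., range=rng,
        valueInputOption="RAW", body=body).execute()
    """
    col = column_letter(column)
    last = first_row + len(pids) - 1
    rng = f"'{tab}'!{col}{first_row}:{col}{last}"
    body = {"values": [[pid] for pid in pids]}
    return rng, body
```

Separating URL/range construction from the network calls keeps the Google-specific parts small, which limits the damage if Google changes its API later.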

Anybody else interested in such an enhancement? Does anyone know of a PHP library that already gets me a little farther down this road? I might just begin chipping away at the code tomorrow, and if I do I'll try to keep this thread up to date on my progress.

DiegoPino commented 7 years ago

@McFateM, I don't want to step on your toes here, but I already have this request solved and running in a local test. I'll merge it once it's fully tested, probably this weekend, since my hands are full with other development work. Also, the published version of Multi Importer already runs directly on any spreadsheet format that Excel can open, not only CSV and TSV.
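For the plain-text formats, one way to accept CSV and TSV without asking the user which is which is to sniff the delimiter. This is a Python sketch of that single step using the standard-library `csv.Sniffer`, not a description of how IMI (a PHP module) actually reads files; the extension fallback table is an assumption, and binary workbooks such as `.xlsx` would need a spreadsheet library and are out of scope here.

```python
import csv
import io
from pathlib import Path

# Hypothetical fallbacks, used only when sniffing cannot decide.
EXT_DELIMITERS = {".csv": ",", ".tsv": "\t", ".tab": "\t"}


def read_delimited(text, filename="", candidates=",\t;"):
    """Parse CSV/TSV text, guessing the delimiter from a sample."""
    sample = text[:4096]
    try:
        delimiter = csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # e.g. a single-column file with no delimiter to detect
        delimiter = EXT_DELIMITERS.get(Path(filename).suffix.lower(), ",")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```

Restricting `delimiters` to a few candidates keeps the sniffer from picking a character such as `:` that merely happens to appear in PIDs or cmodel names.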

McFateM commented 7 years ago

Absolutely not a problem, Diego! You may well have saved me a lot of headaches. I'd love to help if I can. Is there an easy way for you to share that local version (I don't want to make extra work for you) so that I can help with testing?