kthrog closed this issue 5 years ago
Added more to the tutorial!
https://docs.google.com/document/d/1CiI1_50Cm9P_KGIVMzE5BaM-dARbTdmrLQXcomGDpaE/edit
I think this is pretty much done. @kthrog, does this fit into the documentation somewhere, maybe the deposit policy? I am not super happy about the Python package I put in the tutorial because I personally could not get it to work, but it may well work for someone else.
Awesome -- this looks great to me, and yes, I think it will fit nicely at the end of the deposit policy. I'll write something up real quick that adds it in right after the part stating that we want machine-readable files.
Done!
Now at the end of the full written report: https://docs.google.com/document/d/1vr4HDLDjyFLifBaRI0EmhK3-NsJXGB87UxQ1DTYAxuo/edit#
R can also be used for more robust table / text extraction. See the blog post here: https://ropensci.org/technotes/2018/12/14/pdftools-20/ and a gist with some sample R code here: https://gist.github.com/nniiicc/28488e7193277c7f0bc8feb07091a089
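
For anyone wondering what position-based table extraction looks like in practice, here is a rough sketch in plain Python. It assumes you already have a list of word boxes with `x`, `y`, and `text` fields, which is roughly the shape of the per-word output described in the pdftools 2.0 post. The function name `words_to_rows`, the `y_tolerance` parameter, and the sample data are all made up for illustration; this is not taken from the gist or the tutorial.

```python
def words_to_rows(words, y_tolerance=2):
    """Group word boxes into table rows by vertical position.

    `words` is a list of dicts with "x", "y", and "text" keys
    (a hypothetical stand-in for per-word PDF extraction output).
    Words whose y differs from the first word of the current row
    by at most `y_tolerance` are treated as the same row.
    """
    rows = []  # each entry: (y of the row's first word, [word dicts])
    for word in sorted(words, key=lambda w: (w["y"], w["x"])):
        if rows and abs(word["y"] - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append(word)
        else:
            rows.append((word["y"], [word]))
    # Within each row, order cells left to right and keep only the text.
    return [
        [w["text"] for w in sorted(row_words, key=lambda w: w["x"])]
        for _, row_words in rows
    ]


# Made-up sample: a two-column table with slightly uneven baselines.
sample_words = [
    {"x": 150, "y": 101, "text": "Count"},
    {"x": 50, "y": 100, "text": "Year"},
    {"x": 50, "y": 120, "text": "2018"},
    {"x": 150, "y": 120, "text": "42"},
]

table = words_to_rows(sample_words)
for row in table:
    print("\t".join(row))
```

The tolerance is there because words on the same visual line often do not share an exact y coordinate; the right value depends on the PDF and would need tuning per document.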