morskyjezek / digcur

This repository holds materials for use in learning digital curation tools and concepts from archival and library perspectives. Materials for Master's course at UMD iSchool, 2018 & 2019.
MIT License

Add link to tika for metadata details #1

Open morskyjezek opened 2 years ago

morskyjezek commented 2 years ago

It would be good to add a link to the Tika documentation in the section on metadata extraction. Specifically, link to https://tika.apache.org/2.2.1/parser.html#Document_metadata in the 105a notebook.

See [activities/Python 105b - File Metadata Extraction (Tika Python Test).ipynb](activities/Python 105b - File Metadata Extraction (Tika Python Test).ipynb)
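For context, a minimal sketch of the kind of call the linked Tika documentation would support, assuming the `tika` package (tika-python) is installed and a Java runtime is available. `parser.from_file` is the tika-python call; the `extract_metadata` helper and its `parse` argument are hypothetical names used only for illustration, and this is not necessarily how the notebook does it.

```python
def extract_metadata(path, parse=None):
    """Return the metadata dict that Tika reports for one file.

    `parse` is a hypothetical hook so the helper can be exercised without
    a running Tika server; by default it uses tika-python's parser.from_file.
    """
    if parse is None:
        # Imported lazily: tika-python needs a Java runtime and, by default,
        # downloads and starts a local Tika server on first use.
        from tika import parser
        parse = parser.from_file
    parsed = parse(path)
    # from_file returns a dict with "metadata" and "content" keys;
    # fall back to an empty dict if no metadata was returned.
    return parsed.get("metadata") or {}

# Example use (the file name is a placeholder):
# for key, value in sorted(extract_metadata("sample.pdf").items()):
#     print(key, "=", value)
```

The keys that come back (for example `Content-Type`) depend on the file type and the parser Tika selects, which is what the linked documentation page describes.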

morskyjezek commented 2 years ago

Also consider adding this resource: https://www.geeksforgeeks.org/parsing-pdfs-in-python-with-tika/

morskyjezek commented 2 years ago

Also of potential interest: for gathering system information and metadata from Windows files, see this thread: https://stackoverflow.com/questions/12521525/reading-metadata-with-python
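As a rough starting point for the system-level side, here is a sketch that uses only the standard library's `os.stat`. It is not taken from the linked thread, and the `filesystem_metadata` helper name is hypothetical. It covers basic file system attributes only, not embedded document properties.

```python
import os
from datetime import datetime, timezone

def filesystem_metadata(path):
    """Collect basic file system metadata for one file (hypothetical helper)."""
    st = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        # Last-modified time, converted to an ISO 8601 string in UTC.
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }

# Example use (the file name is a placeholder):
# print(filesystem_metadata("sample.docx"))
```

Extended Windows properties such as author or title are stored elsewhere and would need a different approach than `os.stat`.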