Open morskyjezek opened 2 years ago
Also consider adding this resource: https://www.geeksforgeeks.org/parsing-pdfs-in-python-with-tika/
Also of potential interest: for gathering system information and metadata from Windows files, see this Stack Overflow thread: https://stackoverflow.com/questions/12521525/reading-metadata-with-python
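For context, a minimal sketch of the kind of file-system metadata Python's standard library exposes without any extra packages. It is not taken from the linked thread, and the function name `file_system_metadata` is hypothetical; Windows-specific properties such as author or title fields are not covered here.

```python
import datetime as dt
from pathlib import Path


def file_system_metadata(path):
    """Return basic file-system metadata for `path` as a plain dict.

    Hypothetical helper for illustration; uses only the standard library.
    """
    p = Path(path)
    st = p.stat()

    def to_iso(timestamp):
        # Convert a POSIX timestamp to an ISO 8601 string in UTC.
        return dt.datetime.fromtimestamp(timestamp, tz=dt.timezone.utc).isoformat()

    return {
        "name": p.name,
        "size_bytes": st.st_size,
        "modified": to_iso(st.st_mtime),
        # On Unix, st_ctime is the last metadata change; on Windows it has
        # traditionally reported the creation time.
        "ctime": to_iso(st.st_ctime),
    }
```

Calling `file_system_metadata("some_file.docx")` (a placeholder file name) would return a dict that could be shown alongside the Tika output in the notebook.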
It would be good to add a link to the Tika documentation in the section on metadata extraction. Specifically, link to https://tika.apache.org/2.2.1/parser.html#Document_metadata in the 105a notebook.
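A rough sketch of what the metadata extraction step could look like with the `tika` Python package, in case it helps frame where the documentation link belongs. It assumes `tika` is installed and a Java runtime is available; the helper names `extract_metadata` and `select_fields` are hypothetical, and the default keys are only examples, since the metadata keys Tika returns vary by file type and Tika version.

```python
def extract_metadata(path):
    """Parse `path` with tika-python and return its metadata dict."""
    # Imported lazily: tika-python needs Java and starts (or connects to)
    # a Tika server the first time a file is parsed.
    from tika import parser

    parsed = parser.from_file(str(path))
    # The parsed result is a dict; "metadata" may be missing or None on failure.
    return parsed.get("metadata") or {}


def select_fields(metadata, keys=("Content-Type", "dc:title", "dc:creator")):
    """Keep only the requested keys that are present in `metadata`.

    The default keys are illustrative assumptions, not a fixed Tika schema.
    """
    return {key: metadata[key] for key in keys if key in metadata}
```

For example, `select_fields(extract_metadata("report.pdf"))` (with `report.pdf` as a placeholder file) would return just the fields of interest, which readers could then compare against the document metadata section of the Tika docs.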
See [activities/Python 105b - File Metadata Extraction (Tika Python Test).ipynb](activities/Python 105b - File Metadata Extraction (Tika Python Test).ipynb)