ubc-systopia / Indaleko

Indaleko Project
GNU Affero General Public License v3.0

Adjusted Files and added Unstructured #54

Closed ZeeOneOtter closed 3 months ago

ZeeOneOtter commented 3 months ago

I adjusted the MacHardwareInfoGenerator script, as it would not work for me otherwise.

I also adjusted the .gitignore file to set the stage for the iCloud scripts. Those scripts, creds.py in particular, still need their names and flow adjusted a bit.

The other files are the dockerfile needed to boot up Unstructured, IndalekoUnstructuredMetadata, and docker_setup.py. IndalekoUnstructuredMetadata contains the functions that run Unstructured. docker_setup.py is the script that runs IndalekoUnstructuredMetadata; it executes the various docker commands needed in conjunction with it.

The three files work in concert. One only needs to run the docker_setup.py script for the program to execute; the other two files, the dockerfile and IndalekoUnstructuredMetadata, are there for docker_setup.py to use.

This is currently a proof of concept. IndalekoUnstructuredMetadata, and perhaps a bit of docker_setup.py, will need further work so that the directories needed in the full version are mounted properly. At the moment, the program uses the Unstructured Python module in a docker container and produces three outputs: log files from docker itself (if the docker images, containers, and volumes are all functioning); a log file for the execution of the IndalekoUnstructuredMetadata script, which shows when and where it is in extracting metadata and recording it into a JSON file; and finally the output file 'extracted_data4.json', which contains the metadata Unstructured has pulled from the files it could index.
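To make the mounting question concrete, here is a rough sketch of how a setup script could assemble the docker build and run commands with explicit volume mounts. It is not the code in docker_setup.py: the function names, the image tag, and the container paths `/app/input` and `/app/output` are all assumptions made for illustration. Only the standard `docker build -t` and `docker run --rm -v` options are used.

```python
import subprocess


def build_commands(image_tag, context_dir, host_input, host_output):
    """Return the docker commands as argument lists (hypothetical layout).

    /app/input and /app/output are assumed container paths, not the
    ones the real dockerfile uses.
    """
    build = ["docker", "build", "-t", image_tag, context_dir]
    run = [
        "docker", "run", "--rm",
        # Mount the files to be indexed read-only.
        "-v", f"{host_input}:/app/input:ro",
        # Mount a writable directory for logs and the JSON output.
        "-v", f"{host_output}:/app/output",
        image_tag,
    ]
    return [build, run]


def run_commands(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if docker fails.
            subprocess.run(cmd, check=True)
```

Keeping the commands as plain lists makes it easy to inspect the mounts with `run_commands(cmds)` before actually launching docker with `dry_run=False`.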

Part of what needs to be reworked is the naming, so that the output files follow the convention the rest of the Indaleko framework uses for the index files created by the Indexer scripts. Also, I lost the script that pulled the UUIDs from the original index file created by the IndalekoMacIndexer script, so currently the UUIDs are the ones that Unstructured creates for files. That needs to be corrected so that they are the same UUIDs used in the other indexed files.
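One possible way to restore the original UUIDs is to match records by file path. The sketch below is not the lost script, and the record layout is an assumption: it pretends both the index file and the extracted output can be reduced to dictionaries with hypothetical `path` and `uuid` keys, which would need to be mapped onto the real field names.

```python
import os


def build_uuid_lookup(index_records):
    """Map each normalized file path to the UUID from the original index.

    The "path" and "uuid" keys are hypothetical field names.
    """
    return {os.path.normpath(r["path"]): r["uuid"] for r in index_records}


def apply_index_uuids(extracted_records, lookup):
    """Replace the UUIDs in extracted records with those from the index.

    Returns (matched, unmatched); unmatched records are left unchanged
    so they can be reviewed rather than silently dropped.
    """
    matched, unmatched = [], []
    for record in extracted_records:
        key = os.path.normpath(record["path"])
        if key in lookup:
            # Copy the record and overwrite only its UUID.
            matched.append({**record, "uuid": lookup[key]})
        else:
            unmatched.append(record)
    return matched, unmatched
```

Matching on normalized paths is only one option; if paths inside the container differ from those on the host, the keys would first need to be translated back to host paths.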