ZeeOneOtter closed this 1 month ago.
IndalekoICloudIndexer.py - This script was modelled on the DropboxIndexer script and accepts the same command-line arguments. The 'collect_metadata' function still needs work: in the DEBUG log file I can see fields such as 'parentId', but when I try to read them with getattr(), it returns the default value of 'Unknown'. For other fields, like 'type', it correctly returns 'folder', 'file', or 'app_library' (for an application). The debug output in the log files shows a large amount of information that could be accessed, yet Python can retrieve some of it and not the rest.
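One likely cause, assuming the indexer works with pyicloud `DriveNode` objects (the PR doesn't name the library): pyicloud exposes a handful of fields, such as `name` and `type`, as Python properties, but keeps the raw API response, where keys like `parentId` live, in a `data` dictionary. `getattr()` only sees attributes, so it returns the default for anything that exists only as a dictionary key. The sketch below is a hypothetical `get_field` helper that tries the attribute first and then falls back to `data`; `FakeNode` is a stand-in so the example runs without an iCloud login.

```python
from __future__ import annotations

from typing import Any


class FakeNode:
    """Hypothetical stand-in for a pyicloud DriveNode (assumption: raw fields live in .data)."""

    def __init__(self, data: dict[str, Any]) -> None:
        self.data = data

    @property
    def type(self) -> str | None:
        # Only selected raw fields are exposed as properties, like 'type' here.
        node_type = self.data.get("type")
        return node_type.lower() if node_type else None


def get_field(node: Any, key: str, default: Any = "Unknown") -> Any:
    """Return node.<key> if it is an attribute, else node.data[key], else default."""
    value = getattr(node, key, None)
    if value is not None:
        return value
    raw = getattr(node, "data", None)
    if isinstance(raw, dict):
        return raw.get(key, default)
    return default


node = FakeNode({"type": "FOLDER", "parentId": "example-parent-id", "name": "Documents"})
print(getattr(node, "parentId", "Unknown"))  # Unknown: parentId is only a dict key
print(get_field(node, "parentId"))           # example-parent-id
print(get_field(node, "type"))               # folder
```

If this matches what the log shows, reading `node.data` directly (or via a helper like this) should recover 'parentId' without changing anything else in `collect_metadata`.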
IndalekoICloudIngestor.py - This script was similarly modelled on the DropBoxIngester script. Major modifications were made to remove the use of UnixFileAttributes and WindowsFileAttributes. Because Python can retrieve only some of the data reliably, I had to customize the attributes to fit that limited scope. The script appears to run without errors; however, when I run it, nothing shows up in ArangoDB. I suspect this is because the indexed data I am feeding into the ingestor is the root directory, so the parentId field produces a WARNING in the log file.
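Before debugging the ArangoDB side, it may help to check what the ingestor is actually receiving. The sketch below reads indexer JSONL lines and separates records that carry a usable parentId from those that don't; if nearly every record lands in the second group, the empty database is more likely an input problem than an ingestion one. It assumes one JSON object per line with a top-level `parentId` key and treats the 'Unknown' default as missing; `split_by_parent` is a hypothetical name.

```python
from __future__ import annotations

import json
from typing import Iterable


def split_by_parent(lines: Iterable[str]) -> tuple[list[dict], list[dict]]:
    """Split JSONL lines into (records_with_parent, records_without_parent).

    Assumption: each record keeps its parent under a top-level "parentId" key;
    a missing, empty, or "Unknown" value counts as no parent.
    """
    with_parent: list[dict] = []
    without_parent: list[dict] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        parent = record.get("parentId")
        if parent and parent != "Unknown":
            with_parent.append(record)
        else:
            without_parent.append(record)
    return with_parent, without_parent


# With a real file: with open(path, encoding="utf-8") as fh: ok, orphans = split_by_parent(fh)
sample = [
    '{"name": "Documents", "type": "folder", "parentId": "Unknown"}',
    '{"name": "notes.txt", "type": "file", "parentId": "example-folder-id"}',
    "",
]
ok, orphans = split_by_parent(sample)
print(f"{len(ok)} with parentId, {len(orphans)} without")
```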
Going forward, I am going to run my indexer script over my entire iCloud Drive again, since I am trying to collect more information. (This time I also updated the script, carried over from the FullMain one, so that it indexes folders as well; before, it skipped any item that was a folder.) Then I will run the ICloudIngestor script to see whether it properly imports into ArangoDB the items that do have parentIds (i.e., items below the root directory).
If this should still be merged, you'll have to figure out the conflicts and make sure it still works, then update the PR accordingly.
Indaleko_iCloudSecureCreds.py - handles login and authentication of the user. It can now handle multiple users, though it will only log in as one of them. The script could run on its own if you call the authenticate function, but there wouldn't be much use in that. It also writes a log file with basic logging of the authentication, although that log lacks some of the details produced by the 'future' script described below.
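One way to support several stored users while logging in as only one is to make that choice explicit and log which account was picked. The sketch below is hypothetical: the `choose_account` helper, the dict layout (Apple ID mapped to stored credential data), and the logger name are assumptions, and the actual script may store and select credentials differently.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("icloud_auth")  # hypothetical logger name


def choose_account(accounts: dict[str, dict], username: str | None = None) -> tuple[str, dict]:
    """Return (username, credentials) for exactly one of several stored accounts.

    Assumption: `accounts` maps an Apple ID to whatever credential data is stored for it.
    """
    if not accounts:
        raise ValueError("no stored accounts")
    if username is None:
        if len(accounts) > 1:
            # Refuse to guess when several users are stored.
            raise ValueError(f"multiple accounts stored; pick one of {sorted(accounts)}")
        username = next(iter(accounts))
    if username not in accounts:
        raise KeyError(f"no stored credentials for {username}")
    logger.info("Authenticating as %s", username)  # log the user, never the secret
    return username, accounts[username]


stored = {"alice@example.com": {"token": "placeholder"}, "bob@example.com": {"token": "placeholder"}}
user, creds = choose_account(stored, "bob@example.com")
```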
Indaleko_iCloudSecure_future.py is a script that only authenticates, but for some reason the log file it produces is much more in-depth, including things like HTTPS calls, tokens, various checks on modules, etc. Not sure why yet. But
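One possible explanation, offered as a guess rather than a diagnosis: HTTP libraries such as urllib3 emit their own DEBUG records on loggers named after their modules. If one script configures the root logger at DEBUG (for example with `logging.basicConfig(level=logging.DEBUG, filename=...)`), those records propagate into its log file; if the other attaches its handler only to its own named logger, they never reach it. The sketch reproduces that difference with an in-memory handler; the logger names `urllib3.connectionpool` and `icloud_auth` are used purely for illustration.

```python
import io
import logging


def capture(attach_to_root: bool) -> str:
    """Show which DEBUG records reach a handler attached to the root vs. a named logger."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))

    app = logging.getLogger("icloud_auth")  # hypothetical script logger
    target = logging.getLogger() if attach_to_root else app
    old_level = target.level
    target.addHandler(handler)
    target.setLevel(logging.DEBUG)
    try:
        # Stand-in for a third-party HTTP library logging on its own module logger.
        logging.getLogger("urllib3.connectionpool").debug("Starting new HTTPS connection")
        app.debug("authenticated")
    finally:
        target.removeHandler(handler)
        target.setLevel(old_level)
    return stream.getvalue()


print(capture(attach_to_root=True))   # both the urllib3 line and the icloud_auth line
print(capture(attach_to_root=False))  # only the icloud_auth line
```

Comparing how the two scripts set up logging (root logger vs. a named logger, and at which level) would confirm or rule this out.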
Indaleko_iCloudTopLevel.py - handles indexing (with limited fields) of the opening directory of iCloud. This is mainly a proof of concept that I keep updating and use as a trial run before working on the FullDirectory indexer script; it was a way to get most functions into working order. It creates its own log file.
Indaleko_iCloudTopMain.py - this is the script to run to index just the opening directory of iCloud. It calls the authenticate function described in iCloudSecureCreds, then calls the index_to_jsonl function to index the opening directory.
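For the top-level pass, the core of writing the opening directory to JSONL amounts to: take each item, keep a small fixed set of fields, and write one JSON object per line. The sketch below is not the script's index_to_jsonl; the name `write_top_level_jsonl`, the assumption that items are already plain dicts, and the field list (name, type, size, parentId) are all illustrative.

```python
from __future__ import annotations

import io
import json
from typing import IO, Any, Iterable, Mapping

# Assumed "limited" field set for the top-level pass (not the script's real list).
TOP_LEVEL_FIELDS = ("name", "type", "size", "parentId")


def write_top_level_jsonl(items: Iterable[Mapping[str, Any]], out: IO[str]) -> int:
    """Write one JSON object per line, keeping only TOP_LEVEL_FIELDS; return the count."""
    count = 0
    for item in items:
        # Missing fields become null so every line has the same keys.
        record = {key: item.get(key) for key in TOP_LEVEL_FIELDS}
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


buf = io.StringIO()
n = write_top_level_jsonl(
    [{"name": "Documents", "type": "folder"}, {"name": "a.txt", "type": "file", "size": 12, "etag": "x"}],
    buf,
)
print(n)
print(buf.getvalue(), end="")
```

Writing null for missing fields, rather than skipping them, keeps every line the same shape, which makes the ingestor's job simpler.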
Indaleko_iCloudFullLevel.py - named simply to keep with the naming convention of TopLevel: FullLevel stands for FullDirectory. This script defines the get_folder_contents function, which descends into the subdirectories and indexes what it can, producing a log file for the indexing as it goes.
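The recursive walk described for get_folder_contents can be sketched as a depth-first traversal that records folders as well as files (now that folders are no longer skipped) and logs, rather than aborts on, any folder it cannot list. `Node`, `list_children`, and `walk` are hypothetical stand-ins; with pyicloud, the child listing would presumably come from the node's `get_children()` instead.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Iterator

logger = logging.getLogger("icloud_full_index")  # hypothetical logger name


@dataclass
class Node:
    """Hypothetical stand-in for an iCloud Drive item."""
    name: str
    type: str  # "folder", "file", or "app_library"
    children: list[Node] = field(default_factory=list)
    readable: bool = True  # simulate a folder whose listing fails


def list_children(node: Node) -> list[Node]:
    if not node.readable:
        raise PermissionError(f"cannot list {node.name}")
    return node.children


def walk(node: Node, path: str = "") -> Iterator[dict]:
    """Depth-first walk yielding a record for every folder and file below `node`."""
    try:
        children = list_children(node)
    except Exception as exc:
        # Index what we can: log the failure and continue with the siblings.
        logger.warning("Skipping contents of %s: %s", path or "/", exc)
        return
    for child in children:
        child_path = f"{path}/{child.name}"
        yield {"path": child_path, "name": child.name, "type": child.type}
        if child.type in ("folder", "app_library"):
            yield from walk(child, child_path)


root = Node("root", "folder", [
    Node("Documents", "folder", [Node("notes.txt", "file")]),
    Node("Locked", "folder", [Node("secret.txt", "file")], readable=False),
    Node("photo.jpg", "file"),
])
for rec in walk(root):
    print(rec["path"], rec["type"])
```

Here the unreadable "Locked" folder is still recorded, only its contents are skipped, and the warning lands in the log instead of stopping a run that may take hours.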
Indaleko_iCloudFullMain.py - This is the script that authenticates and then indexes all of iCloud. It takes a while: for me, about 1.5 to 2 hours. The authentication produces one log and the indexing a separate log, and the output is a JSONL file whose name starts with the same timestamp as the log files.
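Since the authentication log, the indexing log, and the JSONL output all start with the same timestamp, one way to guarantee they match over a run lasting hours is to compute the timestamp once at startup and derive every filename from it. The function name, timestamp format, and suffixes below are assumptions for illustration, not the script's actual naming.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def run_filenames(out_dir: str | Path = ".", now: datetime | None = None) -> dict[str, Path]:
    """Build all output names for one run from a single shared timestamp."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    base = Path(out_dir)
    return {
        "auth_log": base / f"{stamp}_icloud_auth.log",
        "index_log": base / f"{stamp}_icloud_index.log",
        "jsonl": base / f"{stamp}_icloud_index.jsonl",
    }


names = run_filenames(now=datetime(2024, 5, 1, 13, 45, 0))
for key, path in names.items():
    print(key, path)
```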