seanbreckenridge / HPI

Human Programming Interface - a way to unify, access and interact with all of my personal data [my modules]
https://beepb00p.xyz/hpi.html
MIT License

discord: add flag to disable zippath #53

Closed seanbreckenridge closed 5 months ago

seanbreckenridge commented 5 months ago

If I don't do this, I get logs like these, and it seems the file is not being read:

[WARNING 2024-02-08 12:06:05,556 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2020_10.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,572 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2021_03.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,588 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2021_11.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,606 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2022_04.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,624 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2022_11_27.zip/messages/index.json

I think this is probably caused by the inconsistent directory depth across my Discord exports (I unzipped and rezipped some of them while testing things). It means that match_structure is not being run in get_discord_exports, so it doesn't find the correct files.
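To illustrate the directory depth problem, here is a minimal sketch using only the standard library `zipfile` module, not the actual HPI code. The helper names (`make_zip`, `has_index_at_root`, `find_index`) and the `discord_export/` folder name are made up for the example. It shows that a fixed `messages/index.json` lookup misses an archive that was rezipped with an extra top-level folder, while a search at any depth still finds it:

```python
import io
import zipfile
from typing import List


def make_zip(names: List[str]) -> zipfile.ZipFile:
    """Build an in-memory zip containing the given member names (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "{}")
    buf.seek(0)
    return zipfile.ZipFile(buf)


def has_index_at_root(zf: zipfile.ZipFile) -> bool:
    """Look for the index only at the fixed location messages/index.json."""
    return (zipfile.Path(zf) / "messages" / "index.json").exists()


def find_index(zf: zipfile.ZipFile) -> List[str]:
    """Look for messages/index.json at any directory depth inside the archive."""
    target = "messages/index.json"
    return [n for n in zf.namelist() if n == target or n.endswith("/" + target)]


# An export zipped directly, and one rezipped with an extra top-level folder.
flat = make_zip(["messages/index.json"])
nested = make_zip(["discord_export/messages/index.json"])

print(has_index_at_root(flat), has_index_at_root(nested))
print(find_index(nested))
```

The nested archive is the case that would produce the "doesn't exist" warnings above if the path is resolved relative to the zip root.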

Just as an FYI, @karlicoss

I added an experimental ._use_zippath config flag for now, which I have set to False for myself (otherwise it would default to True, since guess_compression=True on get_files), so it does the whole match_structure unzip in my tmpdir. Maybe it would be nice to have a global config option for get_files to set guess_compression across everything?

Perhaps explicitly for ZipPath in particular, since a zip contains a directory structure that might cause issues. Things like .gz/.zstd typically just involve parsing one compressed file.
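As a rough sketch of what such an option could look like (this is not how get_files is implemented; `CompressionOptions`, `handling_for`, the returned labels, and the suffix list are all hypothetical names chosen for illustration), the zip case could be toggled separately from single-file compression:

```python
from dataclasses import dataclass
from pathlib import Path

# Suffixes treated as "one compressed file" in this sketch (an assumption).
SINGLE_FILE_SUFFIXES = {".gz", ".zst", ".zstd", ".xz", ".lz4"}


@dataclass
class CompressionOptions:
    """Hypothetical global options; not part of the real HPI config."""
    guess_compression: bool = True
    use_zippath: bool = True


def handling_for(path: Path, opts: CompressionOptions) -> str:
    """Decide how a file would be handled under the given options."""
    if not opts.guess_compression:
        return "plain"
    suffix = path.suffix.lower()
    if suffix == ".zip":
        # Zips carry a directory structure, so they get their own switch.
        return "zippath" if opts.use_zippath else "plain"
    if suffix in SINGLE_FILE_SUFFIXES:
        return "single-file"
    return "plain"


opts = CompressionOptions(use_zippath=False)
print(handling_for(Path("2020_10.zip"), opts))
print(handling_for(Path("messages.json.gz"), opts))
```

With `use_zippath=False`, a zip is returned as a plain path, so the caller can still unpack it into a temporary directory, while .gz/.zstd files keep being decompressed transparently.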

seanbreckenridge commented 5 months ago

Alternatively, I could add a ZipPath check to match_structure; perhaps that would also work.
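A minimal sketch of that idea, assuming ZipPath behaves like the standard library `zipfile.Path` (the function `match_structure_sketch` is a hypothetical stand-in, not the real match_structure): if the input is a zip-backed path, recover the archive on disk, extract it to a temporary directory, and then search for directories containing the expected files.

```python
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, List, Sequence, Union


@contextmanager
def match_structure_sketch(
    base: Union[Path, zipfile.Path], expected: Sequence[str]
) -> Iterator[List[Path]]:
    """Yield directories that contain every path in `expected` (hypothetical helper)."""
    if isinstance(base, zipfile.Path):
        # The "ZipPath check": fall back to the archive file on disk.
        # Assumes the zip was opened from a filename, not a file-like object.
        archive = Path(base.root.filename)
    else:
        archive = Path(base)

    with tempfile.TemporaryDirectory() as tmp:
        search_root = Path(tmp)
        if archive.is_file() and zipfile.is_zipfile(archive):
            # Unpack so the search below works at any directory depth.
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(search_root)
        else:
            # Already a plain directory: search it in place.
            search_root = archive

        candidates = [search_root] + [
            p for p in sorted(search_root.rglob("*")) if p.is_dir()
        ]
        yield [d for d in candidates if all((d / e).exists() for e in expected)]
```

The matches are only valid inside the `with` block, since the temporary directory is removed on exit.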