If I don't do this, I get logs like these, and it seems that the file is not being read:
[WARNING 2024-02-08 12:06:05,556 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2020_10.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,572 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2021_03.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,588 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2021_11.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,606 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2022_04.zip/messages/index.json
[WARNING 2024-02-08 12:06:05,624 my.discord.data_export data_export.py:135 ] Message index 'index.json' doesn't exist at /home/sean/data/discord/2022_11_27.zip/messages/index.json
I think this is probably due to the inconsistent directory depth in my Discord exports (I had unzipped and rezipped some of them while testing things), but it means that match_structure is not being run in get_discord_exports, so it doesn't find the correct files.
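To illustrate what I mean by directory depth, here is a minimal sketch using only the stdlib zipfile.Path (not the actual HPI code, and I'm assuming ZipPath behaves the same way for this lookup). The archives are built in memory, and the nested folder name 2021_03 is made up for the example:

```python
import io
import zipfile


def make_zip(names):
    """Build an in-memory zip containing small JSON files at the given paths."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "{}")
    buf.seek(0)
    return zipfile.ZipFile(buf)


def has_index(zf):
    """Check for messages/index.json directly under the archive root."""
    return (zipfile.Path(zf) / "messages" / "index.json").exists()


# Export zipped at the expected depth: messages/ sits at the archive root.
flat = make_zip(["messages/index.json"])

# Export rezipped with an extra top-level folder (hypothetical name).
nested = make_zip(["2021_03/messages/index.json"])

print(has_index(flat))
print(has_index(nested))
```

The lookup only succeeds for the first archive. In the second one the file is there, just one level deeper, which is the sort of case I'd expect match_structure to handle once the archive is unpacked.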
Just as an FYI, @karlicoss
I added an experimental ._use_zippath config flag for now, which I have set to False for myself (otherwise it would default to True, since guess_compression=True on get_files), so it does the whole match_structure unzip in my tmpdir. Maybe it would be nice to have a global config option for get_files to set guess_compression across everything?
Perhaps specifically for ZipPath, since a zip wraps a whole directory structure, which is where issues like this tend to come up. Things like .gz/.zstd typically just wrap one compressed file.