karlicoss / promnesia

Another piece of your extended mind
https://beepb00p.xyz/promnesia.html
MIT License
1.73k stars, 74 forks

sources.takeout: add support for new youtube csv format #436

Closed: seanbreckenridge closed this 5 months ago

seanbreckenridge commented 5 months ago

Google Takeout recently changed the format of YouTube comments to CSV files; I added support for that to google_takeout_parser a few weeks ago.

I haven't taken a stab at de-duping comments that exist in both the old HTML format and the new CSV one yet. It's on my todo list, but I thought it would be good to get this in so that people making a new export can at least access their comments. There might be some duplication, but that's better than erroring or the comments not showing up at all.
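For reference, here is a minimal sketch of what that de-duplication might look like, assuming each comment can be keyed on its video URL plus its whitespace-normalized text. The `Comment` dataclass and `dedupe_comments` function are hypothetical and not part of google_takeout_parser or promnesia; the real models have different fields, and whether the HTML and CSV exports produce matching text for the same comment is untested.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional, Set, Tuple


@dataclass(frozen=True)
class Comment:
    # Hypothetical, simplified record; the real google_takeout_parser models differ.
    url: str
    text: str
    dt: Optional[datetime]  # the old HTML export may not carry a precise timestamp
    source: str  # "html" or "csv"


def dedupe_comments(comments: Iterable[Comment]) -> Iterator[Comment]:
    """Yield each comment once, keyed on (url, whitespace-normalized text).

    The first occurrence wins, so pass the source you trust more first.
    """
    seen: Set[Tuple[str, str]] = set()
    for c in comments:
        key = (c.url, " ".join(c.text.split()))  # collapse whitespace differences
        if key in seen:
            continue
        seen.add(key)
        yield c
```

Since the first occurrence wins, calling `dedupe_comments([*csv_comments, *html_comments])` would prefer the CSV copy whenever both formats contain the same comment.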

This is very basic right now and has no error checking, so if the user is on an old version of google_takeout_parser, it will just error. Should I add a warning message in the ImportError handler reminding them to upgrade? Wasn't sure if that was too much.
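One possible shape for that warning, as a sketch: a hypothetical `optional_import` helper that returns the requested class, or `None` plus a warning telling the user to upgrade. The class name `CSVYoutubeComment` and its location in `google_takeout_parser.models` are assumptions here, not checked against a specific release.

```python
import importlib
import warnings
from typing import Any, Optional


def optional_import(module: str, name: str, hint: str) -> Optional[Any]:
    """Return module.name, or None (after emitting a warning) if it can't be imported."""
    try:
        mod = importlib.import_module(module)
        return getattr(mod, name)
    except (ImportError, AttributeError) as e:
        # An old google_takeout_parser may have the module but not the new class;
        # with getattr that surfaces as AttributeError rather than ImportError.
        warnings.warn(f"could not import {module}.{name} ({e}); {hint}")
        return None


# Assumed module path and class name for the new CSV comment model.
CSVYoutubeComment = optional_import(
    "google_takeout_parser.models",
    "CSVYoutubeComment",
    "upgrade google_takeout_parser to parse the new YouTube CSV comments",
)
```

Returning `None` keeps the decision about skipping the CSV data with the caller, while the warning text carries the upgrade hint.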

If there's anything else you think should be changed or added for this, let me know.

seanbreckenridge commented 5 months ago

Hmm, looks like the Hypothesis test data may be gone:

  Error: fatal: repository 'https://github.com/judell/Hypothesis.git/' not found
  Error: fatal: clone of 'https://github.com/judell/Hypothesis.git' into submodule path '/home/runner/work/promnesia/promnesia/tests/testdata/hypexport/src/hypexport/Hypothesis' failed
  Failed to clone 'src/hypexport/Hypothesis' a second time, aborting

karlicoss commented 5 months ago

Yeah, I also just noticed the CI issue; it's fixed here: https://github.com/karlicoss/hypexport/commit/b9f1cab00931c4ca2da7b3ff485f11f2beb49031 (the commit has some explanation of why I used a submodule in the first place). If you rebase, it should hopefully all be good!

karlicoss commented 5 months ago

And thanks for the change! I don't think I've seen this data yet, but I haven't done an export for a few months. Yeah, I think it's worth making these new imports more defensive; otherwise the whole data source will go down. I would probably try to import the new ones separately, and if that fails, warn or emit an exception. You could also assign the 'new' imports to some dummy class, e.g.

class dummy:
    pass

# nothing is ever an instance of dummy, so isinstance checks against it are always False
CSVYoutubeLiveChat = dummy

That way, the rest of the code with isinstance checks won't need changes.
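A runnable version of that suggestion might look like the following sketch. It assumes the new models are importable as `CSVYoutubeComment` and `CSVYoutubeLiveChat` from `google_takeout_parser.models` (unverified), and `describe` is a hypothetical stand-in for promnesia's existing isinstance-based dispatch.

```python
import warnings

try:
    # Assumed import path for the newer CSV models in google_takeout_parser.
    from google_takeout_parser.models import CSVYoutubeComment, CSVYoutubeLiveChat
except ImportError as e:
    warnings.warn(f"old google_takeout_parser, skipping YouTube CSV data: {e}")

    class _Dummy:
        """Placeholder type: no parsed event is ever an instance of it."""

    CSVYoutubeComment = _Dummy
    CSVYoutubeLiveChat = _Dummy


def describe(event: object) -> str:
    # Hypothetical dispatch: these isinstance checks stay unchanged; with the
    # dummy in place the CSV branches simply never match.
    if isinstance(event, CSVYoutubeComment):
        return "youtube comment (csv)"
    if isinstance(event, CSVYoutubeLiveChat):
        return "youtube live chat (csv)"
    return "other"
```

Because nothing is ever an instance of the placeholder, an old google_takeout_parser only loses the CSV-specific branches, and the rest of the takeout source keeps working.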

seanbreckenridge commented 5 months ago

yep, gotcha

I'm a bit busy for the next few days, but I'll get to that when I have some time.

seanbreckenridge commented 5 months ago

Have not tested on the old version yet, but I think something like this should work.

Will test on old and new versions of google_takeout_parser later and let you know.

It does look like it at least works on the new version:

[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: browser +92
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: error -2
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: promnesia_sean.sources.zsh +2
[INFO    2024-03-15 15:03:11 promnesia dump.py:182] database stats changes: takeout +154

karlicoss commented 5 months ago

whoops, forgot to press merge! thanks