purarue / google_takeout_parser

A library/CLI tool to parse data out of your Google Takeout (History, Activity, Youtube, Locations, etc...)
https://pypi.org/project/google-takeout-parser/
MIT License
82 stars 14 forks source link

support Windows separators in path_dispatch #32

Closed karlicoss closed 2 years ago

karlicoss commented 2 years ago

While setting up Windows CI for promnesia, the takeout tests failed and had these in logs:

2022-05-09T21:03:56.5877916Z [INFO    2022-05-09 20:58:14 promnesia extract.py:49] extracting via promnesia.sources.takeout:index ...
[W 220509 20:58:14 path_dispatch:270] No function to handle parsing My Activity\Chrome\MyActivity.html
[W 220509 20:58:14 path_dispatch:270] No function to handle parsing My Activity\Chrome\README

I guess it's because in path_dispatch forward slashes are hardcoded. Perhaps the quickest fix would be to do something like .replace(os.sep, '/') here -- paths in takeout shouldn't have either forward or backwards slashes anyway https://github.com/seanbreckenridge/google_takeout_parser/blob/master/google_takeout_parser/path_dispatch.py#L94

purarue commented 2 years ago

Ah right - must've made a mistake not removing the \ escaping from the path strings after converting them to regex strings.

I think would make sense to leave the maps intact (other than fixing the backslash issues), and then modify the string in the _match_handler function? Think that would be easier to read

Oh, and I think should be the other way round? -- .replace('/', os.sep)

karlicoss commented 2 years ago

Yep, agree, easier to keep maps as simple strings (hence the suggestion to just hack during the regex matching)!

Ah yeah -- could be the other way around depending on whether we want to patch the regex or the path. Path felt easier since means less modifications for regex (who knows how it could get in the way of escaping :) ), but don't mind either way

purarue commented 2 years ago

Could you link me to the windows CI -- might as well add a windows CI here as well so these issues are caught...

karlicoss commented 2 years ago

yeah, the promnesia config it's the most up to date one https://github.com/karlicoss/promnesia/blob/master/.github/workflows/main.yml

On Tue, May 10, 2022, 17:18 seanbreckenridge @.***> wrote:

Could you link me to the windows CI -- might as well add a windows CI here as well so these issues are caught...

— Reply to this email directly, view it on GitHub https://github.com/seanbreckenridge/google_takeout_parser/issues/32#issuecomment-1122607884, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACHEBPGGESNQPVTR7QSXGTVJKD37ANCNFSM5VPRNH5A . You are receiving this because you authored the thread.Message ID: @.***>

purarue commented 2 years ago

should be resolved @karlicoss , let me know if the CI is still failing

karlicoss commented 2 years ago

thanks, the takeout tests are working now! 👍

On Thu, May 12, 2022, 23:45 seanbreckenridge @.***> wrote:

should be resolved @karlicoss https://github.com/karlicoss , let me know if the CI is still failing

— Reply to this email directly, view it on GitHub https://github.com/seanbreckenridge/google_takeout_parser/issues/32#issuecomment-1125484734, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACHEBLYY5HKXGEBKWC2S73VJWCXFANCNFSM5VPRNH5A . You are receiving this because you were mentioned.Message ID: @.***>