Closed parthux1 closed 9 months ago
Oh, interesting
Yeah, there's no way to do that currently. Could you post an example of the folder structure here?
It would probably just be a constructor arg to the TakeoutParser class, with an option in the shared options in __main__.py to let the user specify the locale.
I think a flag makes sense in case the user wants to specify it, but I can add the automatic detection afterwards; I have a good idea of how to do that.
PR would be appreciated, thanks; let me know if you have any questions/issues
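A minimal sketch of the "constructor arg plus command line flag" idea described above. Everything here is an assumption for illustration: `LocaleAwareParser`, `build_arg_parser` and `parser_from_args` are hypothetical names, and `argparse` is used only to keep the example dependency-free, not because the project's __main__.py necessarily uses it.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


class LocaleAwareParser:
    """Hypothetical stand-in for a takeout parser that accepts a locale."""

    def __init__(self, takeout_dir: Path, locale: Optional[str] = None) -> None:
        self.takeout_dir = takeout_dir
        # None means "not specified"; automatic detection could fill this in later
        self.locale = locale.upper() if locale is not None else None


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a CLI with an optional --locale flag (hypothetical flag layout)."""
    parser = argparse.ArgumentParser(prog="takeout-sketch")
    parser.add_argument("takeout_dir", type=Path)
    parser.add_argument("--locale", default=None, help="locale of the export, e.g. EN or DE")
    return parser


def parser_from_args(argv: Sequence[str]) -> LocaleAwareParser:
    """Pass the flag value straight through to the constructor."""
    args = build_arg_parser().parse_args(argv)
    return LocaleAwareParser(args.takeout_dir, locale=args.locale)
```

Leaving the default as `None` rather than `"EN"` keeps "the user did not say" distinguishable from "the user asked for English", which is what a later detection step would need.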
@karlicoss as an FYI, this could break the EXPECTED in google.takeout.parser (https://github.com/karlicoss/HPI/blob/8288032b1c185bda2ddae6b3a956e87d43314604/my/google/takeout/parser.py#L70-L76), as it searches for the English names.
As a quick fix, I could see letting the user override the EXPECTED for match_structure with their config. I will create a PR to HPI once this has settled.
Currently running a bigger export. Will update once it's finished.
Comparing to your testdata: English dirs/files not mentioned below are the same in the German localization.
English | German |
---|---|
Chrome | |
Google Play Store | |
Location History | Standortverlauf |
My Activity | Meine Aktivitäten |
Youtube and Youtube Music | |
archive_browser.html | Archiv_Übersicht.html |
English | German |
---|---|
Ads | Anzeigen |
Google Analytics | ? |
Google Apps | Google Spiele |
Google Cloud | ? |
Google Translate | Google Übersetzer |
Help | Hilfe |
Image Search | Bildersuche |
Podcasts | ? |
Video Search | Videosuche |
in each subfolder: MyActivity.json | MeineAktivitäten.json |
(probably not up-to-date in testdata)
English | German |
---|---|
Assistant | Google Assistant |
Developers | Google Developers |
News | Google News |
Search | Google Suche |
Great, thanks, looks different enough that detection shouldn't be a problem :+1:
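As a rough illustration of why the differing names make detection feasible, here is a sketch that guesses the locale from top-level directory names. The marker names come from the tables above; `LOCALE_MARKERS` and `guess_locale` are hypothetical names, and this is not necessarily how the project implements detection.

```python
from typing import Dict, Iterable, Optional, Set

# Top-level directory names that differ between locales (taken from the tables above).
# Names shared by both exports, such as "Chrome", carry no information and are left out.
LOCALE_MARKERS: Dict[str, Set[str]] = {
    "EN": {"My Activity", "Location History"},
    "DE": {"Meine Aktivitäten", "Standortverlauf"},
}


def guess_locale(dir_names: Iterable[str]) -> Optional[str]:
    """Return the locale whose marker names appear most often, or None if unclear."""
    present = set(dir_names)
    scores = {loc: len(markers & present) for loc, markers in LOCALE_MARKERS.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    # No marker found, or a tie between locales: refuse to guess
    if best_score == 0 or best_score == runner_up_score:
        return None
    return best
```

It could be fed with something like `guess_locale(p.name for p in Path("Takeout").iterdir() if p.is_dir())`. Returning `None` on a tie leaves room for an explicit flag to settle mixed exports.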
Feel free to restructure path_handler as you see fit; perhaps you could add a locales/de.py folder/file, following ISO 3166. Then, at the top of path_handler.py, you can import it with from .locales.de import HANDLER_MAP as GERMAN_HANDLER_MAP.
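A single-file sketch of how per-locale handler maps might be organised and looked up. In the layout suggested above each map would live in its own module (for example locales/de.py); here they are inlined so the example runs on its own. The handler functions, `HANDLER_MAPS` and `find_handler` are hypothetical, and the patterns are simplified examples built from names in this thread, not the project's real maps.

```python
import re
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def parse_activity(path: str) -> str:
    """Placeholder handler; a real one would parse the file."""
    return f"activity:{path}"


def parse_location_records(path: str) -> str:
    """Placeholder handler; a real one would parse the file."""
    return f"location:{path}"


# Would be locales/en.py in the suggested layout
EN_HANDLER_MAP: Dict[str, Handler] = {
    r"My Activity/.*?/MyActivity\.(json|html)": parse_activity,
    r"Location History/Records\.json": parse_location_records,
}

# Would be locales/de.py in the suggested layout
DE_HANDLER_MAP: Dict[str, Handler] = {
    r"Meine Aktivitäten/.*?/MeineAktivitäten\.(json|html)": parse_activity,
    r"Standortverlauf/Records\.json": parse_location_records,
}

HANDLER_MAPS: Dict[str, Dict[str, Handler]] = {
    "EN": EN_HANDLER_MAP,
    "DE": DE_HANDLER_MAP,
}


def find_handler(relative_path: str, locale: str) -> Optional[Handler]:
    """Return the first handler whose pattern matches the whole relative path."""
    for pattern, handler in HANDLER_MAPS[locale].items():
        if re.fullmatch(pattern, relative_path):
            return handler
    return None
```

Keeping the same handler functions across locales and varying only the path patterns means a new language needs a new map, not new parsing code.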
You may want to update the setup.py file as well to ensure that the subpackage is included:
diff --git a/setup.py b/setup.py
index d459510..e8bccfd 100644
--- a/setup.py
+++ b/setup.py
@@ -18,7 +18,7 @@ setup(
long_description_content_type="text/markdown",
license="MIT",
packages=find_packages(
- include=["google_takeout_parser", "google_takeout_parser.parse_html"]
+ include=["google_takeout_parser", "google_takeout_parser.parse_html", "google_takeout_parser.locales"]
),
install_requires=reqs,
package_data={pkg: ["py.typed"]},
I'll do an export myself just to confirm the current names for me
Just did a new export, for reference if you wanted to compare:
.
├── archive_browser.html
├── Chrome
│ ├── Autofill.json
│ ├── Bookmarks.html
│ ├── BrowserHistory.json
│ ├── Device Information.json
│ ├── Dictionary.csv
│ ├── Extensions.json
│ ├── Omnibox.json
│ ├── OS Settings.json
│ ├── SearchEngines.json
│ └── SyncSettings.json
├── Google Play Store
│ ├── Devices.json
│ ├── Installs.json
│ ├── Library.json
│ ├── Play Settings.json
│ ├── Purchase History.json
│ └── Reviews.json
├── Location History
│ ├── Records.json
│ ├── Semantic Location History
│ │ └── 2023
│ │ ├── 2023_FEBRUARY.json
│ │ ├── 2023_JANUARY.json
│ │ └── 2023_MARCH.json
│ └── Settings.json
├── My Activity
│ ├── Ads
│ │ └── MyActivity.json
│ ├── Android
│ │ └── MyActivity.json
│ ├── Assistant
│ │ └── MyActivity.json
│ ├── Books
│ │ └── MyActivity.json
│ ├── Developers
│ │ └── MyActivity.json
│ ├── Discover
│ │ └── MyActivity.json
│ ├── Drive
│ │ └── MyActivity.json
│ ├── Gmail
│ │ └── MyActivity.json
│ ├── Google Analytics
│ │ └── MyActivity.json
│ ├── Google Arts _ Culture
│ │ └── MyActivity.json
│ ├── Google Cloud
│ │ └── MyActivity.json
│ ├── Google Lens
│ │ └── MyActivity.json
│ ├── Google Play Movies _ TV
│ │ └── MyActivity.json
│ ├── Google Play Store
│ │ └── MyActivity.json
│ ├── Google Store
│ │ └── MyActivity.json
│ ├── Google Translate
│ │ └── MyActivity.json
│ ├── Help
│ │ └── MyActivity.json
│ ├── Image Search
│ │ └── MyActivity.json
│ ├── Maps
│ │ └── MyActivity.json
│ ├── News
│ │ └── MyActivity.json
│ ├── Podcasts
│ │ └── MyActivity.json
│ ├── Search
│ │ └── MyActivity.json
│ ├── Shopping
│ │ └── MyActivity.json
│ ├── Takeout
│ │ └── MyActivity.json
│ ├── Video Search
│ │ └── MyActivity.json
│ └── YouTube
│ └── MyActivity.json
└── YouTube and YouTube Music
├── history
│ ├── search-history.json
│ └── watch-history.json
├── my-comments
│ └── my-comments.html
├── my-live-chat-messages
│ └── my-live-chat-messages.html
├── playlists
│ ├── Favorites.csv
│ └── Liked videos.csv
└── subscriptions
└── subscriptions.csv
39 directories, 55 files
Also just copying the tree containing folders of your export here for reference: The actual mapping according to your current Mapping dict will be part of the PR.
| Archiv_Übersicht.html
|
+---Chrome
| Autofill.json
| Bookmarks.html
| BrowserHistory.json
| Device Information.json
| Dictionary.csv
| Extensions.json
| Omnibox.json
| OS Settings.json
| SearchEngines.json
| SyncSettings.json
|
+---Google Play Store
| Devices.json
| Installs.json
| Library.json
| Order History.json
| Play Settings.json
| Promotion History.json
| Purchase History.json
| Reviews.json
|
+---Standortverlauf
| | Records.json
| | Settings.json
| |
| \---Semantic Location History
| +---2022
| | 2022_DECEMBER.json
| | 2022_NOVEMBER.json
| |
| \---2023
| 2023_FEBRUARY.json
| 2023_JANUARY.json
| 2023_MARCH.json
|
+---Meine Aktivitäten
| +---Android
| | MeineAktivitäten.html
| |
| +---Anzeigen
| | MeineAktivitäten.html
| |
| +---Assistant Memory
| | MeineAktivitäten.html
| |
| +---Bildersuche
| | MeineAktivitäten.html
| |
| +---Books
| | MeineAktivitäten.html
| |
| +---Chrome
| | MeineAktivitäten.html
| |
| +---Datenexport
| | MeineAktivitäten.html
| |
| +---Discover
| | MeineAktivitäten.html
| |
| +---Drive
| | MeineAktivitäten.html
| |
| +---Gmail
| | MeineAktivitäten.html
| |
| +---Google Assistant
| | MeineAktivitäten.html
| |
| +---Google Developers
| | MeineAktivitäten.html
| |
| +---Google Lens
| | MeineAktivitäten.html
| |
| +---Google News
| | MeineAktivitäten.html
| |
| +---Google Play Filme _ Serien
| | MeineAktivitäten.html
| |
| +---Google Play Spiele
| | MeineAktivitäten.html
| |
| +---Google Play Store
| | MeineAktivitäten.html
| |
| +---Google Store
| | MeineAktivitäten.html
| |
| +---Google Suche
| | MeineAktivitäten.html
| |
| +---Google Übersetzer
| | MeineAktivitäten.html
| |
| +---Hilfe
| | MeineAktivitäten.html
| |
| +---Maps
| | MeineAktivitäten.html
| |
| +---Shopping
| | MeineAktivitäten.html
| |
| +---Videosuche
| | MeineAktivitäten.html
| |
| \---YouTube
| MeineAktivitäten.html
|
\---YouTube und YouTube Music
+---Abos
| Abos.csv
|
+---Meine Kommentare
| Meine Kommentare.html
|
+---meine-live-chat-nachrichten
| meine-live-chat-nachrichten.html
|
+---musik-mediathek-songs
| musik-mediathek-songs.csv
|
+---Playlists
| [names of playlist].csv
| Uploads from [channel name].csv
|
+---Verlauf
| Suchverlauf.html
| Wiedergabeverlauf.html
|
\---Videos
[name of uploaded video].mp4
Video-Metadaten.csv
If you could clone and test the DE locale, that would be great. I did an export myself with a secondary Google account, but I don't have as much data:
git clone https://github.com/seanbreckenridge/google_takeout_parser
cd google_takeout_parser
python3 -m pip install .
python3 -m google_takeout_parser --verbose parse Takeout -a summary
It should guess that it's de based on the files present, but if it doesn't, you can specify --locale DE
Thanks for your progress on this issue. I'm sorry for responding after several months, but it looks like GitHub didn't notify me about updates on this thread. I'm subscribed to this issue, but next time you may want to throw a @parthux1 in your message.
Referencing the log it looks like some folder names changed in the meantime.
I tried editing locales/de.py locally to reflect these changes, but after using pip's --force-reinstall and installing the module with these changes into a new venv, my dict changes weren't used (the same output was generated).
locales/de.py
- r"Standortverlauf/Semantic Location History/.*/.*.json": _parse_semantic_location_history,
+ r"Location History (Timeline)/Semantic Location History/.*/.*.json": _parse_semantic_location_history,
❯ google_takeout_parser --verbose parse Takeout -a summary --locale DE
[D 231228 16:02:54 path_dispatch:200] User specified locale: DE
[D 231228 16:02:54 path_dispatch:203] Using locale DE. To override set, GOOGLE_TAKEOUT_PARSER_LOCALE
[D 231228 16:02:54 path_dispatch:248] Trying to match one of: ['Chrome', 'Location History', 'Meine Aktivitäten', 'My Activity', 'Standortverlauf', 'YouTube( and YouTube Music)?', 'YouTube( und YouTube Music)?']
[D 231228 16:02:54 path_dispatch:256] Matched expected directory: Location History
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Records.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2022/2022_DECEMBER.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2022/2022_NOVEMBER.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_APRIL.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_AUGUST.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_DECEMBER.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_FEBRUARY.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_JANUARY.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_JULY.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_JUNE.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_MARCH.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_MAY.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_NOVEMBER.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_OCTOBER.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Semantic Location History/2023/2023_SEPTEMBER.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Settings.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Location History (Timeline)/Timeline Edits.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Maps (Meine Orte)/Bewertungen.json
[W 231228 16:02:54 path_dispatch:331] No function to handle parsing Maps (Meine Orte)/Gespeicherte Orte.json
Counter()
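One detail that may be relevant to the warnings above, assuming the handler-map keys are matched as Python regular expressions (which the `.*` in them suggests): in a regex, unescaped parentheses form a group instead of matching literal `(` and `)`. The sketch below uses only the standard `re` module and a path copied from the log. Whether this explains the unchanged output is not certain; the edited file may also simply not have been picked up by the install.

```python
import re

# A path copied from the warnings in the log above
path = "Location History (Timeline)/Semantic Location History/2023/2023_MARCH.json"

# The pattern as written in the attempted de.py change: "(Timeline)" is a regex group
unescaped = r"Location History (Timeline)/Semantic Location History/.*/.*.json"

# The same pattern with literal parentheses (and a literal dot before "json")
escaped = r"Location History \(Timeline\)/Semantic Location History/.*/.*\.json"

# re.escape builds the literal part without hand-written backslashes
built = re.escape("Location History (Timeline)") + r"/Semantic Location History/.*/.*\.json"

print(re.match(unescaped, path) is not None)  # False: the literal "(" is never matched
print(re.match(escaped, path) is not None)    # True
print(re.match(built, path) is not None)      # True
```

The unescaped pattern would instead match a directory named "Location History Timeline", without the parentheses.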
Ah thanks :+1: @parthux1
If there are any changes from your end once you have the new export, feel free to make a PR and update the de.py file
problem
If an account doesn't have English as its main language, folders and some files are named differently (localized). As a result, no folders are parsed because _match_handler finds no match.
possible solution
I think it would be beneficial to add default handler maps, like the one defined here (DEFAULT_HANDLER_MAP), for other languages.
A user could select a HandlerMap via a command line argument.
If you approve of this idea, I could work on a pull request adding a German localization as well as a command line argument for selecting a handler map.
If there's already an option to achieve this, please fill me in :)