Closed parthux1 closed 1 year ago
Locale splitting looks good :+1:
Are there any files I should keep in mind while working?
For __main__.py
, incase it wasnt obvious since the decorator is somewhat complicated, to add the CLI flag, it'd be:
diff --git a/google_takeout_parser/__main__.py b/google_takeout_parser/__main__.py
index d5fe165..31cd5e8 100644
--- a/google_takeout_parser/__main__.py
+++ b/google_takeout_parser/__main__.py
@@ -43,7 +43,7 @@ SHARED = [
# TODO: actually handle argument
click.option(
"-l",
- "--localization",
+ "--locale",
type=click.Choice(["en", "ger"]),
default="en",
help="Used DEFAULT_HANDLER_MAP resoling folder names to parser models",
@@ -110,7 +110,7 @@ def parse(cache: bool, action: str, takeout_dir: str) -> None:
@main.command(short_help="merge multiple takeout directories")
@shared_options
@click.argument("TAKEOUT_DIR", type=click.Path(exists=True), nargs=-1, required=True)
-def merge(cache: bool, action: str, takeout_dir: Sequence[str]) -> None:
+def merge(cache: bool, action: str, locale: str, takeout_dir: Sequence[str]) -> None:
"""
Parse and merge multiple takeout directories
"""
Also I think should rename to --locale
, shorter to type
How should I start adding tests for this?
For the path_handler
you've done what I would've synced a copy of the directory -- good.
Otherwise, I think anywhere where you see differences in the actual structure of code where it leads to differences, you could extract an HTML div or JSON blob like I do in:
I dont think there should be many of those though, but if you're seeing errors there might be a differently structured file for different locales
parsing my-activity files yield a warning eg. "time data '12.07.2019, 14:51:43' does not match format '%b %d, %Y, %I:%M:%S %p"
I think that error in particular is the localizations of the dates when using the html exports (e.g. for the My Activity.html
)
Those are pretty annoying bits of code since google takeout has changed a bunch and we have to account for all the changes (at least in the english locale).
I think those should be from My Acitvity.html
, so unless you want to pass the locale to parse_html_dt
, we could opt to just not include the HTML parsing code for other locales. The code to parse the HTML should be seen as legacy anyways -- as it says in the readme you should pick JSON
for those formats, as it then just returns an UTC epoch timestamp instead of having to handle dealing with locales. Could warn the user to do exactly that if parsing html files with other locales
You should also just make sure none of the other html parsing code is incorrect in a different locale. E.g. does comment.py have differences in german or is the date format the same?
Several change requests marked as resolved are implemented as proposed by you.
continue
to break
. Should be okay now but you may want to take a quick look. See this committests/test_
-files__main__.py
How do you test the command line behavior?😅
When trying to call it via command line
python google_takeout_parser\__main__.py parse [File]
I get errors due to the relative .
imports. Do I have to locally package it in some way?
How do you test the command line behavior
While in the the root directory, run:
pip install .
to install it, and then you can use python3 -m google_takeout_parser
. You could also just run python3 -m google_takeout_parser
without the pip install .
, but I tend to install it just to make sure Im not mistakenly in the wrong directory and using the globally installed google_takeout_parser
and not the local version
PR for Issue #43.
Structure
added folder
locales
containing 1.1 common.py - defines several TakeoutFiles(Enum), which are resolved to a parser later 1.2 en.py, de.py - HandlerMap from path_dispatch moved here. Each module defines such a map and links filepaths to TakeoutFiles from common.pyadded path_handler.py which 2.1 resolves enum
TakeoutFile
to acallable
(prev. happened directly in the HandlerMap) 2.2 groups localized HandlerMaps (en.py, de.py
) into the classLocalizedhandlers
Details following:
Changes to existing code
TakeoutParser now defaults to EN handler. an undefined amount of HandlerMaps can be supplied
removed default-handler from dispatch map. The function now iterates over all supplied handlers
previously a HandlerMap mapped to either
None
orcallable
. Now if you map toTakeoutFile
path_handler.py will parse it to acallable
current problems:
__main__.py
was updated but doesn't handle the parser flag because I have no idea how to debug it. Currently I debug by importing classes as shown in the prev. exampleIs this concept understandable and beneficial? Are there any files I should keep in mind while working? How should I start adding tests for this?