tcgoetz / GarminDB

Download and parse data from Garmin Connect or a Garmin watch, FitBit CSV, and MS Health CSV files into SQLite serverless databases and analyze the data with Jupyter notebooks.
GNU General Public License v2.0
1.18k stars 142 forks

Improve login by migrating Cloudscraper to Garth #192

Closed: matin closed this issue 1 year ago

matin commented 1 year ago

Is your feature request related to a problem? Please describe.

The current login method uses the same interface as the website—resulting in a need for Cloudscraper and a short-lived session.

Describe the solution you'd like

Garth uses the same interface as the mobile app (instead of the website). The login uses OAuth1 credentials and an automatically renewed Bearer token with requests directly to connectapi.garmin.com. The login session lasts for a year (including with the MFA token).

Garth also supports garmin.cn as the base domain, which would close #167.

Describe alternatives you've considered

If you agree with making the migration from Cloudscraper with website login to Garth and API login, I can issue a PR with the changes.

Additional context

Take a look at the PR for garminconnect as a reference for what the migration would look like.

I currently use Garth on Google Colab and save the session in Drive to avoid needing to log in each time.

Let me know. Happy to write the code for the migration.

tcgoetz commented 1 year ago

I've been pretty busy. I will find time to look at this soon.

tcgoetz commented 1 year ago

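This looks like a big change. It won't just change login, it will also change how data is fetched. There are a bunch of questions to be answered, like:

  • How should the login token be stored on different platforms?
  • How do we maintain backward compatibility for fetched data files?
  • Can all of the data that is currently being fetched be fetched with Garth?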

So, I think the best way to handle this is to create a feature branch off of the develop branch and try it. Work any issues there, and then merge to develop and try to get more people to test it there.

Are you up for creating a pull request to a feature branch with a first pass on Garth support?

matin commented 1 year ago
  • How should the login token be stored on different platforms?

We only need to store the OAuth1 token as a JSON-serialized dict. The OAuth1 token is valid for a year and can be exchanged for the OAuth2 Bearer token at any time automatically.
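To make the "exchanged automatically" part concrete, here is a minimal sketch of the renewal logic, assuming a hypothetical `exchange` callable that performs the OAuth1-to-OAuth2 request. Garth handles this internally; `OAuth2Token`, `BearerTokenProvider`, and the 60-second margin are illustrative names and values, not Garth's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OAuth2Token:
    access_token: str
    expires_at: float  # Unix timestamp after which the Bearer token is no longer valid


class BearerTokenProvider:
    """Hypothetical helper: returns a valid Bearer header value, exchanging the
    long-lived OAuth1 token for a fresh OAuth2 token whenever needed."""

    def __init__(self, oauth1_token: dict,
                 exchange: Callable[[dict], OAuth2Token],
                 margin_seconds: float = 60.0):
        self.oauth1_token = oauth1_token
        self._exchange = exchange  # assumption: performs the OAuth1 -> OAuth2 request
        self._margin = margin_seconds
        self._oauth2: Optional[OAuth2Token] = None

    def bearer(self) -> str:
        now = time.time()
        # Renew when there is no token yet or it expires within the safety margin.
        if self._oauth2 is None or self._oauth2.expires_at - self._margin <= now:
            self._oauth2 = self._exchange(self.oauth1_token)
        return f"Bearer {self._oauth2.access_token}"
```

Only the OAuth1 dict needs to be persisted; the OAuth2 token is cheap to re-obtain, which is why storing it is optional.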

  • How do we maintain backward compatibility for fetched data files?

connectapi.garmin.com should provide access to the same data as the website—with the exception of supporting a longer-lived auth session.

  • Can all of the data that is currently being fetched be fetched with Garth?

I've already tested all of the endpoints for python-garminconnect, which is fairly extensive. I've also tested uploading FIT files. I still need to test downloading FIT files.

I'm the maintainer of Garth, so I can always add any missing functionality. My main goal is to make tools developed for analyzing Garmin data able to function using a long-lived session to avoid repeated logins and getting blocked.

Personally, I use MFA with Garmin, which makes it impractical to use code that requires a repeated login. Given the amount of personal information stored in Garmin, I prefer to keep MFA active.

Are you up for creating a pull request to a feature branch with a first pass on Garth support?

Happy to. I'll send you updates to make sure I'm moving in a direction that makes sense. Is it fair to rely on the existing tests, or is there any manual testing you perform during development as well?


It's true that the method to fetch data changes, which requires other code changes, but it's worth it IMHO. I have code that uses Garth to make 1,000+ requests with 10 concurrent threads without any throttling, and I've never been blocked. I have a saved session on my computer and one stored in Google Drive for Colab. I haven't had to log in again in weeks and won't need to again for another 11 months. It's as close as we can get to using the official Connect API.
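A sketch of that fetch pattern with the standard library's `ThreadPoolExecutor`, capped at 10 workers. `fetch` is a hypothetical stand-in for whatever issues the real request for one day; whether a single shared client is safe to use from several threads is an assumption to verify, not something this sketch guarantees.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, Dict


def fetch_days(fetch: Callable[[date], dict], start: date, days: int,
               max_workers: int = 10) -> Dict[date, dict]:
    """Fetch one JSON document per day using a bounded thread pool."""
    dates = [start + timedelta(days=i) for i in range(days)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with `dates`.
        results = pool.map(fetch, dates)
        return dict(zip(dates, results))
```

Capping the pool size keeps the request rate bounded regardless of how many days are requested.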

I'll get working on an initial migration example to give you an idea of what it could look like.

matin commented 1 year ago

I just confirmed downloading a FIT file on connectapi.garmin.com works just like connect.garmin.com (just with different auth).
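Because an HTML error page can come back instead of FIT bytes, a cheap check on each download helps. This sketch relies only on the published FIT file header layout (a 12- or 14-byte header with the ASCII signature `.FIT` at offset 8) and assumes `data` holds raw FIT bytes rather than, say, a compressed archive; `looks_like_fit` is a hypothetical helper name.

```python
import struct


def looks_like_fit(data: bytes) -> bool:
    """Sanity-check a downloaded FIT file's header.

    Header layout: header size (1 byte), protocol version (1), profile
    version (2, little-endian), data size (4, little-endian), ".FIT" (4),
    and, for 14-byte headers, a 2-byte header CRC.
    """
    if len(data) < 12:
        return False
    header_size = data[0]
    if header_size not in (12, 14):
        return False
    (data_size,) = struct.unpack_from("<I", data, 4)
    # Header + data records + 2-byte file CRC should all be present.
    return data[8:12] == b".FIT" and len(data) >= header_size + data_size + 2
```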

tcgoetz commented 1 year ago
  • How should the login token be stored on different platforms?

We only need to store the OAuth1 token as a JSON-serialized dict. The OAuth1 token is valid for a year and can be exchanged for the OAuth2 Bearer token at any time automatically.

But the token should be considered a secret. Shouldn't the token be stored in the keychain on macOS and Linux and whatever the equivalent is on Windows?

  • How do we maintain backward compatibility for fetched data files?

connectapi.garmin.com should provide access to the same data as the website—with the exception of supporting a longer-lived auth session.

As long as the JSON blob has the same structure and naming, no conversion will be needed. Otherwise conversion will be needed for backwards compatibility.
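One way to check the "same structure and naming" condition is to diff the shape of an old and a new response while ignoring the values. A minimal sketch; `structure_diff` is a hypothetical helper, and any field names used with it here are made up for illustration.

```python
from typing import Any, List


def _kind(value: Any) -> str:
    """Map a parsed JSON value to its JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def structure_diff(old: Any, new: Any, path: str = "$") -> List[str]:
    """List differences in key names and value types between two parsed JSON
    documents. Values are ignored, null on either side is tolerated, and
    arrays are compared by their first element only."""
    if isinstance(old, dict) and isinstance(new, dict):
        diffs = [f"{path}.{k}: missing in new" for k in old.keys() - new.keys()]
        diffs += [f"{path}.{k}: added in new" for k in new.keys() - old.keys()]
        for k in old.keys() & new.keys():
            diffs += structure_diff(old[k], new[k], f"{path}.{k}")
        return sorted(diffs)
    if isinstance(old, list) and isinstance(new, list):
        return structure_diff(old[0], new[0], f"{path}[0]") if old and new else []
    kinds = (_kind(old), _kind(new))
    if kinds[0] != kinds[1] and "null" not in kinds:
        return [f"{path}: {kinds[0]} -> {kinds[1]}"]
    return []
```

Running it on the parsed JSON of the same day downloaded both ways would flag any renamed, added, or retyped field; an empty list means no conversion should be needed.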

  • Can all of the data that is currently being fetched be fetched with Garth?

I've already tested all of the endpoints for python-garminconnect, which is fairly extensive. I've also tested uploading FIT files. I still need to test downloading FIT files.

+1

I'm the maintainer of Garth, so I can always add any missing functionality. My main goal is to make tools developed for analyzing Garmin data able to function using a long-lived session to avoid repeated logins and getting blocked.

Personally, I use MFA with Garmin, which makes it impractical to use code that requires a repeated login. Given the amount of personal information stored in Garmin, I prefer to keep MFA active.

+1

Are you up for creating a pull request to a feature branch with a first pass on Garth support?

Happy to. I'll send you updates to make sure I'm moving in a direction that makes sense. Is it fair to rely on the existing tests, or is there any manual testing you perform during development as well?

It's true that the method to fetch data changes, which requires other code changes, but it's worth it IMHO. I have code that uses Garth to make 1,000+ requests with 10 concurrent threads without any throttling, and I've never been blocked. I have a saved session on my computer and one stored in Google Drive for Colab. I haven't had to log in again in weeks and won't need to again for another 11 months. It's as close as we can get to using the official Connect API.

I'll get working on an initial migration example to give you an idea of what it could look like.

+1

I created a new branch off of the develop branch called garth-migration. Please target your PR against it.

tcgoetz commented 1 year ago

BTW, it would be nice to port the Jupyter notebooks you have for Garth to run against garmindb.

matin commented 1 year ago

But the token should be considered a secret. Shouldn't the token be stored in the keychain on macOS and Linux and whatever the equivalent is on Windows?

I'll take a look at different approaches.

As long as the JSON blob has the same structure and naming, no conversion will be needed. Otherwise conversion will be needed for backwards compatibility.

It's identical.

I created a new branch off of the develop branch called garth-migration. Please target your PR against it.

Sounds good. I'll work off of that branch.

BTW, it would be nice to port the Jupyter notebooks you have for Garth to run against garmindb.

That was part of the original goal.

matin commented 1 year ago

But the token should be considered a secret. Shouldn't the token be stored in the keychain on macOS and Linux and whatever the equivalent is on Windows?

There's a specific issue with macOS Keychain that either removes the security benefits of using it or makes it painful to use. The program with permission is Python. That means that any other script using the same Python executable also has access without permissions.

The alternative is to require the macOS user password on each read. That's certainly better than logging into Garmin each time since it's a local operation, but it's still not a great experience.

I'll work on the migration with the existing model of saving the tokens to the filesystem. That option needs to remain available for people like me who primarily use Google Colab and need to store the tokens in Drive.
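A minimal sketch of saving the token to the filesystem with owner-only permissions, assuming the token is a plain dict and using a hypothetical file path; on Colab the path could point into the mounted Drive folder. The token keys in any example are illustrative. On Windows the POSIX mode bits are mostly ignored, so this adds no protection there.

```python
import json
import os
from pathlib import Path


def save_token(token: dict, path: Path) -> None:
    """Write the OAuth1 token as JSON, readable only by the current user (POSIX)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with 0600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(token, f)
    # The mode above only applies on creation and is filtered by the umask;
    # chmod also tightens a file that already existed with looser permissions.
    os.chmod(path, 0o600)


def load_token(path: Path) -> dict:
    with open(path) as f:
        return json.load(f)
```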

Even without saving the tokens, Garth's model is still better than using Cloudscraper. I've logged in tens of times in the same hour during initial development and never got blocked.

matin commented 1 year ago

@tcgoetz I'm running into issues getting the tests to run. I was able to run make setup successfully.

I can make changes to the tests, but I want to check with you first—as this would cause the PR to extend beyond the Garth migration.

$PROJECT_BASE is [/Users/matin/code/GarminDB]
$PLATFORM is [Darwin]
$SHELL is [/bin/sh]
$PIP_PATH is [/Users/matin/code/GarminDB/.venv/bin/pip3]
/Library/Developer/CommandLineTools/usr/bin/make -C Fit test
/Library/Developer/CommandLineTools/usr/bin/make -C test
python3 test_fit_fields.py
test_local_timestamp_field_valid_conversion (__main__.TestFitFields) ... ok
test_time_ms_field_valid_conversion (__main__.TestFitFields) ... ok
test_utc_timestamp_field_valid_conversion (__main__.TestFitFields) ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK
python3 test_fit_field_enum.py
test_enum_field_unknown_conversion (__main__.TestFitFieldEnum) ... ok
test_enum_field_valid_conversion (__main__.TestFitFieldEnum) ... ok
test_field_enum_fuzzy_metric (__main__.TestFitFieldEnum) ... ok
test_field_enum_fuzzy_statute (__main__.TestFitFieldEnum) ... ok
test_field_enum_unknown_conversion (__main__.TestFitFieldEnum) ... ok
test_field_enum_valid_conversion (__main__.TestFitFieldEnum) ... ok

----------------------------------------------------------------------
Ran 6 tests in 0.000s

OK
python3 test_fit_dependant_field.py
test_product_field_reconvert (__main__.TestFitDependantField) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
python3 test_measurements.py
test_distance (__main__.TestMeasurement) ... ok
test_distance_from_func (__main__.TestMeasurement) ... ok
test_distance_from_func_raises (__main__.TestMeasurement) ... ok
test_speed_from_func (__main__.TestMeasurement) ... ok
test_speed_from_func_raises (__main__.TestMeasurement) ... ok
test_temperature (__main__.TestMeasurement) ... ok
test_weight (__main__.TestMeasurement) ... ok

----------------------------------------------------------------------
Ran 7 tests in 0.000s

OK
python3 test_conversions.py
test_perhour_speed_to_pace (__main__.TestConversions) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
/Library/Developer/CommandLineTools/usr/bin/make -C Tcx test
/Library/Developer/CommandLineTools/usr/bin/make -C test
python3 test_loop.py
test_loop (__main__.TestLoop) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.001s

OK
python3 test_read.py
test_parse_tcx (__main__.TestRead) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.000s

OK
/Library/Developer/CommandLineTools/usr/bin/make -C utilities test
/Library/Developer/CommandLineTools/usr/bin/make -C test
make[2]: Nothing to be done for `all'.
/Library/Developer/CommandLineTools/usr/bin/make -C test all
python3 test_garmin_db.py
db params <DbParams() {'db_type': 'sqlite', 'db_path': '/var/folders/1f/hywnv3mn28gb3ndd3cgw6njr0000gn/T/tmp6o_v7b68/DBs'}
test_db_exists (__main__.TestGarminDb) ... ok
test_db_tables_exists (__main__.TestGarminDb) ... FAIL
test_garmindb_tables_bounds (__main__.TestGarminDb) ... ERROR
test_measurement_system (__main__.TestGarminDb) ... ok
test_not_none_cols (__main__.TestGarminDb) ... ok
test_sleep_import (__main__.TestGarminDb) ... Processing [<FileType.sleep: 49>] FIT data from test_files/fit/sleep
ERROR

======================================================================
ERROR: test_garmindb_tables_bounds (__main__.TestGarminDb)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/matin/code/GarminDB/test/test_garmin_db.py", line 77, in test_garmindb_tables_bounds
    self.check_col_stats(
  File "/Users/matin/code/GarminDB/test/test_garmin_db.py", line 62, in check_col_stats
    self.check_col_stat(col_name + ' max', maximum, max_bounds)
  File "/Users/matin/code/GarminDB/test/test_garmin_db.py", line 51, in check_col_stat
    self.assertGreaterEqual(value, min_value, '%s value %s less than min %s' % (value_name, value, min_value))
  File "/Users/matin/.pyenv/versions/3.10.11/lib/python3.10/unittest/case.py", line 1248, in assertGreaterEqual
    if not a >= b:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

======================================================================
ERROR: test_sleep_import (__main__.TestGarminDb)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/matin/code/GarminDB/test/test_garmin_db.py", line 125, in test_sleep_import
    gfd = GarminSleepFitData('test_files/fit/sleep', latest=False, measurement_system=self.measurement_system, debug=2)
  File "/Users/matin/code/GarminDB/garmindb/import_monitoring.py", line 93, in __init__
    super().__init__(input_dir, debug, latest, True, [fitfile.FileType.sleep], measurement_system)
  File "/Users/matin/code/GarminDB/garmindb/fit_data.py", line 41, in __init__
    self.file_names = FileProcessor.dir_to_files(input_dir, fitfile.file.name_regex, latest, recursive)
  File "/Users/matin/code/GarminDB/.venv/lib/python3.10/site-packages/idbutils/file_processor.py", line 46, in dir_to_files
    for file in os.listdir(input_dir):
FileNotFoundError: [Errno 2] No such file or directory: 'test_files/fit/sleep'

======================================================================
FAIL: test_db_tables_exists (__main__.TestGarminDb)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/matin/code/GarminDB/test/test_db_base.py", line 46, in test_db_tables_exists
    self.check_db_tables_exists(self.db, self.table_dict)
  File "/Users/matin/code/GarminDB/test/test_db_base.py", line 39, in check_db_tables_exists
    self.assertGreaterEqual(table.row_count(db), min_rows, 'table %s has no data' % table_name)
AssertionError: 0 not greater than or equal to 1 : table attributes_table has no data

----------------------------------------------------------------------
Ran 6 tests in 0.055s

FAILED (failures=1, errors=2)
make[1]: *** [garmin_db] Error 1
make: *** [test] Error 2

tcgoetz commented 1 year ago

There's a specific issue with macOS Keychain that either removes the security benefits of using it or makes it painful to use. The program with permission is Python. That means that any other script using the same Python executable also has access without permissions.

I'll deal with macOS keychain issues.

tcgoetz commented 1 year ago

I can make changes to the tests, but I want to check with you first—as this would cause the PR to extend beyond the Garth migration.

If you do a make clean all test, then it should download, import, and analyze before testing, and the tests should pass. All tests currently pass in develop and master.

tcgoetz commented 1 year ago

FileNotFoundError: [Errno 2] No such file or directory: 'test_files/fit/sleep'

This one seems like it didn't download sleep data, or the files aren't named with the same pattern.

tcgoetz commented 1 year ago

Actually, all of those errors could be from not downloading all data types.
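A quick way to tell a missing download from a naming mismatch is to check each expected test-data directory before running the tests. Note that the traceback shows a relative path (`test_files/fit/sleep`), so the working directory matters too. A sketch; the directory list and the `.fit` filename pattern are assumptions, not GarminDB's actual `fitfile.file.name_regex`.

```python
import re
from pathlib import Path
from typing import Dict, Iterable


def missing_test_data(base: Path, subdirs: Iterable[str],
                      pattern: str = r".*\.fit$") -> Dict[str, str]:
    """Return {subdir: reason} for test-data directories that are absent or
    contain no files whose names match `pattern` (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    problems = {}
    for sub in subdirs:
        d = base / sub
        if not d.is_dir():
            problems[sub] = "directory missing"
        elif not any(regex.match(p.name) for p in d.iterdir()):
            problems[sub] = "no matching files"
    return problems
```

"directory missing" points at the download step; "no matching files" points at a filename pattern that no longer matches.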

tcgoetz commented 1 year ago

If you want to push what you have to the garth-migration branch, I think I have time to try it tonight.

matin commented 1 year ago

I had to put this off by a week, but I'll definitely get to it after this week.

matin commented 1 year ago

I migrated the login, but to migrate the requests, I need a series of tests I can run—even if it's manual.

@tcgoetz what do you recommend?

To migrate requests, it looks like the most direct way would be to migrate idbutils.RestClient.get() and idbutils.RestClient.post(). Does that make sense?

matin commented 1 year ago

As a reference, both garth.Client.get() and garth.Client.post() return requests.Response, which should fit the same interface as what's being used now in idbutils.
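If that interface holds, the migration could be a thin adapter. This sketch assumes garth's `Client.get(subdomain, path, api=True, params=...)` calling convention, with `api=True` attaching the OAuth2 Bearer header; verify it against the installed garth version. `GarthRestClient` and its `base_route`/`leaf_route` arguments are hypothetical, modeled loosely on the RestClient usage described in this thread, including the empty base route case.

```python
from typing import Any, Optional, Protocol


class GarthLike(Protocol):
    """The subset of garth.Client this sketch relies on (assumed signatures)."""
    def get(self, subdomain: str, path: str, **kwargs: Any) -> Any: ...
    def post(self, subdomain: str, path: str, **kwargs: Any) -> Any: ...


class GarthRestClient:
    """Hypothetical stand-in for the get()/post() calls GarminDB makes on RestClient."""

    def __init__(self, client: GarthLike, base_route: str = "",
                 subdomain: str = "connectapi"):
        self.client = client
        self.base_route = base_route
        self.subdomain = subdomain

    def _path(self, leaf_route: str) -> str:
        # Join the non-empty parts with single slashes; an empty base route is allowed.
        parts = [p.strip("/") for p in (self.base_route, leaf_route) if p.strip("/")]
        return "/" + "/".join(parts)

    def get(self, leaf_route: str, params: Optional[dict] = None) -> Any:
        return self.client.get(self.subdomain, self._path(leaf_route), api=True, params=params)

    def post(self, leaf_route: str, payload: Optional[dict] = None) -> Any:
        return self.client.post(self.subdomain, self._path(leaf_route), api=True, json=payload)
```

Because the real client is injected, a fake object with the same two methods is enough to exercise GarminDB's download code without network access.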

msiemens commented 1 year ago

to migrate the requests, I need a series of tests I can run—even if it's manual.

I have a script that periodically downloads a bunch of data (including monitoring, steps, itime, sleep, rhr, weight and activities) and just tried it with @matin's changes. With a few more changes it worked. I'll make a PR against matin/GarminDB in the next few days. It also needed a few changes in idbutils' rest_client.py to support empty base routes.

matin commented 1 year ago

@msiemens I suggest you submit the PRs directly to @tcgoetz. It's going to be up to him on the approach for the changes.

tcgoetz commented 1 year ago

@matin maybe I missed some things, but I thought the current state of the garth-migration branch was that it wasn't passing all tests and needed more work. Is that not the case?

matin commented 1 year ago

I was stuck here: https://github.com/tcgoetz/GarminDB/issues/192#issuecomment-1722294813

but, it looks like @msiemens built on the initial changes I made and figured out how to make it work.

tcgoetz commented 1 year ago

OK, then let's get his changes added to the garth-migration branch. Then we can move forward.

tcgoetz commented 1 year ago

As far as tests, make test in a source tree runs tests against the local db. So make clean reinstall_all all test installs the local code, downloads, analyzes, and runs tests.

tcgoetz commented 1 year ago

I merged the PRs for utilities and the main repo to the garth-migration branch on each. The main repo garth-migration branch now references the garth-migration branch on the utilities submodule.

At this point auth is using garth and is working, but downloading of data is still using Cloudscraper and is still broken:

ERROR:root:Exception getting activity summary: <RestCallException() {'inner_exception': HTTPError('404 Client Error: Not Found for url: https://connectapi.garmin.com//activitylist-service/activities/search/activities?start=0&limit=25'), 'error': 'GET https://connectapi.garmin.com//activitylist-service/activities/search/activities?start=0&limit=25 failed (404): <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Not Found</title></head>\n<body>\n <div class="headerContainer">\n <h2 class="page-title">Page Not Found</h2>\n\n <p>We\'re sorry. The page you\'re looking for does not exist.</p>\n </div>\n<script defer src="https://static.cloudflareinsights.com/beacon.min.js/v8b253dfea2ab4077af8c6f58422dfbfd1689876627854" integrity="sha512-bjgnUKX4azu3dLTVtie9u6TKqgx29RBwfj3QXYt5EKfWM/9hPSAI/4qcV5NACjwAo8UtTeWefx6Zq5PHcMm7Tg==" data-cf-beacon=\'{"rayId":"80f5da202aa44d0c","version":"2023.8.0","b":1,"token":"dfcba71ff1d44ca3956104d931b99217","si":100}\' crossorigin="anonymous"></script>\n</body>\n</html>', 'url': 'activitylist-service/activities/search/activities', 'response': <Response [404]>}>

and

ERROR:root:Exception getting daily summary: <RestCallException() {'inner_exception': HTTPError('404 Client Error: Not Found for url: https://connectapi.garmin.com//usersummary-service/usersummary/daily/Tom_Goetz?calendarDate=2023-09-16&_=1694840400000'), 'error': 'GET https://connectapi.garmin.com//usersummary-service/usersummary/daily/Tom_Goetz?calendarDate=2023-09-16&_=1694840400000 failed (404): <html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Not Found</title></head>\n<body>\n <div class="headerContainer">\n <h2 class="page-title">Page Not Found</h2>\n\n <p>We\'re sorry. The page you\'re looking for does not exist.</p>\n </div>\n<script defer src="https://static.cloudflareinsights.com/beacon.min.js/v8b253dfea2ab4077af8c6f58422dfbfd1689876627854" integrity="sha512-bjgnUKX4azu3dLTVtie9u6TKqgx29RBwfj3QXYt5EKfWM/9hPSAI/4qcV5NACjwAo8UtTeWefx6Zq5PHcMm7Tg==" data-cf-beacon=\'{"rayId":"80f5da20ebbb4d0c","version":"2023.8.0","b":1,"token":"dfcba71ff1d44ca3956104d931b99217","si":100}\' crossorigin="anonymous"></script>\n</body>\n</html>', 'url': 'usersummary-service/usersummary/daily/Tom_Goetz', 'response': <Response [404]>}>

So we either need to fix the API URLs while still using Cloudscraper to download, or transition to using garth for data download. The hitch there is that garth doesn't have all of the data classes that GarminDB needs yet.

tcgoetz commented 1 year ago

@matin in garth/garth

should

url = f"https://{subdomain}.{self.domain}{path}"

be

url = f"https://{subdomain}.{self.domain}/{path}"

because I'm getting

Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x124ddc210>: Failed to resolve 'connectapi.garmin.comusersummary-service

for

self.garth.download(url, params=params)

where url is "usersummary-service/usersummary/daily"
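Independently of how garth resolves it, a caller-side guard that joins host and path with exactly one slash avoids both failure modes seen in this thread: the missing slash (`connectapi.garmin.comusersummary-service`) and the doubled one (`connectapi.garmin.com//activitylist-service`). `build_url` is a hypothetical helper, not part of garth.

```python
def build_url(subdomain: str, domain: str, path: str) -> str:
    """Join host and path with exactly one slash between them.

    f"https://{subdomain}.{domain}{path}" breaks when path lacks a leading "/",
    and always inserting "/" doubles it when path already has one.
    """
    return f"https://{subdomain}.{domain}/{path.lstrip('/')}"
```

The `domain` argument also leaves room for garmin.cn, mentioned at the start of this issue.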

tcgoetz commented 1 year ago

I committed an example of how I think we should switch to use garth for downloading: 22323271fd20d49585fbfdc13f0a2cfbfac5b421

matin commented 1 year ago

@tcgoetz for the issue related to the path needing to start with /, I just addressed it in 0.4.35. Let me know if anything else comes up.

tcgoetz commented 1 year ago

I have it almost working with downloading using garth. There were a couple of fields that were pulled from the login webpage that I need to find another way of getting:

matin commented 1 year ago

Try /userprofile-service/userprofile/user-settings

tcgoetz commented 1 year ago

How about adding user-settings to garth, the same way as social-profile?

matin commented 1 year ago

Seems reasonable. It'll be a few hours before I can get to it.

tcgoetz commented 1 year ago

The garth-migration branch is now working (auth, download, etc), but is not ready for merge yet.

tcgoetz commented 1 year ago

@matin How about capturing the knowledge on how to download data in garth? Methods that include the URL for a given data type and can also format any required params?

example:

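```python
def get_sleep(self, date):
    """Fetch one night's sleep data; meant as a method on a client with display_name and connectapi()."""
    params = {
        'date': date.strftime("%Y-%m-%d"),
        'nonSleepBufferMinutes': 60,
    }
    url = f'/wellness-service/wellness/dailySleepData/{self.display_name}'
    return self.connectapi(url, params=params)
```
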
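Generalizing that example, the per-data-type knowledge could live in a small table of path templates and parameter builders. The two URLs below are the ones that appear in this thread; `Endpoint`, `ENDPOINTS`, and `request_for` are hypothetical names, and the `_` cache-busting parameter seen in the earlier daily summary URL is left out on the assumption that the API does not need it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Endpoint:
    path_template: str                           # formatted with display_name
    params: Callable[[date], Dict[str, object]]  # builds query params for one day


# Hypothetical table; the paths and parameters are those quoted in this thread.
ENDPOINTS: Dict[str, Endpoint] = {
    "sleep": Endpoint(
        "/wellness-service/wellness/dailySleepData/{display_name}",
        lambda d: {"date": d.isoformat(), "nonSleepBufferMinutes": 60},
    ),
    "daily_summary": Endpoint(
        "/usersummary-service/usersummary/daily/{display_name}",
        lambda d: {"calendarDate": d.isoformat()},
    ),
}


def request_for(kind: str, display_name: str, day: date) -> Tuple[str, Dict[str, object]]:
    """Return (path, params) for one data type and day."""
    ep = ENDPOINTS[kind]
    return ep.path_template.format(display_name=display_name), ep.params(day)
```

The returned pair would then go to a call like `self.connectapi(path, params=params)` from the snippet above, so adding a data type means adding one table entry.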
matin commented 1 year ago

You can use Charles Proxy. Go to the screen in the Connect mobile app with the info you want to see / operation you want to perform. Charles will show you all the request and response info. Garth uses the same auth model as the app, so you can access the same info / perform the same operations.

I do hope to document the endpoints though: https://github.com/matin/garth/issues/12

tcgoetz commented 1 year ago

https://github.com/tcgoetz/GarminDB/releases/tag/v3.5.0

mbrionesalvarez commented 1 year ago

Does anyone know how to download golf data using garth instead of cloudscraper?

tcgoetz commented 1 year ago

Go to the Garmin Connect web page where you view the golf data, open your browser's developer tools (F12 in Firefox), select the Network tab, reload the page, and then look through the requests to find the backend calls that fetch the data. Use the same URLs and parameters with garth.
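To carry a request from the browser's Network tab over to garth, it helps to split the copied URL into a path and a parameter dict. A sketch using `urllib.parse`; `split_request_url` is a hypothetical helper that drops the `_` cache-buster by default (an assumption, not a documented requirement), and repeated query keys collapse to the last value. The web page may use a different host or path prefix than connectapi.garmin.com, so check the resulting path against the API before relying on it.

```python
from typing import Dict, Iterable, Tuple
from urllib.parse import parse_qsl, urlsplit


def split_request_url(url: str, drop: Iterable[str] = ("_",)) -> Tuple[str, Dict[str, str]]:
    """Turn a URL copied from the browser's Network tab into (path, params).

    Query parameters named in `drop` are discarded.
    """
    parts = urlsplit(url)
    dropped = set(drop)
    params = {k: v for k, v in parse_qsl(parts.query) if k not in dropped}
    return parts.path, params
```

For example, the daily summary URL from earlier in this thread splits into `/usersummary-service/usersummary/daily/Tom_Goetz` and `{"calendarDate": "2023-09-16"}`.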