tcgoetz / GarminDB

Download and parse data from Garmin Connect or a Garmin watch, FitBit CSV, and MS Health CSV files into and analyze data in Sqlite serverless databases with Jupyter notebooks.
GNU General Public License v2.0
1.1k stars 138 forks source link

Activity ID incorrectly parsed from filename #194

Closed bohrium272 closed 10 months ago

bohrium272 commented 12 months ago

Describe the bug I have observed that the activity_id (primary key for the Activities table) obtained using the function File.name_and_id_from_path would only return the year i.e. 2023 for a filename like 2023-01-06-17-05-17.fit.

To Reproduce Steps to reproduce the behavior:

  1. Build the DB using --rebuild_db. I typically use this flag each time I import files from my Forerunner 55.
  2. Check the Activities table in garmin_activities.db
  3. You would see n rows where n is the number of years you have recorded data for. In my case, 2 years for 2022-2023. These would correspond to the first activity in respective years.
    sqlite> SELECT * FROM activities;
    2023|||||3|running|treadmill|||2023-01-06 17:05:17.000000|2023-01-06 17:29:30.000000|00:15:49.935000|00:15:49.935000|....
    2022|||||1|cycling|indoor_cycling|||2022-10-07 07:52:47.000000|2022-10-07 08:06:34.000000|00:09:05.857000|00:09:05.857000|....
    sqlite> SELECT activity_id FROM activities;
    2022
    2023

Expected behavior All activities from each year are imported.

Additional context I believe this issue might be caused by the order in which the regex match occurs in the method File.name_and_id_from_path.

  1. (\d+).*\.\w+ matches 2022-12-29-17-00-32.fit https://regex101.com/r/Uie6Bh/1
  2. (.+)\.\w+ matches 2022-12-29-17-00-32.fit https://regex101.com/r/Rx6cwM/1 But since (1) is matched for first, the activity ID ends up being just 2022.

I created a patch (locally) that changes the matching order ((2) before (1)), essentially using the whole filename as activity_id and I was able to import the correct number of activities as expected.

image

I am not sure if the precedence is decided because the file names can vary by device. I have a Forerunner 55 if it helps.

I am also not sure if changing this order would be having other unintended consequences. However, if this seems appropriate, more than happy to submit a PR for this.

Also, thank you for this project. It is super useful!

tcgoetz commented 11 months ago

Are these activity files that are downloaded from Garmin Connect? If so, were they uploaded by the device or manually? Were they downloaded with GarminDb?

bohrium272 commented 11 months ago

These are FIT files offloaded from the device itself. Garmin Connect was not involved.

On Sun, 24 Sep 2023 at 18:11, Tom Goetz @.***> wrote:

Are these activity files that are downloaded from Garmin Connect? If so, were they uploaded by the device or manually? Were they downloaded with GarminDb?

— Reply to this email directly, view it on GitHub https://github.com/tcgoetz/GarminDB/issues/194#issuecomment-1732562145, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFB4C4XTURLYADPAAMC6FLX4ATABANCNFSM6AAAAAA43X2ZAY . You are receiving this because you authored the thread.Message ID: @.***>

tcgoetz commented 11 months ago

GarminDb formats files it downloads as activity< activity id >.fit and that is the root of the problem. When you copy them off, I think you need to name them as activity< integer >.fit

tcgoetz commented 11 months ago

IF you're using GarminDb to copy the files from the device, then another option is to implement the file renaming there. The date time from the filename could be converted to an epoch and that could be used as the activity id.

bohrium272 commented 11 months ago

Ah ok, makes sense. If this can apply to all files copied from devices, can I submit a patch for this?

tcgoetz commented 11 months ago

Yes, please submit a pull request against the develop branch. The changes should go here: https://github.com/tcgoetz/GarminDB/blob/0eed598c4c0a62c2c90e13e529aedd5b2c24ae5f/garmindb/copy.py#L39 You should be able to compare the source filename against the pattern and construct a new filename to copy to. Activity files should be named like this 11774301764_ACTIVITY.fit where the integer part is unique and can be derived by translating the date and time in the old filename to the epoch. Monitoring file names have a couple possibilities: 212890593134_WELLNESS.fit, 212939263331_HRV_STATUS.fit, 212939267756_SLEEP_DATA.fit, 213026391314_METRICS.fit

bohrium272 commented 11 months ago

The pull request is ready for review. Though I have only changed the naming for the activities files since that seemed the only type of file to be causing the issue. Monitoring files don't seem to have timestamps or dates in their name.

➜  ~ ls HealthData/FitFiles/Monitoring/2023 | head -10 
M1100000.FIT
M1200000.FIT
M12L3900.FIT
M12L5939.FIT
M1300000.FIT
M1400000.FIT
M1500000.FIT
M1600000.FIT
M1700000.FIT
M1800000.FIT
tcgoetz commented 11 months ago

I have some comments on the PR.

bohrium272 commented 10 months ago

@tcgoetz sorry, I don't see any comments on the PR yet :thinking:

tcgoetz commented 10 months ago

When I follow the PR link above, I see this Screenshot

bohrium272 commented 10 months ago

Interesting, I wonder if the yellow colored "Pending" label means there is some action required. Apparently there is a StackOverflow answer expecting this issue. Not the best UX from GitHub.

tcgoetz commented 10 months ago

Can you see the review now?

bohrium272 commented 10 months ago

I do now, thanks!

tcgoetz commented 10 months ago

merged