ryangriggs / GoogleTimelineMapper

Map and Browse Google Timeline Semantic Location History data
Apache License 2.0

support the new json schema #8

Open dmd opened 1 month ago

dmd commented 1 month ago

In the new on-device-only Location History, Google has changed the JSON schema.

ryangriggs commented 1 month ago

Does the app fail to work with the new schema?

dmd commented 1 month ago

Yes, it refuses to even load it.

"Invalid data: must contain 'timelineObjects' array."

dmd commented 1 month ago

(I'm a little surprised that you wrote this just last week, given that Google Timeline is being sunsetted literally as we speak in favor of the new on-device version. Were you unaware?)

ryangriggs commented 1 month ago

I wrote the app a few months ago; another user pitched in last week to add features.

How did you obtain your location history data? Did you use Takeout or another method?

dmd commented 1 month ago

No, Takeout is no longer supported for Location History. You have to export it from the app on your phone now.

https://support.google.com/maps/answer/14169818?co=GENIE.Platform%3DDesktop&oco=1

https://www.androidauthority.com/google-maps-killing-timeline-web-access-3449017/

https://gizmodo.com/google-maps-timeline-app-browser-1851520501

dmd commented 1 month ago

The new schema appears to be:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Generated schema for Root",
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "endTime": {
        "type": "string"
      },
      "startTime": {
        "type": "string"
      },
      "visit": {
        "type": "object",
        "properties": {
          "hierarchyLevel": {
            "type": "string"
          },
          "topCandidate": {
            "type": "object",
            "properties": {
              "probability": {
                "type": "string"
              },
              "semanticType": {
                "type": "string"
              },
              "placeID": {
                "type": "string"
              },
              "placeLocation": {
                "type": "string"
              }
            },
            "required": [
              "probability",
              "semanticType",
              "placeID",
              "placeLocation"
            ]
          },
          "probability": {
            "type": "string"
          }
        },
        "required": [
          "hierarchyLevel",
          "topCandidate",
          "probability"
        ]
      },
      "activity": {
        "type": "object",
        "properties": {
          "probability": {
            "type": "string"
          },
          "end": {
            "type": "string"
          },
          "topCandidate": {
            "type": "object",
            "properties": {
              "type": {
                "type": "string"
              },
              "probability": {
                "type": "string"
              }
            },
            "required": [
              "type",
              "probability"
            ]
          },
          "distanceMeters": {
            "type": "string"
          },
          "start": {
            "type": "string"
          }
        },
        "required": [
          "end",
          "topCandidate",
          "distanceMeters",
          "start"
        ]
      },
      "timelinePath": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "point": {
              "type": "string"
            },
            "durationMinutesOffsetFromStartTime": {
              "type": "string"
            }
          },
          "required": [
            "point",
            "durationMinutesOffsetFromStartTime"
          ]
        }
      },
      "timelineMemory": {
        "type": "object",
        "properties": {
          "destinations": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "identifier": {
                  "type": "string"
                }
              },
              "required": [
                "identifier"
              ]
            }
          },
          "distanceFromOriginKms": {
            "type": "string"
          }
        },
        "required": [
          "distanceFromOriginKms"
        ]
      }
    },
    "required": [
      "endTime",
      "startTime"
    ]
  }
}

which is to say, total garbage: it's an array of lots of different kinds of objects.

ryangriggs commented 1 month ago

Odd, I don't have that feature on my device yet... it still takes me directly to Takeout to download the location history. I guess they haven't rolled out the update to my account yet.

Do you have any links describing the schema of the new export format? A cursory search yielded nothing useful. I don't have much time today to look into this.

dmd commented 1 month ago

Also, just emailed you.

dmd commented 1 month ago

I dug into this a bit more. It's not as bad as all that. There are four types of objects: visit, activity, timelineMemory, and timelinePath. As best I can tell from my own data (and I have continuous data since 2010, so it should be pretty representative), the first three of those can be ignored if all you're interested in is a list of places you've been.
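To check the mix of object kinds in your own export before writing a parser, a minimal Python sketch like the one below can tally them. It assumes each top-level item carries exactly one of the four keys named above; `classify`, `count_kinds`, and `summarize` are hypothetical helper names, not part of this repo.

```python
import json
from collections import Counter

# The four object kinds seen in the on-device export (assumed exhaustive).
KINDS = ("visit", "activity", "timelinePath", "timelineMemory")


def classify(item):
    """Return which known kind a top-level item is, or 'unknown'."""
    for kind in KINDS:
        if kind in item:
            return kind
    return "unknown"


def count_kinds(items):
    """Count how many items of each kind appear in the export array."""
    return Counter(classify(item) for item in items)


def summarize(path):
    """Load an export file and return its per-kind counts."""
    with open(path, encoding="utf-8") as f:
        return count_kinds(json.load(f))
```

For example, `print(summarize("location-history.json").most_common())` (the filename is hypothetical) lists the kinds from most to least frequent.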

I wrote this:

import json
import csv
import argparse
from datetime import datetime, timedelta

def parse_iso8601(timestamp):
    # Python < 3.11's fromisoformat() rejects a trailing "Z", so map it to +00:00.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def extract_lat_lon(location):
    # Points look like "geo:LAT,LON": drop the prefix, then split on the comma.
    lat, lon = location.split(":", 1)[1].split(",")
    return lat.strip(), lon.strip()

def main(input_file, output_file):
    with open(input_file, "r", encoding="utf-8") as f:
        data = json.load(f)  # top level is a JSON array of mixed object kinds

    with open(output_file, "w", newline="", encoding="utf-8") as csvfile:
        fieldnames = ["timestamp", "latitude", "longitude"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        for item in data:
            # Only timelinePath objects are used; visit, activity and
            # timelineMemory objects are skipped.
            if "timelinePath" not in item:
                continue
            start_time = parse_iso8601(item["startTime"])
            for point in item["timelinePath"]:
                # Each point's time is the object's startTime plus its offset in minutes.
                offset = timedelta(
                    minutes=int(point["durationMinutesOffsetFromStartTime"])
                )
                timestamp = (start_time + offset).isoformat()
                lat, lon = extract_lat_lon(point["point"])
                writer.writerow(
                    {"timestamp": timestamp, "latitude": lat, "longitude": lon}
                )

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert JSON to CSV.")
    parser.add_argument("input_file", help="Input JSON file")
    parser.add_argument("output_file", help="Output CSV file")
    args = parser.parse_args()

    main(args.input_file, args.output_file)

which appears to correctly parse the data into something that, e.g., kepler.gl can read. Basically, for each object in the top-level JSON array, process it only if it is a timelinePath object (i.e., it contains that key). Get the startTime, then for each item in the timelinePath array, take its latitude and longitude and pair them with startTime + durationMinutesOffsetFromStartTime.

I tried doing it that way vs. also processing visits and activities and ended up with almost identical results, so probably this is the way to go. (Note that if you do go the way of also processing visits and activities, you must ignore visits that have a semanticType of "Searched Address".)
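If you do go the route of also processing visits, the "Searched Address" filter could look like this minimal sketch. It assumes placeLocation uses the same "geo:LAT,LON" form as timelinePath points (not confirmed here), and `visit_points` / `parse_geo` are hypothetical names.

```python
def parse_geo(location):
    """Parse an assumed 'geo:LAT,LON' string into (lat, lon) floats."""
    lat, lon = location.split(":", 1)[1].split(",")
    return float(lat), float(lon)


def visit_points(items):
    """Yield (startTime, lat, lon) for visits, skipping searched addresses."""
    for item in items:
        visit = item.get("visit")
        if not visit:
            continue  # not a visit object
        candidate = visit.get("topCandidate", {})
        # Searched addresses are places you looked up, not places you went.
        if candidate.get("semanticType") == "Searched Address":
            continue
        location = candidate.get("placeLocation")
        if not location:
            continue
        lat, lon = parse_geo(location)
        yield item["startTime"], lat, lon
```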

timelineMemory can always be ignored.

dmd commented 1 month ago

I should also say that, unfortunately, the exported data from the app is incomplete. It's pretty good but it's only about 20% of the data points. For example:

image

versus

image
ryangriggs commented 1 month ago

Hi Dr. Drucker, thanks for this info. Are you interested in adding a PR to support the new location data format? I'm not sure what you mean by "the exported data is incomplete." Do you mean my app doesn't export all the locations that you import from the file?

Thanks for your input on this issue.


Ryan Griggs Hilltop Computing www.hilltop.net 859-328-3223 Toll Free: 1 (888) 5-HILLTOP (888-544-5586)

dmd commented 1 month ago

Yeah, gimme a day and I'll PR.

No, that's a complaint about Google, not you. The export from the iOS Google Maps app doesn't contain the same amount of detail as the old Google Takeout data.

dmd commented 1 month ago

Actually, I don't know how you'd want this to work. Your app wants arrived, departed, duration, lat, lon. Using the timelinePath objects (which appear to contain the bulk of the data) we only really get a timestamp, lat, and lon.
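One possible way to get the fields the app wants (arrived, departed, duration, lat, lon) is to fall back on visit objects, which carry both startTime and endTime. This is only a sketch: it assumes placeLocation uses the "geo:LAT,LON" form and that duration should be in minutes; both are unverified, and `visit_rows` is a hypothetical name.

```python
from datetime import datetime


def parse_iso8601(timestamp):
    # Python < 3.11's fromisoformat() rejects a trailing "Z".
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def visit_rows(items):
    """Build hypothetical arrived/departed/duration/lat/lon rows from visits."""
    rows = []
    for item in items:
        visit = item.get("visit")
        if not visit:
            continue  # timelinePath, activity and timelineMemory items are skipped
        candidate = visit.get("topCandidate", {})
        if candidate.get("semanticType") == "Searched Address":
            continue
        location = candidate.get("placeLocation", "")
        if not location.startswith("geo:"):
            continue
        lat, lon = (float(v) for v in location[len("geo:"):].split(","))
        arrived = parse_iso8601(item["startTime"])
        departed = parse_iso8601(item["endTime"])
        rows.append({
            "arrived": arrived.isoformat(),
            "departed": departed.isoformat(),
            # Minutes is an assumption about the unit the app expects.
            "duration": (departed - arrived).total_seconds() / 60,
            "lat": lat,
            "lon": lon,
        })
    return rows
```

The trade-off is coverage: per the comments above, timelinePath holds the bulk of the points, so visit rows alone will be sparser.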

MueJosh commented 1 month ago

I can have a look at it in a few days; for now I'm working on a different feature (importing GPX, KML and some other stuff).

assc1967 commented 1 week ago

Hi all, I was trying to use your script for a project, but, as other people found before, the JSON downloaded from the device seems to differ from the schema your script expects. Could someone give me a hint about how to find the problem and get the script working again? :) Thank you

MueJosh commented 1 week ago

I downloaded my Google Timeline again and it still looks exactly the same as before...

Until I have files to test the program with, I can't change anything in the code. If someone can give me an example file (real data, not just the schema), I can try to adapt the code accordingly.

ve3 commented 1 week ago

I can export from Google Takeout. My timeline no longer works in the browser on Google's website; it works only in the mobile app. When I exported the timeline from Takeout and tried importing each file with this repo's index file, none of them worked. I get the same error as the OP:

Timeline Edits.json => Invalid data: must contain 'timelineObjects' array.
Tombstones.csv => Invalid data. The data must be in JSON format.
Encrypted Backups.txt => Invalid data. The data must be in JSON format.
Settings.json => Invalid data: must contain 'timelineObjects' array.
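Since several different files are being fed to the importer, a loader could first guess which shape it received instead of only checking for 'timelineObjects'. A minimal sketch, assuming just the three shapes seen in this thread (old Takeout semantic history, the Takeout "Timeline Edits.json" with a timelineEdits array, and the new on-device top-level array); the returned labels and `detect_format` are hypothetical.

```python
import json


def detect_format(data):
    """Guess which location-history export shape a parsed JSON value has."""
    if isinstance(data, dict) and isinstance(data.get("timelineObjects"), list):
        return "takeout-semantic"  # the shape the app currently accepts
    if isinstance(data, dict) and isinstance(data.get("timelineEdits"), list):
        return "takeout-timeline-edits"  # raw device signals, not semantic places
    if isinstance(data, list) and data and all(
        isinstance(item, dict) and "startTime" in item for item in data
    ):
        return "on-device-array"  # the new on-device export
    return "unknown"


def detect_file(path):
    """Load a JSON file and report its guessed format."""
    with open(path, encoding="utf-8") as f:
        return detect_format(json.load(f))
```

Files like Settings.json would land in "unknown", which lets the app give a more specific error than the current one.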

ryangriggs commented 1 week ago

We need some sample data in order to modify the parser. Can anyone supply a sample export? (no private data please!)

ve3 commented 1 week ago

{
  "timelineEdits": [{
    "deviceId": "987654321",
    "rawSignal": {
      "signal": {
        "activityRecord": {
          "detectedActivities": [{
            "activityType": "STILL",
            "probability": 1.0
          }],
          "timestamp": "2024-07-06T06:53:29.710Z"
        }
      }
    }
  }, {
    "deviceId": "987654321",
    "rawSignal": {
      "signal": {
        "position": {
          "point": {
            "latE7": 137628680,
            "lngE7": 1006454630
          },
          "accuracyMm": 14063,
          "altitudeMeters": -23.0,
          "source": "WIFI",
          "timestamp": "2024-07-06T05:23:01.340Z",
          "speedMetersPerSecond": 0.0
        }
      },
      "additionalTimestamp": "2024-07-06T05:23:03.532Z"
    }
  }, {
    "deviceId": "987654321",
    "rawSignal": {
      "signal": {
        "wifiScan": {
          "deliveryTime": "2024-07-06T05:23:01.340Z",
          "devices": [{
            "mac": "123456789012345",
            "rawRssi": -43
          }, {
            "mac": "123456789012345",
            "rawRssi": -56
          }, {
            "mac": "123456789012345",
            "rawRssi": -62
          }, {
            "mac": "123456789012345",
            "rawRssi": -65
          }, {
            "mac": "123456789012345",
            "rawRssi": -72
          }, {
            "mac": "123456789012345",
            "rawRssi": -88
          }, {
            "mac": "123456789012345",
            "rawRssi": -92
          }]
        }
      },
      "additionalTimestamp": "2024-07-06T05:23:01.340Z"
    }
  }
  , ....and a lot more
  ]
}

lat is 13.nnn and long is 100.nnn; the wifiScan.devices.mac values are all fake.
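The position records in this sample store coordinates as E7 integers (degrees multiplied by 10^7), so latE7 137628680 is 13.762868 degrees, consistent with the "13.nnn" note. Below is a minimal sketch for pulling coordinates out of a file shaped like the sample; it assumes every position record has a point with latE7/lngE7, and `edit_positions` is a hypothetical helper.

```python
def edit_positions(data):
    """Yield (timestamp, lat, lon, accuracy_m, source) from rawSignal positions."""
    for edit in data.get("timelineEdits", []):
        signal = edit.get("rawSignal", {}).get("signal", {})
        position = signal.get("position")
        if not position:
            continue  # activityRecord and wifiScan entries carry no coordinates
        point = position["point"]
        # E7 fields are degrees multiplied by 10**7.
        lat = point["latE7"] / 1e7
        lon = point["lngE7"] / 1e7
        accuracy_m = position.get("accuracyMm", 0) / 1000  # millimetres to metres
        yield position.get("timestamp"), lat, lon, accuracy_m, position.get("source")
```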

Sorry, but the file is too big and I can't scroll to the bottom to see what else it contains.
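When a file is too large to scroll through, a short script can report what it contains instead. A minimal sketch, assuming the timelineEdits layout shown above and that all timestamps share the same UTC "Z" string format (so a plain string min/max gives the time span); `signal_summary` and `summarize_file` are hypothetical names.

```python
import json
from collections import Counter


def signal_summary(data):
    """Count rawSignal kinds and find the covered time span."""
    kinds = Counter()
    timestamps = []
    for edit in data.get("timelineEdits", []):
        signal = edit.get("rawSignal", {}).get("signal", {})
        for kind, body in signal.items():
            kinds[kind] += 1
            if isinstance(body, dict):
                # wifiScan entries use deliveryTime instead of timestamp.
                ts = body.get("timestamp") or body.get("deliveryTime")
                if ts:
                    timestamps.append(ts)
    # String comparison is only valid if every timestamp uses the same format.
    span = (min(timestamps), max(timestamps)) if timestamps else None
    return kinds, span


def summarize_file(path):
    """Load a Timeline Edits-style file and summarize it."""
    with open(path, encoding="utf-8") as f:
        return signal_summary(json.load(f))
```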

ve3 commented 1 week ago

Update: I see that the new Google Timeline uses a new way to export, and here are the instructions.

However, I'm currently stuck because their export failed (1, 2).