tkrajina / gpxpy

gpx-py is a Python GPX parser. GPX (GPS eXchange Format) is an XML-based file format for GPS tracks.
Apache License 2.0

e.g.: point.extensions.get("hr") #119

Open jedie opened 6 years ago

jedie commented 6 years ago

It would be cool to have an easier way to get values from GPX extensions.

e.g.:

      <trkpt lat="51.43788929097354412078857421875" lon="6.617012657225131988525390625">
        <ele>23.6000003814697265625</ele>
        <time>2018-02-21T14:30:50.000Z</time>
        <extensions>
          <ns3:TrackPointExtension>
            <ns3:hr>125</ns3:hr>
            <ns3:cad>75</ns3:cad>
          </ns3:TrackPointExtension>
        </extensions>
      </trkpt>

>>> point.extensions.get("hr")
'125'
>>> point.extensions.get("cad")
'75'

I don't know if it is possible to get integers here. Is the information about the extension types available somewhere?
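On the type question: the GPX file carries the extension values only as text, and the declared types live in the extension's XSD schema rather than in the file. A minimal sketch of guessing the type from the text instead, assuming the extensions are available as a list of ElementTree elements (which is how newer gpxpy versions appear to expose `point.extensions`). The names `local_name`, `to_number` and `extension_values` and the namespace URI are made up for this example and are not part of gpxpy:

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, used only to make the snippet well-formed.
NS = "http://example.com/TrackPointExtension"


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def to_number(text):
    """Return an int or float when the text parses as one, else the text."""
    if text is None:
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def extension_values(extensions):
    """Collect every leaf element of each extension into {name: value}."""
    values = {}
    for extension in extensions:
        for element in extension.iter():
            if len(element) == 0:  # leaf element: it has no children
                values[local_name(element.tag)] = to_number(element.text)
    return values


# Stand-in for point.extensions, built from the XML in this comment.
snippet = f"""
<ns3:TrackPointExtension xmlns:ns3="{NS}">
  <ns3:hr>125</ns3:hr>
  <ns3:cad>75</ns3:cad>
</ns3:TrackPointExtension>
""".strip()
extensions = [ET.fromstring(snippet)]
values = extension_values(extensions)
```

With this, `values.get("hr")` gives an integer instead of a string. A repeated tag name would overwrite the earlier value, so this only suits flat extensions like the one above.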

jedie commented 6 years ago

I have now made this:

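import collections

def get_extension_data(gpxpy_instance):
    """
    Return a dict mapping each extension tag name to the list of its
    values across all track points.
    """
    extension_data = collections.defaultdict(list)

    for track in gpxpy_instance.tracks:
        for segment in track.segments:
            for point in segment.points:
                extensions = point.extensions
                if not extensions:
                    # Skip points without extensions instead of aborting
                    # and discarding everything collected so far.
                    continue

                # Iterate the element directly; getchildren() is deprecated
                # and was removed from xml.etree in Python 3.9.
                for child in extensions[0]:
                    # Strip the "{namespace}" prefix from the tag. FIXME
                    tag = child.tag.rsplit("}", 1)[-1]

                    value = child.text
                    try:
                        if "." in value:
                            value = float(value)
                        else:
                            value = int(value)
                    except (TypeError, ValueError):
                        # Keep empty (None) or non-numeric text unchanged.
                        pass
                    extension_data[tag].append(value)

    return extension_data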

Any idea how to improve this? Is there an easier way to get the "name" of each extension?

tkrajina commented 6 years ago

Keep in mind that an <extensions> element can theoretically contain multiple extensions, and each can be any kind of XML subtree, for example:

    <extensions>
      <ns3:Ext attr="bbb">
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:cad><ns3:bbb>75</ns3:bbb></ns3:cad>
      </ns3:Ext>
    </extensions>

And now you need a simple and easy way to get the attr attribute, the hr values, and cad->bbb value.
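For comparison, a rough sketch of what reading those three things looks like today with plain ElementTree calls, which is the verbosity a nicer API would have to hide. The namespace URI is a placeholder and the hr values are varied so their order is visible; neither comes from a real file:

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI; a real file declares its own.
NS = {"ns3": "http://example.com/ns3"}

snippet = """
<extensions xmlns:ns3="http://example.com/ns3">
  <ns3:Ext attr="bbb">
    <ns3:hr>125</ns3:hr>
    <ns3:hr>126</ns3:hr>
    <ns3:cad><ns3:bbb>75</ns3:bbb></ns3:cad>
  </ns3:Ext>
</extensions>
""".strip()

root = ET.fromstring(snippet)
ext = root.find("ns3:Ext", NS)

# The attribute on the extension element itself.
attr = ext.get("attr")

# Every repeated <hr> value, in document order.
hr_values = [int(element.text) for element in ext.findall("ns3:hr", NS)]

# The nested cad -> bbb value, addressed with a path.
cad_bbb = int(ext.find("ns3:cad/ns3:bbb", NS).text)
```

Each lookup needs the namespace mapping and knowledge of the subtree shape, so a generic `get("hr")` cannot cover all three cases without extra arguments.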

jedie commented 6 years ago

Any idea how to design a simple-to-use API?

tkrajina commented 6 years ago

Well, no, not yet :) But, now that you asked, here are a couple of ideas:

Maybe something like:

points.extensions.get("TrackPointExtension", "hr") # returns a string
points.extensions.get_float("TrackPointExtension", "hr") # returns a number

Or, let's suppose there can be multiple hr tags:

points.extensions.get("TrackPointExtension", "hr[2]")

...and hr would just be an alias for hr[0].

Or maybe:

points.extensions.get("TrackPointExtension", "hr").string()
points.extensions.get("TrackPointExtension", "hr").number()
# in case of multiple "hr" elements, get the fourth one:
points.extensions.get("TrackPointExtension", "hr", 3).number()
# set a value:
points.extensions.get("TrackPointExtension", "hr").set(100)

jedie commented 6 years ago

points.extensions.get("TrackPointExtension", "hr[2]")

This looks ugly ;)

points.extensions.get("TrackPointExtension", "hr").string()
points.extensions.get("TrackPointExtension", "hr").number()
# in case of multiple "hr" elements, get the fourth one:
points.extensions.get("TrackPointExtension", "hr", 3).number()
# set a value:
points.extensions.get("TrackPointExtension", "hr").set(100)

This looks ok... Maybe "number" -> "float" ?!?

Because there can be multiple entries, get("TrackPointExtension", "hr") would be a "shortcut" for get("TrackPointExtension", "hr", 0), wouldn't it?
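To make the proposal concrete, here is a minimal sketch of the wrapper-style API discussed so far, with the index defaulting to 0. Everything in it is hypothetical: `Extensions` and `ExtensionValue` are names invented for this example, none of this exists in gpxpy, and it assumes the extensions are a list of ElementTree elements:

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


class ExtensionValue:
    """Hypothetical wrapper around a single extension child element."""

    def __init__(self, element):
        self.element = element

    def string(self):
        return self.element.text

    def number(self):
        return float(self.element.text)

    def set(self, value):
        self.element.text = str(value)


class Extensions:
    """Hypothetical accessor over a list of extension elements."""

    def __init__(self, elements):
        self.elements = elements

    def get(self, extension_name, tag, index=0):
        """Return the index-th child named `tag` of the named extension."""
        for extension in self.elements:
            if _local(extension.tag) != extension_name:
                continue
            matches = [c for c in extension if _local(c.tag) == tag]
            if 0 <= index < len(matches):
                return ExtensionValue(matches[index])
        raise KeyError(f"{extension_name}/{tag}[{index}]")


# Stand-in data with two hr elements; the namespace URI is a placeholder.
element = ET.fromstring(
    '<x:TrackPointExtension xmlns:x="http://example.com/x">'
    "<x:hr>125</x:hr><x:hr>130</x:hr>"
    "</x:TrackPointExtension>"
)
extensions = Extensions([element])

first = extensions.get("TrackPointExtension", "hr").number()
second = extensions.get("TrackPointExtension", "hr", 1).string()
extensions.get("TrackPointExtension", "hr").set(100)
updated = extensions.get("TrackPointExtension", "hr", 0).string()
```

Raising `KeyError` for a missing tag is only one option; returning `None` would be closer to `dict.get` but makes the chained `.number()` call fail less clearly.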

tkrajina commented 6 years ago

Yes, I agree (including the "ugly" remark ;) ). Also, the API should allow for a way to retrieve attributes. Something like this:

points.extensions.get("ExtensionName", "tagName", "#attribute")

Or maybe:

points.extensions.getFloat("ExtensionName", "tagName", "#attribute")
points.extensions.getString("ExtensionName", "tagName", "#attribute")
points.extensions.get("ExtensionName", "tagName", "#attribute") #returns the DOM element

jedie commented 5 years ago

points.extensions.get_float("ExtensionName", "tagName", "#attribute")
points.extensions.get_string("ExtensionName", "tagName", "#attribute")
points.extensions.get("ExtensionName", "tagName", "#attribute") #returns the DOM element

;)
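A small sketch of how the "#attribute" convention could behave with the snake_case names. This is one possible reading of the proposal, not an existing gpxpy API: here `get` returns the element, while `get_string` and `get_float` return the element text, or the attribute value when a "#name" argument is given. The module-level functions, the `temp` element and its attributes are invented for the example:

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def get(elements, extension_name, tag_name):
    """Return the first matching child element, or None."""
    for extension in elements:
        if _local(extension.tag) != extension_name:
            continue
        for child in extension:
            if _local(child.tag) == tag_name:
                return child
    return None


def get_string(elements, extension_name, tag_name, attribute=None):
    """Return the element text, or an attribute value for '#name'."""
    element = get(elements, extension_name, tag_name)
    if element is None:
        return None
    if attribute is None:
        return element.text
    return element.get(attribute.lstrip("#"))


def get_float(elements, extension_name, tag_name, attribute=None):
    """Like get_string, but converted to float when a value exists."""
    value = get_string(elements, extension_name, tag_name, attribute)
    return None if value is None else float(value)


# Stand-in for point.extensions; names and namespace are placeholders.
elements = [
    ET.fromstring(
        '<x:Ext xmlns:x="http://example.com/x">'
        '<x:temp unit="C" offset="1.5">21</x:temp>'
        "</x:Ext>"
    )
]

unit = get_string(elements, "Ext", "temp", "#unit")
offset = get_float(elements, "Ext", "temp", "#offset")
temperature = get_float(elements, "Ext", "temp")
```

The "#" prefix only matters if tag names and attribute names share one positional argument; with a separate keyword argument, as here, it could be dropped.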

pwolfram commented 5 years ago

Just curious if there is a solution here. GPX files from Strava have hr, cadence, etc. as extensions like this:

    <extensions>
     <gpxtpx:TrackPointExtension>          
      <gpxtpx:hr>80</gpxtpx:hr>
      <gpxtpx:cad>0</gpxtpx:cad>
     </gpxtpx:TrackPointExtension>
    </extensions>

However, as far as I can tell, this data is not being brought into the extensions attribute of points. Given my understanding of the scope here, this may be a bug. Any recommendation or advice on how to get the extensions data in practice would be greatly appreciated.
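One way to narrow this down is to read the extension values straight from the XML, independent of gpxpy, and compare them with what `point.extensions` holds. A minimal sketch using only the standard library; it relies on the `{*}` namespace wildcard, which needs Python 3.8 or newer. The function name `extension_rows` and the sample file are made up for illustration:

```python
import xml.etree.ElementTree as ET


def extension_rows(gpx_text):
    """Yield one {local_name: text} dict per <trkpt>, read from raw XML."""
    root = ET.fromstring(gpx_text)
    for trkpt in root.findall(".//{*}trkpt"):
        row = {}
        extensions = trkpt.find("{*}extensions")
        if extensions is not None:
            for element in extensions.iter():
                # Keep leaf elements only, skipping the wrapper itself.
                if len(element) == 0 and element is not extensions:
                    row[element.tag.rsplit("}", 1)[-1]] = element.text
        yield row


# Minimal stand-in for a Strava export; namespace URIs are illustrative.
sample = """
<gpx xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
     version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="51.0" lon="6.0">
      <extensions>
        <gpxtpx:TrackPointExtension>
          <gpxtpx:hr>80</gpxtpx:hr>
          <gpxtpx:cad>0</gpxtpx:cad>
        </gpxtpx:TrackPointExtension>
      </extensions>
    </trkpt>
  </trkseg></trk>
</gpx>
""".strip()

rows = list(extension_rows(sample))
```

If the rows are populated here but `point.extensions` is empty for the same file, the values are being lost at parse time rather than missing from the export.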

andyreagan commented 3 years ago

Here's a working version from Strava, @pwolfram. It's not the prettiest, but it works.

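from pathlib import Path

import gpxpy
import lxml.etree
import pandas as pd

def sloppy_float(text):
    # Not defined in the original snippet; assumed to return a float when
    # the text parses as one and the unchanged text otherwise.
    try:
        return float(text)
    except (TypeError, ValueError):
        return text

def df_from_segment(segment) -> pd.DataFrame:
    seg_list = []

    for point in segment.points:
        base_data = {
            'timestamp': point.time,
            'latitude': point.latitude,
            'longitude': point.longitude,
            'elevation': point.elevation,
            'speed': point.speed
        }
        # Flatten the children of every extension element, keyed by tag
        # name without the namespace. Points without extensions add nothing.
        # QName expects lxml elements, so this assumes gpxpy parsed the
        # file with lxml.
        extension_data = {
            lxml.etree.QName(child).localname: sloppy_float(child.text)
            for extension in point.extensions
            for child in extension
        }
        base_data.update(extension_data)
        seg_list.append(base_data)
    return pd.DataFrame(seg_list)

def df_from_track(track) -> pd.DataFrame:
    return pd.concat([df_from_segment(segment) for segment in track.segments])

def df_from_gpx(gpx) -> pd.DataFrame:
    return pd.concat([df_from_track(track) for track in gpx.tracks])

gpxfile = gpxpy.parse(Path("stravafile.gpx").read_text())
gpxfile_df = df_from_gpx(gpxfile)

astrowonk commented 2 years ago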

@andyreagan @pwolfram I would love to know whether your Strava files convert properly, hr included, with my gpxcsv converter. Although it produces CSV, it can also easily make a list of dicts for a DataFrame. It works well on the hr and other extension data in the Apple Watch GPX exports I have tried, but I haven't used Strava. You'd just:

import pandas as pd
from gpxcsv import gpxtolist

df = pd.DataFrame(gpxtolist('myfile.gpx'))

andyreagan commented 2 years ago

@astrowonk confirmed, this works perfectly!