sports-alliance / sports-lib

A Library for processing GPX, TCX, FIT and JSON files from services such as Strava, Movescount, Garmin, Polar etc
GNU Affero General Public License v3.0
156 stars 21 forks source link

chore: add duration for stream data for activities getter #37

Closed jimmykane closed 4 years ago

jimmykane commented 4 years ago

Also makes offset not required

However this will wait as I need to add tests. Just starting the PR.

If one wants to continue with integration tests/or integration testing we could merge this and continue adding specs later on.

thomaschampagne commented 4 years ago

Seen. I pushed feature/robustness-via-integrations-tests which improves previous integration tests (no PR yet) i can add tests for you within it.

jimmykane commented 4 years ago

Yup that would be good @thomaschampagne . I ll add some unit tests also for the sanity of the inner calls

The more we can cover the better.

codecov-io commented 4 years ago

Codecov Report

Merging #37 into develop will increase coverage by 0.11%. The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #37      +/-   ##
===========================================
+ Coverage     64.6%   64.71%   +0.11%     
===========================================
  Files          169      172       +3     
  Lines         4147     4194      +47     
  Branches       628      636       +8     
===========================================
+ Hits          2679     2714      +35     
- Misses        1460     1472      +12     
  Partials         8        8
Impacted Files Coverage Δ
src/data/data.interface.ts 100% <ø> (ø) :arrow_up:
src/data/data.end-position.ts 100% <100%> (ø)
src/data/data.start-position.ts 100% <100%> (ø)
src/data/data.store.ts 48.95% <100%> (+0.53%) :arrow_up:
src/data/data.ts 69.69% <41.66%> (-2.72%) :arrow_down:
src/events/utilities/event.utilities.ts 63.59% <55%> (-0.07%) :arrow_down:
src/data/data.position.ts 78.57% <78.57%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 091bdf4...99562c9. Read the comment docs.

thomaschampagne commented 4 years ago

FYI Time streams for GPX & TCX: time_stream_patches.zip

You decide what you do you them ;) If patches are applied. Apply patch 1 then 2.

jimmykane commented 4 years ago

Ok I think I understand a bit what you want.

You want also a time based stream (array) ?

Why would you need that since it can be generated?

(if I understand correctly).

I want to help here...

Could you show me the structure / a sample of data as you need them?

thomaschampagne commented 4 years ago

Yes exactly! I need a a time based stream as array.

I need this because i use time as array often in my calculations: https://github.com/thomaschampagne/elevate/blob/develop/plugin/app/modules/shared/models/activity-data/activity-streams.model.ts#L5

I stick to the Strava API pattern which return like you distance, speed, ... streams https://developers.strava.com/docs/reference/#api-models-StreamSet ... And time stream :)

It's seems also better to have it alone to use it along distance or speed or other data (through index). Instead of asking time for every type of stream

jimmykane commented 4 years ago

I hear you, I ll check those in this weekend. Off to baby and sleep now.

Could you paste a stream of altitude and a stream of time data here ?

A few aligned samples (eg 10s from start) could do it.

Unfortunately the strava docs don't show a some data (or I am doing it wrong)

thomaschampagne commented 4 years ago

Thanks for this but i can do the work you know... We can split it at least... Just let me know ;)

Altitude stream, sample 20 to 120:

[112.2,112,111.8,111.8,111.2,111,111.4,111.6,110.8,110.2,110.8,111.2,111.4,111,111,111,111,111,111,111.4,111,110.8,110.8,111,111,111.2,111,111.2,111.2,110.6,111.2,110.8,111.2,111.2,111.4,111.2,111.2,111,111.2,111.2,110.8,110.4,110.6,111,111.4,111.4,111.4,111,111.4,111.2,111,111.2,111.2,111.2,111.4,111.6,111.6,112,112,111.8,111.8,110.8,110.2,111.4,111.8,112,112,112,112,112,112.4,112.2,112.2,112.2,112,112.6,112.4,112.4,112,112.2,112.2,112.4,112.4,112.2,112.2,112.2,112,112.2,112,112,112.2,112.2,112.6,112.6,112.8,112.8,112.6,112.8,113.2,113.4]

And time stream , sample 20 to 120:

[43,50,53,55,59,61,63,65,68,73,76,78,81,84,86,91,93,95,99,104,109,114,118,120,123,125,131,133,136,142,148,149,157,159,162,169,176,177,184,188,195,197,204,206,209,210,214,220,225,230,235,237,239,242,244,249,250,254,255,261,268,270,272,277,279,281,287,292,299,307,314,321,329,330,335,340,346,353,356,358,359,361,366,370,374,378,385,386,393,394,401,402,409,410,415,416,421,422,430,431]

Activity source is https://www.strava.com/activities/2749884297

Streams for this activity are here

And take rest please ;)

jimmykane commented 4 years ago

Alright that looks quite doable @thomaschampagne .

FYI you dont need to import that stream (time data). The lib already holds that info. I ll write a little example and try to update this branch over the weekend.

jimmykane commented 4 years ago

I think I kinda understand this but I have a crucial question and perhaps would like to debate a bit about , just to hear your thoughts.

On TCX , GPX file format we have a point in Time which is the parent lets say and underneath it there are data eg distance , altitude etc.

For FIT files it's a bit different but pretty much the same concept it's a record and has time, distance , altitude etc.

So if I request from Strava 2 streams, distance and time for lets say an activity of 10s I get back their corresponding arrays of data and both arrays are of length of 10.

So far so good.

But what happens when there are data missing. Lets say distance, HR (belt etc disconnected and so on). From what I see Strava when distance is missing, it interpolates linear meaning that those values are not real values as you don't know if the user traveled the distance linear.

QSLib will output null for those values it doesn't know. In regards to that is basically my question. What should be the best way for you to consume that data.

Here is an example.

DATA: https://www.strava.com/activities/3021914844/streams?stream_types%5B%5D=time&stream_types%5B%5D=heartrate ACTIVITY: https://www.strava.com/activities/3021914844

On this streatch I took the belt off about 8s before I finished the activity.

All other services show correct no HR at the end

Example on my own service https://quantified-self.io/user/ZmR6FFReiNeQl0HaeZ9mlhsppkE2/event/3cPQA91MbTn2CvY9nttYkTaPHUK4SjYMT41sdnvBG9EcbsF

But strava data repeat the last value for the end of the activity and do the same for all intermediate missing values, efficiently kinda screwing average calculations as well.

Any thoughts :-D ?

thomaschampagne commented 4 years ago

I think you are doing it right by filling with null values.

In my opinion i prefer to not have data instead of fake data.

I can't open activity https://www.strava.com/activities/3021914844, maybe private?

jimmykane commented 4 years ago

Oups it was private :-D I dont put stretching and small runs and walks to public not to spam people hehe.

Ok now its public.

Alright I think I agree with you about null. It also btw creates smaller file sizes especially if you are compressing the stream data with GZip etc. (On a side note if you do this I got an excellent lib for you) .

Alright I ll put some work to this. Then I thin I kinda come to the last few question:

I ll give it some thought as well.

thomaschampagne commented 4 years ago

Yes i compress streams before saving them into database. I'm using the pako lib for this.

Any preference for the API? Should the activity generate/contain a time stream? eg activity.getStream(DataTime.type)[12] or something like a tuple or object that always contains the time?

I'm not sure to understand but activity.getStream(DataTime.type) returning the number[] is perfect :D

jimmykane commented 4 years ago

Aha! So you should try https://github.com/drbh/wasm-flate just for the decompression, for compression my tests where not great if the size was not big. For decompression YES! About 2-3x faster. (And they both play well). Actually the API of wasm is more straightforward.

See my times https://twitter.com/JimmyKane9/status/1210936386561413121

I find it funny I am doing exactly the same on those "streams" with PAKO etc and it was a bit of a hell for me to pass though many libs and stabilize it.

Alright its clear what you want :-D

jimmykane commented 4 years ago

Ok last question. been a bit thinking about this and I am a tiny bit sceptic about the use case.

Could you shared how you consume lets say an activity with altitude and HR.

I am asking this because I am trying to understand why you need time as a separate stream, meaning since that can be assumed easily on this lib.

For example:

An activity of 12 seconds will always have streams of stream.data.length === 12 So lets say if an activity has 12s recorded as total time and lets say missed 3 seconds of HR the HR stream would look like this.

[60, 80, 82, null, null, 88, 90, 120, null, 120, 126, 130]

Now index 0 = second 0-1s (efficiently you can assume 1s for index 0 So

index 0 = 1s 
index 1 = 2s 

Or like this

[1,2,3,null, null, 6,7,8, null,10,11,12]

But here is the catch:

The nulls up there are due to the HR missing.... But then what about if altitude data where not missing.

I am a bit sceptic confused how time is used....

thomaschampagne commented 4 years ago

I understand your concerns about this.

Indeed i always have the data in my calculation because my source is strava streams (number[] whitout nullvalues)

You can find in the below file how i use time as an array everywhere (seek for timeArray pattern):

https://github.com/thomaschampagne/elevate/blob/develop/plugin/core/scripts/processors/activity-computer.ts

jimmykane commented 4 years ago

Alright I get what you are doing there. You iterate over the data and you read the time array which is 1:1 to the eg HeartRate,Cadence etc stream right?

thomaschampagne commented 4 years ago

Yes exactly :)

jimmykane commented 4 years ago

I pushed some latest changes @thomaschampagne that should do what you need.

Once you have a stream of data then you can:

const stream = new Stream(DataAltitude.type, [200, null, 502, Infinity, -Infinity, NaN, 0]);
expect(stream.getData()).toEqual([200, null, 502, Infinity, -Infinity, NaN, 0]);
expect(stream.getData(true)).toEqual([200, 502, Infinity, -Infinity, 0]);
expect(stream.getData(false, true)).toEqual([200, null, 502, NaN, 0]);
expect(stream.getData(true, true)).toEqual([200, 502, 0]);

Now equivalently you can ask for the time of that stream

const stream = new Stream(DataAltitude.type, [200, null, 502, Infinity, -Infinity, NaN, 0]);
expect(stream.getDurationOfData()).toEqual([0, 1, 2, 3, 4, 5, 6]);
expect(stream.getDurationOfData(true)).toEqual([0, 2, 3, 4, 6]);
expect(stream.getDurationOfData(false, true)).toEqual([0, 1, 2, 5, 6]);
expect(stream.getDurationOfData(true, true)).toEqual([0, 2, 6]);

Now, why per stream and why not per activity.

I think for 2 reasons:

Also an observation. I think the reason why Strava removes some seconds is due to "moving" time. Looking at my indoors activities (like the one above) it does not miss any second.

We can if you like debate and see how something can be done with this lib but lets start simple and see if this PR works this out for you.

I added some info on the stream interface (docs) btw as well.

Keep me posted :-D

P.S. There is a "breaking" change. stream.data property is no longer public. That is to prefer the getter with filtering and harmonize the calls

thomaschampagne commented 4 years ago

Seems to be perfect :D. Very thanks !! I will checkout, test, and review the whole PR between tomorrow and friday ;)

jimmykane commented 4 years ago

Btw to add some info I have learned over files , charts etc.

First of all to declare that I am using amcharts on another project that consumes this lib.

Filtering out nonNumeric values via onlyNumeric helps for things like latitude and HR etc especially if you want Max, Min etc.

Now, methematicaly if the speed is 0 the pace is infinity. Right? Then it's legit I think for filterInfinity to be there. You can represent it differently , for example if it was realtime, you would want to hide that value or show -- , something like that. Also chart libs might not like that.