ib00 commented 2 months ago

What algorithm do you use to get power for a given ride?

Say you have power data for a 4-hour ride sampled at 1 Hz. Do you just compute a histogram of power and then use it to update your power-duration curve?

Is it a good idea to smooth power first (say, with a 3-5 second rolling average)?
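For reference, a trailing 3- to 5-second rolling average can be computed as below. This is a minimal sketch, assuming one sample per second; `smooth_power` is a hypothetical helper, and averaging the first few seconds over however many samples exist so far is an arbitrary choice.

```python
import numpy as np

def smooth_power(watts, window_s=3):
    """Trailing rolling mean of 1 Hz power samples (hypothetical helper).

    Assumes one sample per second, so a window of N samples spans N seconds.
    The first window_s - 1 outputs average over the samples available so far.
    """
    w = np.asarray(watts, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(w)))  # csum[i] = sum of first i samples
    end = np.arange(1, len(w) + 1)                # exclusive end index of each window
    start = np.maximum(end - window_s, 0)         # clamp at the start of the ride
    return (csum[end] - csum[start]) / (end - start)

print(smooth_power([200, 210, 900, 205, 198], window_s=3))
```

With a 3-second window, a single 900 W spike in otherwise ~200 W riding becomes three samples of about 435 to 440 W: smoothing spreads a bad sample out rather than removing it.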

hbuhrmann commented 2 months ago

Gosh, I did this so long ago that I can't actually remember what I did (or what the code does).

I am pretty sure I did not just compute a histogram of power, because to get a power curve you have to measure and compare average power over a given time interval. For instance, to work out your maximum 60-second power, you have to traverse the timeline of your ride and work out the average power for every possible 60-second window. In theory, you start at second 0 and calculate the average for the first 60 seconds (seconds 0 to 59), then move to second 1 and calculate the average for seconds 1 to 60, then move on another second, and so on until you reach the window that ends at the last second of the ride. This should remove the need to smooth your power, especially over longer time intervals.
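The brute-force scan just described could look like this in Python. It is a minimal sketch assuming clean, gap-free 1 Hz samples; `best_avg_power_naive` is a hypothetical helper, not code from this repo.

```python
def best_avg_power_naive(watts, duration_s):
    """Highest average power over any window of duration_s samples (1 Hz assumed).

    Slides the window one second at a time and averages every window
    from scratch, so the cost is O(len(watts) * duration_s).
    """
    n = len(watts)
    if duration_s <= 0 or duration_s > n:
        raise ValueError("duration must be between 1 and the ride length")
    best = float("-inf")
    for start in range(n - duration_s + 1):
        avg = sum(watts[start:start + duration_s]) / duration_s
        best = max(best, avg)
    return best

print(best_avg_power_naive([100, 300, 200, 400, 100], 2))
```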

Now, obviously, doing what I describe here is going to take forever, so I must have done something to optimize the algorithm, but what it was I honestly cannot remember.
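One standard way to make this fast (not necessarily what this repo does) is a prefix sum: keep a running total of power, so the sum of any window is the difference of two totals. Each window then costs O(1) instead of O(duration), and a whole ride costs O(n) per duration. A minimal sketch, again assuming gap-free 1 Hz data; `best_avg_power` is a hypothetical name:

```python
from itertools import accumulate

def best_avg_power(watts, duration_s):
    """Highest average power over any duration_s-sample window, via prefix sums.

    prefix[i] holds the sum of the first i samples, so the sum of
    watts[i:i + duration_s] is prefix[i + duration_s] - prefix[i].
    Assumes gap-free 1 Hz data (one sample per second).
    """
    n = len(watts)
    if duration_s <= 0 or duration_s > n:
        raise ValueError("duration must be between 1 and the ride length")
    prefix = [0.0, *accumulate(watts)]
    best_sum = max(prefix[i + duration_s] - prefix[i] for i in range(n - duration_s + 1))
    return best_sum / duration_s

print(best_avg_power([100, 300, 200, 400, 100], 2))
```

For a 4-hour ride (14,400 samples), `best_avg_power(watts, 60)` touches each sample a constant number of times, regardless of the 60-second window length.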

If memory serves, I started out by estimating power output for a given timeframe from weight, CdA, and the distance and elevation change between datapoints, with some dodgy smoothing applied. Later I did start to gather data from my Garmin, but I honestly cannot remember what I did with it. Either way, it is just datapoints, so it shouldn't affect the underlying algorithm.
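For readers unfamiliar with estimating power without a power meter, the usual steady-state model sums gravity, rolling resistance and aerodynamic drag, then multiplies by speed. The sketch below is an illustration under assumed constants (rolling-resistance coefficient `crr`, air density `rho`), with no wind, no acceleration term and no drivetrain losses; it is not the code this repo used, and `estimate_power` is a hypothetical name.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def estimate_power(mass_kg, cda_m2, dist_m, elev_gain_m, dt_s,
                   crr=0.005, rho=1.225):
    """Rough steady-state power estimate (W) between two GPS datapoints.

    Assumptions (illustrative only): dist_m is distance travelled along the
    road, no wind, constant speed over the segment, no drivetrain losses.
    crr and rho are typical placeholder values, not measured ones.
    """
    if dt_s <= 0 or dist_m <= 0:
        return 0.0
    v = dist_m / dt_s                                      # average speed, m/s
    sin_theta = max(-1.0, min(1.0, elev_gain_m / dist_m))  # gradient as sin(angle)
    cos_theta = math.sqrt(1.0 - sin_theta ** 2)
    f_gravity = mass_kg * G * sin_theta
    f_rolling = mass_kg * G * crr * cos_theta
    f_aero = 0.5 * rho * cda_m2 * v ** 2
    # Power = total resisting force x speed; a rider can't produce negative power,
    # so descents where gravity does all the work clamp to 0
    return max(0.0, (f_gravity + f_rolling + f_aero) * v)

print(round(estimate_power(80, 0.3, 10, 0, 1), 2))
```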

But please take the above with a pinch of salt: as I said, this goes very far back in time, and I haven't touched this code for years...

ib00 commented 2 months ago

Thanks for your answer.

I haven't seen a good, solid description of how Strava/TrainingPeaks extract power-duration data from each training ride. What you suggest seems reasonable: you would compute the best average power over each duration. The question is how many of these durations you need (5s, 10s, 20s, 30s, 1', 2', etc.).
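A power-duration curve is then just the best average at each of a chosen set of durations, and updating an all-time curve means keeping the maximum per duration. The duration list below is an assumption (nothing in this thread fixes it), and both function names are hypothetical. Each duration costs one pass over the ride thanks to a running sum.

```python
from itertools import accumulate

# Hypothetical set of durations in seconds; pick whatever resolution you need.
DEFAULT_DURATIONS = (5, 10, 20, 30, 60, 120, 300, 600, 1200, 1800, 3600)

def power_duration_curve(watts, durations=DEFAULT_DURATIONS):
    """Map each duration (s) to the best average power over that duration.

    Assumes gap-free 1 Hz samples. Durations longer than the ride are skipped.
    """
    prefix = [0.0, *accumulate(watts)]  # prefix[i] = sum of first i samples
    n = len(watts)
    curve = {}
    for d in durations:
        if d > n:
            continue
        best = max(prefix[i + d] - prefix[i] for i in range(n - d + 1))
        curve[d] = best / d
    return curve

def update_best(all_time, ride_curve):
    """Merge one ride's curve into a running all-time best curve (in place)."""
    for d, p in ride_curve.items():
        all_time[d] = max(p, all_time.get(d, float("-inf")))
    return all_time

ride = [100] * 10 + [400] * 5 + [100] * 10
print(power_duration_curve(ride, durations=(5, 10, 60)))
```

A 60-second entry is absent here simply because the example ride is only 25 seconds long.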

The second question is how you exclude erroneous samples. If you have wrong data (as often happens), it can easily throw off these averages. That's why some sort of smoothing or outlier detection seems appropriate.

hbuhrmann commented 2 months ago

Stuff is beginning to come back to me now . . . the algorithm is something like this:

Let's say you want to find the 30-second interval with the highest average power over a 4-hour ride.

Calculate the average power for each consecutive 30-second interval (it shouldn't be too intensive). Then identify the interval with the highest average and look at the intervals on either side. Let's call them iLeft, iMiddle and iRight. If iLeft and iRight have the same value, then iMiddle is as high as it will get. If iLeft is higher than iRight, move iMiddle left one second at a time until you find the highest average. If iRight is higher than iLeft, move iMiddle to the right instead.
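A minimal sketch of that coarse-then-refine search, assuming gap-free 1 Hz data. The function name is hypothetical, and scanning at most one block length towards the stronger neighbour is one reading of "until you find the highest average". Because it only refines around the best coarse block, it is a heuristic: it can miss a better window elsewhere in the ride, for example one that straddles two blocks that are each weaker than the best block. An exhaustive running-sum scan is also linear in ride length, so it is the safer choice if exact results matter.

```python
def best_window_refine(watts, duration_s):
    """Heuristic search for the highest-average window of duration_s samples.

    Returns (start_index, average). Assumes gap-free 1 Hz samples.
    """
    n = len(watts)
    if duration_s <= 0 or duration_s > n:
        raise ValueError("duration must be between 1 and the ride length")

    def avg(start):
        return sum(watts[start:start + duration_s]) / duration_s

    # Coarse pass: consecutive, non-overlapping blocks; iMiddle is the best one.
    starts = list(range(0, n - duration_s + 1, duration_s))
    k = max(range(len(starts)), key=lambda j: avg(starts[j]))
    middle = starts[k]

    # Neighbouring blocks iLeft / iRight (missing at the ends of the ride).
    i_left = avg(starts[k - 1]) if k > 0 else float("-inf")
    i_right = avg(starts[k + 1]) if k + 1 < len(starts) else float("-inf")
    if i_left == i_right:
        return middle, avg(middle)  # no direction to move in
    step = -1 if i_left > i_right else 1

    # Slide iMiddle one second at a time towards the stronger neighbour,
    # at most one block length, keeping the best window seen.
    best_start, best_avg = middle, avg(middle)
    for shift in range(1, duration_s):
        s = middle + step * shift
        if s < 0 or s + duration_s > n:
            break
        a = avg(s)
        if a > best_avg:
            best_start, best_avg = s, a
    return best_start, best_avg

ride = [100] * 30 + [200] * 20 + [400] * 10 + [300] * 30 + [100] * 30
print(best_window_refine(ride, 30))
```

In this example the best coarse block starts at second 60 (300 W), and the refinement slides left to second 50 so the window picks up the 400 W effort.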

With regard to the second question, I don't really know, but there must be statistical methods you can use to identify outliers, for example by fitting a curve or a moving average and flagging any point that deviates from where the curve says it should be. I guess the big question is whether the data will have isolated outlier points or runs of consecutive outliers. Dealing with isolated outlier points is probably easier than dealing with a continuous series of outliers.
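The "compare each point with a local curve" idea can be sketched as a rolling-median filter (often called a Hampel filter). This is a swapped-in technique, not anything this repo implements: each sample is compared with the median of its neighbours and flagged if it deviates by more than `n_sigmas` times a robust spread estimate (the median absolute deviation scaled by 1.4826). Because the reference is local, it tolerates the normal ups and downs of a ride. The window size, threshold and `min_spread_w` floor are assumptions to tune. A run of bad samples longer than about half the window shifts the median itself and goes undetected, which is exactly the harder case of a continuous series of outliers.

```python
from statistics import median

def hampel_outliers(watts, half_window=5, n_sigmas=3.0, min_spread_w=10.0):
    """Return indices of samples that deviate strongly from their local median.

    half_window: samples on each side used as the local reference (assumed).
    n_sigmas: how many robust standard deviations count as an outlier (assumed).
    min_spread_w: hypothetical floor on the spread, so perfectly steady
        efforts (MAD == 0) don't flag tiny changes.
    """
    flagged = []
    n = len(watts)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = watts[lo:hi]
        local_median = median(window)
        mad = median([abs(x - local_median) for x in window])
        # 1.4826 * MAD approximates the standard deviation for Gaussian noise
        spread = max(1.4826 * mad, min_spread_w)
        if abs(watts[i] - local_median) > n_sigmas * spread:
            flagged.append(i)
    return flagged

print(hampel_outliers([250, 260, 245, 300, 280, 1500, 240, 255, 260, 270, 265]))
```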

I asked ChatGPT:

If I have a series of points with watt values (representing the power output of my bicycle power meter, measured once per second on the left crank), how would I identify outliers?

And this is what it came back with:

To identify outliers in a series of watt values, you can use statistical methods such as the Z-score or the Interquartile Range (IQR). Below is a Python example using both methods to detect outliers in your watt data.

Method 1: Using the Z-score

The Z-score method detects outliers by determining how many standard deviations a data point is from the mean.

```python
import numpy as np
import pandas as pd

# Example data
data = pd.Series([250, 260, 245, 300, 280, 500, 240, 255, 260, 1000])

def detect_outliers_z_score(data, threshold=3):
    mean = np.mean(data)
    std = np.std(data)  # population std (ddof=0)
    z_scores = [(x - mean) / std for x in data]
    outliers = np.where(np.abs(z_scores) > threshold)
    return outliers[0]

# Caveat: with only 10 samples and ddof=0, |z| can never exceed sqrt(10 - 1) = 3,
# so threshold=3 flags nothing here, not even the 1000 W value.
outliers = detect_outliers_z_score(data)
print("Outliers detected at indices:", outliers)
print("Outlier values:", data.iloc[outliers].values)
```

Method 2: Using the Interquartile Range (IQR)

The IQR method detects outliers by finding values that fall below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.

```python
def detect_outliers_iqr(data):
    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    outliers = np.where((data < lower_bound) | (data > upper_bound))
    return outliers[0]

outliers = detect_outliers_iqr(data)
print("Outliers detected at indices:", outliers)
print("Outlier values:", data.iloc[outliers].values)
```

Explanation:

- Z-score method: identifies data points that are a certain number of standard deviations away from the mean (typically 3 standard deviations).
- IQR method: identifies data points that lie more than 1.5 times the interquartile range below the first quartile (Q1) or above the third quartile (Q3).

You can choose either method based on your preference, or apply both to cross-verify the outliers.

================================================================================

hbuhrmann commented 2 months ago

Actually, reading ChatGPT's response, it misses the point: it assumes that power output should be roughly constant, which it decidedly isn't!

ib00 commented 2 months ago

Thank you very much for your reply. I really appreciate it. It gives me a starting point.

I'll start playing with different things that you suggested here and see what I can come up with.

hbuhrmann / readcycledata

Power for a given ride #1
