laberning / openrowingmonitor

A free and open source performance monitor for rowing machines
https://laberning.github.io/openrowingmonitor
GNU General Public License v3.0
98 stars 19 forks source link

Incorrect drag factor calculation formula #68

Closed Abasz closed 1 year ago

Abasz commented 2 years ago

Hi, I was reviewing your code (its pretty great work, and even though for me typescript would be more readable, still quite expressive and easy to follow) as I am implementing something similar for a microcontroller (ESP32) to provide proper bluetooth capabilities for an air rower.

Anyway, while I was reviewing which moment (currentDt) is used in the formula for drag factor calculation I noticed that there may be a (undesired) systematic error in the used currentDt for angularVelocity and recoveryPhaseLength amount. I understand that systematic errors can be compensated by adjusting constants like inertia but still I wanted to know your opinion (i.e. whether I am doing something wrong or in fact the systematic error is there in your calculation)

Below is one recovery cycle and my settings for the stroke detector:

    numOfImpulsesPerRevolution: 1,
    flankLength: 2,
    numberOfErrorsAllowed: 0,
    naturalDeceleration: 0,
    flywheelInertia: 0.073,
    autoAdjustDragFactor: true,

currentDts (in sec):

0,057730
0,056867
0,056248
0,055971-> recoveryStartAngularVelocity = 2 * PI / 0,055971
0,056108-> Change to recovery Phase 
0,056614
0,057417 
0,058093  
0,058758
0,059382
0,060113
0,060935
0,061576
0,06222
0,063026
0,063819
0,064538
0,065299
0,066043
0,066781
0,067684
0,068504
0,06915
0,070072
0,070842
0,071775
0,072473
0,073373 ->End of recovery Phase and  recoveryEndAngularVelocity = 2 * PI / 0,073373
0,07256
0,069196
0,065974

recoveryPhaseLength = 1.5272 (that is the sum of the currentDts between - including - 0,055971 and 0,072473)

So I believe the issue is that the recoveryPhaseLength is not using the right range of currentDts as it includes the last currentDt from the Drive and disregards the last currentDt of the Recovery (so recovery si systematically too short, in the contrary what you are stating in physics_openrowingmonitor). So actually the errors in drag factor, although systematic, would not be linear as the error in the recoveryPhaseLength would greatly depend on the difference between the the start currentDt and and end -1 currentDT (i.e. the longer the recovery phase, relatively the shorter the recoveryPhaseLength is in te equation, and greater the error in the increased drag factor). Due to this I am not sure if can be really compensated with setting the constants as on lower stroke rates the DF will be incorrectly high compared to higher stroke rate with applying the same amount of power.

Now I am not sure if I am missing something, I am most definitely not a physicist :) so it's well possible that I am missing something, but I think in the above case recoveryPhaseLength should be 1.544595 and drag factor is 130.9 instead of 132.4. Am I on a right track here or completely missing something? If you need further informatin (e.g. the full workout currentDts, or the drag factors calculated with the recoveryPhaseLength based on my proposed range etc.) please let me know.

Thanks for your help Abász

JaapvanEkris commented 2 years ago

Hi Abasz,

Thanks for the compliment, and great that you are porting the code to lighter hardware. The ESP32 is quite an interesting machine. Are you using Java?

The key to understanding this code is that for the flank, you have to decide what phase (i.e. drive or recovery) it belongs to. So when you start the recovery, the Finite State Machine of the handleRotationImpulse saw (in your case) two decelerating flanks and decided it was time for a recovery. As the recovery phase is typified by a decelerating flywheel, it should be included in the recovery phase.

The start of the drive is debatable if it is an error: The values at the beginning of the flank actually are pivot points: the speed itself doesn't trigger the next phase, but it is the flanks between these measuring points (i.e. the trend of in your case three successive points, with two flanks in between) that determines if the flywheel is decelererating or accelerating. The point can be made these currentDt's belong to both phases: they mark the exact boundary of the two successive phases.

The practical side is that stroke detection is a sloppy process, and that using better metrics will help, but timing does fluctuate as we are measuring an continous analog system with discrete intervals. Please note that the length of the phase is in fact irrelevant: in essence the code calculates the slope of a function (i.e. the deceleration of the flywheel speed as a function of recovery time). Typically, a longer recovery phase will also include another impulse, extending the function. In my new implementation, I actually use this approach and shorten the recovery phase for the drag calculation by only including datapoints that are certain to belong to the recovery phase.

Please note that there is a bug in the code: https://github.com/laberning/openrowingmonitor/issues/69.

The current approach certainly required improvement: each calculation in RowingEngine.js structurally relates to the beginning of the flank. However, totalTime doesn't (this is a bug, but a consistant one). In my new solution I moved the time and distance-keeping to MovingFlankDetector and the rest just asks what the absolute time and angular position is at the beginning of the flank.

Another improvement comes from the drag calculation itself. The current one isn't robust: it is quite easily distracted by outliers (i.e. measurement errors) if they come at the wrong moment. My new implementation is much less sensitive to outliers and behaves much better in tests (see https://github.com/JaapvanEkris/openrowingmonitor/blob/Sandbox/docs/Engine_Validation.md for the description of the algorithm and preliminary test results). These tests are against a Concept2 that delivers around 300 datapoints per stroke, so it is much less sensitive to outliers than your set-up anyway. I'm still testing it, as I want to validate the entire engine and all its metrics. Let's say I learned a lot along the way :). If you want, I can push some preliminary (not so clean) code to a sandbox so you can copy its ideas to your implementation.

Abasz commented 2 years ago

Thanks for your prompt response. Actually I use C++ in ESP32 and it is not an exact port. It cannot actually be, as ESP32 does not have the same processing power as a RPi. I have made different tradeoffs while implementing the currentDt filter and other filters (its much less robust, but in practice this is fine while there are less signals per turns –> more stable curves, but it would probably not work on non-air rowers) as these calculations are done in an Interrupt Service (ISR) that by design should be pretty quick and simple to avoid errors (this rule has been broken to some extent already😊). I use the concept of the finite state machine though.

Anyway, my point here is that any angularVelocity value is the actual a kind of average velocity at the end of that turn since the beginning of the previous turn. This should mean that if you are using the last currentDt from the Drive for the startAngularVelocity (that should make sense, although it may still contain some force, as described in your work on rowingPhysics here, that is a tradeoff and more signal per turn can reduce its effect) the recoveryPhaseLength should not include this time period (in the case of the example currentDt of 0,055971 should be added to the Drive) while if you are using the last point of the Recovery Phase for endAngularVelocity (currentDt 0, 073373) the last Recovery turn currentDt of 0,073373 should be included. I agree with your statement that “The point can be made these currentDt's belong to both phases: they mark the exact boundary of the two successive phases.”. Actually in my implementation I use the middle range of the Recovery phase (disregard first and last point as it may be “dirty” with power from the rower, where the amount of "dirt" in the measurement would decrease by the increase of the number of signals per turn) exactly to further clean the DF calculation from any potential power poisoning the calculation, but then I need to adjust the delta time for the formula (recoveryPhaseLength) since the deceleration period (i.e. the time between the two measured angularVelocities) does not equal to the recoveryPhaseLength. I agree with your statement that “the length of the recovery phase is irrelevant”, but only irrelevant as long as you are using the right “slope” (i.e. the time period between the used startAngularVelocity and endAngularVelocity). And this is what I think wrong in the current code: the used period is wrong (too short) as the currentDt of the endAngularVelocity is disregarded while the start is included in the delta time for DF calc. In my view startAngularVelocity should be excluded (since that is really the end average velocity at the end of the turn) and endAngularVelocity should be included.

Of course if my hypothesis on the “average velocity at the end of the turn” at the end and start AngularVelocity is wrong than that is a different matter😊

JaapvanEkris commented 2 years ago

Hi Abasz,

I agree that moving to the middle of the phase is often better, that is why the naturalDeceleration setting came to existence: this tries to remove the dirty flanks from the Recovery phase. The issue I do see is that some cheaper rowers only have two or three impulses per recovery, so in some cases it might not work for them. But removing flanks certainly helps.

Working completely interruptdriven would be my preferred approach as well, you want a pool of the freshest data ready to post to the Bluetooth clients. Are you implementing the PM5-protocol or the FTMS protocols? The PM5 has some statuses I was able to map to my Finite State Machine. I haven't been able to thoroughly test the response of EXR yet, but it is more consistent with the PM5 interface than ORM 0.8.2 is.

The problem I see is that the use of angular velocities is prone to small measurement errors. The RPI isn't real-time and IMHO Java isn't the best language to handle real-time interrupts anyway, but it is idle 99%, so it does OK but you should be aware of small errors. Depending on a single currentDt as a divisor could enlarge errors and thus create a lot of noise. Having less impulses per second should help, but it is tricky nonetheless. And then you are fighting noise constantly. In all honesty, in the last months I completely rewritten that piece of code as it lacked the clarity and consistency. This kind of issues are too spread out in the code to be easily understood when revisited, which usually is a bad sign.

My new implementation assumes that detecting the deceleration signals the start of the recovery, but only includes the next currentDT in the linear regression analysis (it does start the "time in recovery timer"), and it will only include new currentDt's when processing the previous currentDt has concluded it still is in the recovery phase. So I guess I took the same approach as you did.

I currently am testing that part of the algorithm, including the use of flank removal, and on a data-rich machine like the Concept2 removing the flanks did not matter for the new algorithm. This could be different on your machine as you have less data to work with. Combined with the linear regression, I don’t need any angular velocities in the RowingEngine.js. In the Flywheel.js I only need the angular velocities for Torque determination (which is the basis for the power curve), but that is an advanced feature that isn’t released yet. I probably need to introduce velocities to correct an error in Concept2’s approach to power calculation, but that is another matter.

I can put my current raw implementation in a separate sandbox as-is if you want, but it won’t work with the 0.8.2 version of OpenRowingMonitor as the codebases diverged so some backporting and cleanup is still needed. As you need to port the code anyway, that probably won't matter anyway.

Abasz commented 2 years ago

I am not implementing PM5 specific BLE profile (I am not planning on that either). Rather I am implementing a cycling speed and cadence BLE profile. The reason is that I use a Garmin watch and I like its native profile that adds the Firstbeats health metrics based on activity details (I have written a fit file converter in C# that changes the cycling activity to rowing activity and adds certain rowing specific metrics like stroke per distance) before upload to Garmin connect. Also PM5 BLE profile is not straight forward due to the BLE profile length limitation (they need to use splitting). As an upgrade I would actually implement the cycle power meter BLE profile and/or fitness equipment/machine profile (but my watch for some reason does not support these so once I upgrade the watch I will think about upgrading the BLE code too). But then naturally the question comes, how the distance is transferred to the watch? There is one additional BLE Service to the CSC I included, that is none standard and that is for drag factor: I actually advertise the DF and the corresponding distance per turn in millimeters using simple “string” BLE profile. I read this and set this on the watch before I start the activity as “bike wheel circumference” and that is how the speed is shown. This is not the best, but its due to the limitation of the watch it self. If my watch would implement fitness machine/equipment BLE profile (newer Garmin watches do that) I would implement that instead of PM5 profile to be honest. But that is because I am in the Garmin ecosystem.

If its ok, I would like to have a look at the regression method (especially in the light of your approach using the same period of recovery approach as I). I’ve been thinking about this and actually I can see that it could yield better (more consistent) results. Testing such theories are rather cheap as I a have logged all exercises to a console from the ESP32 so I have actually exact timing data (both filtered and raw) and these can be rerun similarly to the ORM testing environment (I visualize these on graphs in excel file). So if you can push your code to a separate branch I would appreciate. I am not a professional programmer though, this is just a strong hobby 😊 so code review may take time 😊.

As for my original “bug report” it was for the current master implementation of the ORM as observation. Thanks for the constructive discussion, it is very valuable. There is no real validation option with these things for my setup (apart from creating graphs and visually inspecting them and determine if they make sense) so for me it is a little blind shooting and trusting in correctness of the math as much as possible 😊

JaapvanEkris commented 2 years ago

Hi Abasz,

I know the feeling of getting data, and being clueless about its accuracy. That is why I'm delighted to be testing side-by-side against the PM5. At least I know what the right answer should be :).

You can look at https://github.com/JaapvanEkris/openrowingmonitor/tree/Raw_Implementation_As_Is , which contains the main modifications. The CI-pipeline obviously fails as the code bases diverged and I'm still backporting things. And it contains some telemetrics to see what is going on inside the engine. Main files you should take a look at are LinearRegressor.js, Flywheel.js, RowingEngine.js, RowingStatistics.js and Server.js.

As an added bonus, I have some beta code called vo2max.js. Which contains (a mostly hardcoded) VO2Max approximation. Perhaps you can test it against the Garmin/FirstBeat algorithms, and see what their estimation is. Please note that the complete formula can be found under https://sites.udel.edu/coe-engex/2019/03/16/how-accurate-is-your-garmins-vo2max-estimate/

I'm still trying to get ORM to create FIT files, but on Linux the SDK feels impossible. But that is probably just me.

Thank you for the good discussion, you asked the right questions and made me check my code for this type of error. Together, we will get this data out of our rowers.

Abasz commented 2 years ago

Thanks.

A note on the FIT SDK (and I will not "spam" this issue thread any further to avoid potentially pissing the maintainer off:) ), that seems a huge mess at the beginning indeed (although the [documentation[(https://developer.garmin.com/fit/overview/) has improved significantly in the last 1-1.5 years). It has a completely different approach to persist data. There is a JavaScript implementation but this is only from FIT to JSON. The SDK does not have a JS implementation (in this project it is done by hand using quite a bit of bit banging). The SDK is implemented for other languages though (C, C++, C#, Java). In these languages, creating a new FIT file with custom data is pretty easy as the SDK has succinct methods to do so and handles the necessary low level operations and ensures that certain rules are adhered. Its quite ironic that F stands for “flexible“ as it is a lot of thing but not flexible from a programming perspective😊. The problem is that the FIT SDK is a very closed non-extendible structure (this is by design) with sealed classes... Anyway let me know if you have any specific questions I spent quite some time about a year ago with the SDK, although with the C# variant.