Bugfixes and improvements for V1Beta

JaapvanEkris commented 1 year ago

New functionality:

Added support for ANT+ rowing metrics broadcast
Allow the user to change the GUI layout and metrics, including displaying the force curve
Allow user to turn on or off ANT+ and BLE functionality and dynamically switch between ANT+ and BLE HR monitors from the GUI
Added the option for more complex workouts, as a hook for the PM5 and webinterface (these are a ToDo where the PM5 workout interface is still in development)
Added PM5 Interval-types

Added some bugfixes/robustness improvements:

Added support for the newest version of Raspberry Pi OS (Bookworm), moved from Node.js v16 (EOL) to Node.js v20 (current) and upgraded packages where possible. Is known to work on Raspberry Pi , 4 and 5.
Improved the accuracy, responsiveness and efficiency of both the Linear and Quadratic the Theil-Sen algorithms. For larger 'flankLength' machines (12+ datapoints), this improved algorithm will also result in 50% reduction in CPU use, while increasing the responsiveness and accuracy of the forcecurve and powercurve. For smaller 'flankLength' machines it will improve less, but still improve.
Drag calculation and recovery slope calculation now use the Linear Theil-Sen algorithm, making this calculation more robust against outliers and stroke detection errors
Added a configuration sanity check which logs obvious errors and (if possible) repairs settings, after several users messed up their config and got completely stuck.
The configuration sanity check also provides an automated upgrade path for 0.8.4 users to V1Beta, as all the newly added configuration items between these two versions are automatically detected, logged and repaired.
Added restart limits to prevent infinite boot loops of the app crashing and rebooting when there is a config error
Fixed the GPIO tick rollover, which led to a minor hickup in data in rows over 30 minutes
Made Flywheel.js more robust against faulty GPIO data
Fixed an application crash in the RowingData generation when the target directory doesn't exist yet
Improved the structure of the peripherals to allow a more robust BLE and ANT use

JaapvanEkris commented 1 year ago

Improved the documentation to adress https://github.com/laberning/openrowingmonitor/issues/124

Abasz commented 1 year ago

There is no need to edit or commit the package-lock.json. its an automatically generated file, any dependency is added via the package.json

https://docs.npmjs.com/cli/v9/configuring-npm/package-lock-json

_"package-lock.json is automatically generated for any operations where npm modifies either the nodemodules tree, or package.json. It describes the exact tree that was generated, such that subsequent installs are able to generate identical trees, regardless of intermediate dependency updates."

JaapvanEkris commented 1 year ago

There is no need to edit or commit the package-lock.json. its an automatically generated file, any dependency is added via the package.json

The automated build/test system breaks on it. I fixed the dependencies and now the pipeline works again. Not a big issue.

JaapvanEkris commented 1 year ago

Based on the code I understand that the forceOnHandle array is filled with the _torqueBeforeFlank instead of the _torqueAtBeginFlank
return _torqueBeforeFlank
What is the reason for this? I would expect that actually when the data for the force curve is collected, the torque related to the current impulse's deltaTime should be used. So would not be the _torqueAtBeginFlank value better?

Nope. The flank isn't exposed to the outside world: all exposed metrics (angular position, angular velocity, angular acceleration, time) are based on the BeforeFlank (thus excluding any knowledge about the contents of the flank itself). The only exception is that flank data is used to determine the flywheel state (i.e. powered, unpowered, dwelling). That is essentially the flank's use, you need to determine whether you are in a drive or recovery, so in the next processed currentDt, you can pick up the data in the BeforeFlank position and process it in the correct drive state,

Please note: the flank is also used to create a Theil-Senn regression of the Angular Velocity and Angular Acceleration, where the algebraic properties are used to stabilise these measurements further, which both are essential ingredients to the Torque calculation, This is why we can't use a series of torques to determine flywheel state, but we must rely on a single measurement at the head of the flank.

JaapvanEkris commented 1 year ago

While this explanation is clear I think wrong force value gets included in the force curve. Let me explain:

This was a very conscious design decission I made over a year ago. Really revisitting this would require me to go back to my notes. Key concept here is that for the data in the flank we don't know what phase it belongs to, and we do know for the data in front of the flank.

the flywheeel.torque() function is called only in the startDrivePhase() and updateDrivePhase() to save that into the force curve. In this moment ORM knows that the current flank is drive. So when we are in startDrivePhase() function I would expect that here the _torqueBeforeFlank is actually part of the recovery.

That is correct, but the createCurveAligner in (see https://github.com/JaapvanEkris/openrowingmonitor/blob/v1beta_updates/app/engine/RowingStatistics.js) is there to remove the head and tail values if there is noise in them. As torque is based on potentially volatile metrics and thus can be noisy, it is a necessary step that can't be skipped. That is what I meant with that the current forcecurve implementation requires some serious thinking before turned into a live forcecurve viewer on the GUI (as cleaning up noise live can have weird visual effects).

To provide the specific issue, I noticed frequent small negative value as the first value in the force curve in driveHandleForce when running ORM in debugging mode that in my view should not have been there (I use no recovery slope and stroke quality of 1, so basically recovery detection is purely torque based). Based on the code those negative forces are actually the last force values before the drive, hence I would not expect those to be included in the force curve.

If you use the exposed driveHandleForceCurve from the RowingStatistics with a decent minumumForceBeforeStroke, that shouldn't happen. One can debate if the curvealigner should always prefix the curve with a starting base value (zero for force, but what to do with handle speed), but that is the thing that produces force curves for over a year now in quite a reliable fashion (and yes, I check them all each row).

And please, let's not have discussions about extensions in a pull request. This reduces the clarity of the status of this pull request for the package maintainer.

beavis69 commented 1 year ago

@JaapvanEkris Hi, it seems that createRower should initialize _totalNumberOfStrokes = 0.0 instead of -1.0 with the machine state bug fix. With your fix strokes count is fine when doing pauses, but first stroke is not counted.

JaapvanEkris commented 1 year ago

@JaapvanEkris Hi, it seems that createRower should initialize _totalNumberOfStrokes = 0.0 instead of -1.0 with the machine state bug fix. With your fix strokes count is fine when doing pauses, but first stroke is not counted.

This is intentional. The "first stroke" is the transition from a stationary flywheel to a moving one, where the recorded stroke 0 will be the state of a stationary flywheel (i.e. distance, velocity, etc. set to zero, and only the HR and timestamp containing meaningfull data).

beavis69 commented 1 year ago

This is intentional. The "first stroke" is the transition from a stationary flywheel to a moving one, where the recorded stroke 0 will be the state of a stationary flywheel (i.e. distance, velocity, etc. set to zero, and only the HR and timestamp containing meaningfull data).

Before this commit, the first stroke was counted, and at every pause i had stoke count wrong (stroke count = stoke count + 1 for every pause).

With this commit, stroke count is ok for every pause (no more +1 per pause), but the first stroke is not counted (it was counted before the fix).

So i dont understand, if it's intentional, why the stoke count is different at start and after a pause, as in both case the fly wheel is stationnary.

JaapvanEkris commented 1 year ago

This is intentional. The "first stroke" is the transition from a stationary flywheel to a moving one, where the recorded stroke 0 will be the state of a stationary flywheel (i.e. distance, velocity, etc. set to zero, and only the HR and timestamp containing meaningfull data).

Before this commit, the first stroke was counted, and at every pause i had stoke count wrong (stroke count = stoke count + 1 for every pause).

With this commit, stroke count is ok for every pause (no more +1 per pause), but the first stroke is not counted (it was counted before the fix).

So i dont understand, if it's intentional, why the stoke count is different at start and after a pause, as in both case the fly wheel is stationnary.

The issue is that most consumers of the stroke data expect a stroke 0 (i.e. Garmin, Strava, Intervalse.icu and RowingData), as it is the starting point of the exercise. For example, Strava measures the time and distance between the first and last recorded stroke. So if the first recorded stroke doesn't start with 0, you end up with a too short exercise. So for a simple workout, this should work.

Although you can practically pause a workout by stopping with rowing (and if you are willing to modify server.js you can even program it to pause), we haven't figured out how to administer it internally or export it externally. And that should be resolved before we start looking at pause behaviour. The tcx and RowingData recorders are redesigned now, but we need some serious thinking about how to handle specific stops and resumes in terms of laps, segments and intervals.

beavis69 commented 1 year ago

Ok, i understand. As i'm using a garmin watch with openrowingmonitor as fec ant+ sensor, i'll go with _totalNumberOfStrokes = 0.0 at initialization as it works best for me.

JaapvanEkris commented 1 year ago

Ok, i understand. As i'm using a garmin watch with openrowingmonitor as fec ant+ sensor, i'll go with _totalNumberOfStrokes = 0.0 at initialization as it works best for me.

That is a bug on Garmin's side, not ours. When I look at the data of my own row yesterday, I did a 15K (no pauses). Concept2 reports 1494 strokes. The GUI and the RowsAndAll export contains 1494 strokes as well (in this, stroke 0 isn't added to the stroke count, as it is stroke 0). Garmin is insistent it just saw 1493 strokes.

Abasz commented 1 year ago

@beavis69 Actually we had a discussion with Jaap last year similar to this one I think, and while my view is more inline with yours (and Garmin), I understand his point. For me logic dictates that when I pull the handle (whether the first time or not) that is a valid stroke, so in line with this logic the first stroke is the first stroke while stroke zero is not a stroke, but the state before the first stroke. Of course this statement is pretty fuzzi, but it is easier to visualize if you look at the stroke at the end of the drive: The start of the very first ever drive is the zero state, and the end of this drive is stroke one. Now what I understand from the previous discussion and the way the code works, since ORM counts and records cycles from drive-start-to-drive-start due to the need for consistency the only thing that can be done is that you initialize the total stroke count the way it is done now (I agree that there my be inconsistency to the restart after a pause state, but I have not thought it through in depth if for instance resetting to -1 there would solve the issue).

But when 90% of the consumers expect one way of doing things its probably better to go towards majority as it will generate less questions along the way.

Actually I would not change this on the engine side as it affects too many consumers, but rather make some changes on the Ant+ side (like adding one stroke to the total - that may not work for pause and resume and all that complicated stuff, that needs to be tested). Of course it could have been done the other way originally (i.e. initialize with "Garmin's" logic and then deduct for the other consumers), but this was not the design decision that was made. Also I dont know whether this would mess up any other watch's expectation as I never used anything else than garmin :)

beavis69 commented 1 year ago

Ok, now i better understand that the first stroke is counted as zero and why.

The only thing that has surprised me, is that the behavior for the first stroke (starting at 0 or 1) was changed by this commit https://github.com/laberning/openrowingmonitor/pull/114/commits/43d7e45aee7c877f9bcbdeffe9694eab8946ce6b so i was wondering if it was normal behavior.

I have tested amazfit watches but not with ant+ (they were counting stokes with the accelerometer sensor).

JaapvanEkris commented 1 year ago

Ok, now i better understand that the first stroke is counted as zero and why.

The only thing that has surprised me, is that the behavior for the first stroke (starting at 0 or 1) was changed by this commit 43d7e45 so i was wondering if it was normal behavior.

I have tested amazfit watches but not with ant+ (they were counting stokes with the accelerometer sensor).

It was a bug that was present in the published code due to a copy-paste error. Typically, functionality is tested by me or Abasz for several weeks before it reaches an active branch. I forgot to copy that part, and only discovered it when I did a clean install last week.

JaapvanEkris commented 1 year ago

@beavis69 Actually we had a discussion with Jaap last year similar to this one I think, and while my view is more inline with yours (and Garmin), I understand his point. For me logic dictates that when I pull the handle (whether the first time or not) that is a valid stroke, so in line with this logic the first stroke is the first stroke while stroke zero is not a stroke, but the state before the first stroke.

In a manual upload to Strava and Garmin (although they changed some things in the last weeks, still looking what), you need a stroke zero, basically deciding what the time and distance at start are. So you actually need to record the time of the start and declare a distance=0, otherwise your piece might end up 3 seconds and 10 meters too short as it will zero everything on the first stroke it encounters.

Of course this statement is pretty fuzzi, but it is easier to visualize if you look at the stroke at the end of the drive: The start of the very first ever drive is the zero state, and the end of this drive is stroke one. Now what I understand from the previous discussion and the way the code works, since ORM counts and records cycles from drive-start-to-drive-start due to the need for consistency the only thing that can be done is that you initialize the total stroke count the way it is done now (I agree that there my be inconsistency to the restart after a pause state, but I have not thought it through in depth if for instance resetting to -1 there would solve the issue).

In fact, Concept2, RowingData and many others define a stroke as a "drive followed by a complete recovery". So technically, the stroke is only completed if and only if the recovery is completed as well. So stroke 1 might start with the first pull on the handle, but it is only completed once you return to the catch position.

This is actually what ORM currently does. On the detection of the first drive it will trigger stroke zero to zero all recorders. After completion of the entire stroke (including the recovery) it will record the first stroke as being complete, including the drive, force curves, and recovery timing. If you want to understand how ORM "thinks" you can best look at the RowingData output, and you recognise it directly.

But when 90% of the consumers expect one way of doing things its probably better to go towards majority as it will generate less questions along the way.

I think when it comes down to it, I'd rather be consistent with Concept2, since they really know this stuff inside out for decades and their approach is commonly accepted by anyone in the rowing community. Garmin has quite a questionable reputation among serious rowers, as they still haven't figured out the key metrics and how to represent them (measuring altitude difference on an erg, pace in minutes per kilometer, and a "best" strokerate???).

Actually I would not change this on the engine side as it affects too many consumers, but rather make some changes on the Ant+ side (like adding one stroke to the total - that may not work for pause and resume and all that complicated stuff, that needs to be tested). Of course it could have been done the other way originally (i.e. initialize with "Garmin's" logic and then deduct for the other consumers), but this was not the design decision that was made. Also I dont know whether this would mess up any other watch's expectation as I never used anything else than garmin :)

Pausing and resuming is actually quite difficult. The key issue is that normally you'd conclude that the stroke has ended when you see a next drive. Now that drive is so far away that the pause timer gets triggered. But when a pause gets inserted, it in fact will correctly count that last stroke as being there. And that is where things get messy: you somehow need to zero the recorders again to be able to calculate the pause time correctly, but our recorders still lack the concept of a segment, so they can't explicitly record it (or the clients broadcast it).

We've laid some groundwork already. There is a data element called intervalNumber in RowingStatistics, which is now exported to RowingData. But normal users can't insert a structured workout, and thus lack a way to control it. And PM5/BLE/ANT+ clients are not yet prepared for this element. To put it bluntly: RowingStatistics can now tell a planned segment has ended, but nobody actually listens and knows how to handle that situation.

I think that when we really start to support structured workouts, we need to define per client how to handle this situation and you end up with totally different approaches anyway as it forces new messages and responses when pauses are planned. TCX and FIT support a multisegment activity, so setting that should resolve these issues more fundamentally, which should help the ANT+/PM5/BLE interfaces as well.

Abasz commented 1 year ago

This is actually what ORM currently does. On the detection of the first drive it will trigger stroke zero to zero all recorders. After completion of the entire stroke (including the recovery) it will record the first stroke as being complete, including the drive, force curves, and recovery timing. If you want to understand how ORM "thinks" you can best look at the RowingData output, and you recognise it directly.

One note, on creation of the rowerstatistics you initilize the totalNumberOfStrokes to zero. Is this intentional? https://github.com/JaapvanEkris/openrowingmonitor/blob/43d7e45aee7c877f9bcbdeffe9694eab8946ce6b/app/engine/RowingStatistics.js#L39

@JaapvanEkris I have an idea what the issue of Ant+ is. I looked at the raw data file and for me after stroke zero, stroke number 2 comes: <html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

Column1 | index | Stroke Number | TimeStamp (sec) | ElapsedTime (sec) | HRCur (bpm) | DistanceMeters | Cadence (stokes/min) | Stroke500mPace (sec/500m) | Power (watts) | StrokeDistance (meters) -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 0 | 0 | 0 | 1678894851 | 0.00 | #NUM! | 0.0 | 0.0 | NaN | #NUM! | 0.00 2 | 2 | 2 | 1678894854 | 3.67 | #NUM! | 8.4 | 17.3 | 205.96 | 40 | 8.42 3 | 3 | 3 | 1678894858 | 6.98 | #NUM! | 18.5 | 18.2 | 163.94 | 79 | 10.07 4 | 4 | 4 | 1678894861 | 10.10 | #NUM! | 28.2 | 19.2 | 160.12 | 85 | 9.75

So if I disregard stroke zero (that is the start) I get total stroke -1. The way ANT+ works is not that it looks at the final result (like strava, and the other services), but checkes the changes to the increment value. In theory this should result in the same (as if there is an increment of 2 than it should increment with two) but it is possible that Garmin sense checks this and does not allow double increments on the stroke (which is logical on one hand, but leaves bugs on disconnect...). I dont know the inner workings of garmin in this respect. So this is just a guess

JaapvanEkris commented 1 year ago

One note, on creation of the rowerstatistics you initilize the totalNumberOfStrokes to zero. Is this intentional? https://github.com/JaapvanEkris/openrowingmonitor/blob/43d7e45aee7c877f9bcbdeffe9694eab8946ce6b/app/engine/RowingStatistics.js#L39

@JaapvanEkris I have an idea what the issue of Ant+ is. I looked at the raw data file and for me after stroke zero, stroke number 2 comes:

Column1 index Stroke Number TimeStamp (sec) ElapsedTime (sec) HRCur (bpm) DistanceMeters Cadence (stokes/min) Stroke500mPace (sec/500m) Power (watts) StrokeDistance (meters) 0 0 0 1678894851 0.00 #NUM! 0.0 0.0 NaN #NUM! 0.00 2 2 2 1678894854 3.67 #NUM! 8.4 17.3 205.96 40 8.42 3 3 3 1678894858 6.98 #NUM! 18.5 18.2 163.94 79 10.07 4 4 4 1678894861 10.10 #NUM! 28.2 19.2 160.12 85 9.75

Could you take a look at the RowingData output? What set me on this track in the first place is that the RowingData did the exact same thing when I did a clean install, but my production install did not, so that triggered me to update the code in GitHub. When I look at the changes to RowingStatistics.js in the V1Beta_Architecture_Revision I see that the let totalNumberOfStrokes = -1 was seen as a change, while it should have been present already in the normal V1Beta_Updates branch.

So if I disregard stroke zero (that is the start) I get total stroke -1. The way ANT+ works is not that it looks at the final result (like strava, and the other services), but checkes the changes to the increment value. In theory this should result in the same (as if there is an increment of 2 than it should increment with two) but it is possible that Garmin sense checks this and does not allow double increments on the stroke (which is logical on one hand, but leaves bugs on disconnect...). I dont know the inner workings of garmin in this respect. So this is just a guess

Know the feeling. Still trying to discover why Strava sometimes looks like a mess while every other consumer looks great.

beavis69 commented 1 year ago

Actually garmin is ok with that (with my watch using the rower as ant+ sensor) Stroke count is the same on garmin connect and on openrowing monitor web interface. But i have not tested with a manual tcx upload to garmin.

Abasz commented 1 year ago

One note, on creation of the rowerstatistics you initilize the totalNumberOfStrokes to zero. Is this intentional? https://github.com/JaapvanEkris/openrowingmonitor/blob/43d7e45aee7c877f9bcbdeffe9694eab8946ce6b/app/engine/RowingStatistics.js#L39 @JaapvanEkris I have an idea what the issue of Ant+ is. I looked at the raw data file and for me after stroke zero, stroke number 2 comes:

Column1 index Stroke Number TimeStamp (sec) ElapsedTime (sec) HRCur (bpm) DistanceMeters Cadence (stokes/min) Stroke500mPace (sec/500m) Power (watts) StrokeDistance (meters) 0 0 0 1678894851 0.00 #NUM! 0.0 0.0 NaN #NUM! 0.00 2 2 2 1678894854 3.67 #NUM! 8.4 17.3 205.96 40 8.42 3 3 3 1678894858 6.98 #NUM! 18.5 18.2 163.94 79 10.07 4 4 4 1678894861 10.10 #NUM! 28.2 19.2 160.12 85 9.75

Could you take a look at the RowingData output? What set me on this track in the first place is that the RowingData did the exact same thing when I did a clean install, but my production install did not, so that triggered me to update the code in GitHub. When I look at the changes to RowingStatistics.js in the V1Beta_Architecture_Revision I see that the let totalNumberOfStrokes = -1 was seen as a change, while it should have been present already in the normal V1Beta_Updates branch.

Ok, here are the results:

old v1beta_update (i.e. without commit we are discussing here): original-v1Beta_update_rowingData.csv first stroke: 0 second stroke: 2 total strokes: 605

current v1beta_update: current-v1beta_rowingData.csv first stroke: 0 second stroke: 1 total strokes: 604

Architecture revision branch: Architect-rework_rowingData.csv first stroke: 0 second stroke: 1 total strokes: 604

@beavis69

Actually garmin is ok with that (with my watch using the rower as ant+ sensor) Stroke count is the same on garmin connect and on openrowing monitor web interface. But i have not tested with a manual tcx upload to garmin.

what do you mean by the above? Are you saying that it works with the update Jaap commited? or something else?

beavis69 commented 1 year ago

I'm mean : with current v1beta_update stroke count on garmin (ant+ fce) = stroke count on openrowing moniting web interface.

But i found the total distance is really wrong : i got 8321m on openrowing monitor and 8832m on garmin ! I'll try to export and dump the fit file to understand why. I suspect it has something to do with pauses.

Abasz commented 1 year ago

But i found the total distance is really wrong : i got 8321m on openrowing monitor and 8832m on garmin ! I'll try to export and dump the fit file to understand why. I suspect it has something to do with pauses.

It most probably is, as the current ANT+ (and most of the peripherals) do not handle the reset pause and these events. So it is possible that the ANT+ sends redundant data that is not processed by the watch properly. With hole honesty, I never tested it with pauses. I mean I never had an interval or fartlek session where I do not keep rowing very slowly on the recovery/rest part so for me ORM never signals pause.

Once we have pause and resume figured out I will look at this as well. In the mean time, can you send me a raw data file that I can replay for testing where you had this issue?

Please for further discussion on this (mismatch of distance) open a separate issue thread. I dont think this is a v1beta_update issue.

thanks

beavis69 commented 1 year ago

Ok, here is the file. I'll open a new thread for distance mismatch. 2023-05-02_11-06-04_raw.csv.gz

r4fterman commented 6 months ago

Can confirm the installation is working on a Raspberry Pi 5 with aarch64 architecture.

JaapvanEkris commented 6 months ago

Can confirm the installation is working on a Raspberry Pi 5 with aarch64 architecture.

Thanks for reporting. I'll add this to the list of supported devices.

laberning / openrowingmonitor

Bugfixes and improvements for V1Beta #114