stadelmanma / fitparse-rs

Rust library to parse FIT formatted files
MIT License
54 stars 10 forks source link

Generate message types #14

Closed robinkrahl closed 9 months ago

robinkrahl commented 2 years ago

This is a first draft implementation of #13 with some open points as indicated by the TODO comments. Still, using the parse example, I was able to parse some real-world test files as well as the files from the tests/fixtures directory.

stadelmanma commented 2 years ago

@robinkrahl any news on this? Apologies for going AWOL for awhile.

robinkrahl commented 2 years ago

I think we should first talk about whether this is generally the right approach. Then we can discuss the open points mentioned in the TODO comments.

stadelmanma commented 2 years ago

I do wonder if we’d be ahead to move the “FitDataRecord” concept into a trait that each of these message structs would implement and return structs of the actual message types directly instead. From a parsing standpoint usage would probably be the same (returns Vec or something) which provides the same access/serialization methods as before. It feels a little more intuitive to return the specific thing and be able to use it in a generic fashion rather than return the generic thing and convert it into the more specific thing.

stadelmanma commented 2 years ago

I’ve worked off and on trying to implement a different decoding process that’s more efficient where instead of creating those MessageInfo structs at runtime each message type has a decode function building the decoded data record directly. It’s about twice as fast since building a MessageInfo struct on repeat is rather expensive. I’ve not managed to figure it out successfully yet with how some fields get split up and transformed into other things but that would work nicely to build a message type specific struct. See the ‘function-based-decoding’ branch, I just started on an updated version of it this morning.

robinkrahl commented 2 years ago

return structs of the actual message types directly instead

Typically, a FIT file will contain many different messages so I don’t see how we could return specific types. The Message enum could be used to remove the parsing step though.

I’ve worked off and on trying to implement a different decoding process that’s more efficient

Sounds good! I also experimented with moving much of that information into constants so that more work can be done at compile time. But I did not want to introduce too many changes at the same time. I’ll have a look at your branch next week.

robinkrahl commented 2 years ago

Can you please link the branch you’re working on? I don’t see it in the branch list.

stadelmanma commented 2 years ago

Sorry about that, I didn't realize that it wasn't pushed yet. https://github.com/stadelmanma/fitparse-rs/blob/function-based-decoding-old/src/profile/decode.rs

And by returning specific types I was meaning our message structs would look like this

pub struct Activity {
    pub total_timer_time: Option<f64>,
    pub num_sessions: Option<u16>,
    pub r#type: Option<fitparser::profile::field_type::Activity>,
    // ....
}

impl FitDataRecord for Activity {
    // ...
}

And then the parsing methods would return a trait type, at least I think that's possible. My Rust is a little ... rusty.

from_reader(...) -> Result<Vec<T: FitDataRecord>>
robinkrahl commented 2 years ago

from_reader(...) -> Result<Vec<T: FitDataRecord>>

But this would mean that we can only parse messages of a single type. If we want to support all messages, we either have to use a Box which is not really ergonomic, or use a wrapper enum like Message.

stadelmanma commented 2 years ago

Ah yep your right, I’ve been spending too much time in Java-land where you can do that kind of thing with interfaces.

Looking back at the issue the example struct for each message looks good, but I think I would prefer to add getters instead of using public members, and maybe setters as well. I’m 50/50 on setters since this library doesn’t support writing data but there certainly could be some use cases and I don’t want to put unnecessary limitations on folks.

I’m not a huge fan of the TryFrom logic for the conversion to the rich type. Since we already know what message type we have at decode time it would be nice to start with the rich type instead of trying to find our way back there. The enum wrapper route you mentioned might fit nicely, especially since we already use a Message enum to kick off the decoding process anyways.

It’d be nice to not have significant breaking API changes to the high level “from_X” parsing functions but they could convert the rich type into the generic one. The conversion from rich to generic could be done with a “From” implementation I think?

What kind of use cases do you envision being supported by explicit types? That might be the best way to inform our implementation route.

robinkrahl commented 2 years ago

I think I would prefer to add getters instead of using public members, and maybe setters as well.

I prefer public members because they are easier to use and they lead to cleaner code. I don’t see any reason to use getters and setters because there are no invariants to be enforced.

I’m not a huge fan of the TryFrom logic for the conversion to the rich type.

I only added that step because I did not want to break compatibility and I was under the impression that you preferred to use the raw type. I agree that returning the enum instead is much more ergonomic.

What kind of use cases do you envision being supported by explicit types?

Generally speaking, my goal is to make it easier to use the library and to discover the data structures from the API docs. My specific use case is this conversion code that already uses this PR. Ideally, I could also drop the unit conversions if fitparser takes care of them (as discussed in #13), but let’s do one thing at a time.

stadelmanma commented 2 years ago

I prefer public members because they are easier to use and they lead to cleaner code. I don’t see any reason to use getters and setters because there are no invariants to be enforced.

@robinkrahl that's fair, I can be persuaded to avoid the getters, again too much time spent in java-land nowadays. Components and subfields in a FitMessage do represent an invariant case between the parent and child fields but I don't think it's worth the hassle to bake that logic in.

I like the idea of adding uom as a feature if we can do it neatly. I think it would make some parts of my run-tracking code that uses this library cleaner as well in that department. So if you'd like to throw a separate issue up for that enhancement that'd be great.

I pushed some changes to the profile generation code to split it up from being a giant monolithic main script into a few files each with a specific purpose. I'm not entirely sure yet if it makes the profile generation library easier to navigate or not but at least I don't have to scroll as much. I think long term if makes more sense to call the file your code generates messages.rs and in my function based decoding branch I was starting to call what is now messages.rs decode.rs instead since that's it's purpose.

beyoung commented 1 year ago

Thanks for your great work on this feature 😊 . And any progress on this feature now ?

stadelmanma commented 1 year ago

@beyoung I don't have a use case for this feature myself so I haven't attempted to finish it up, and the underlying profile generation code has changed dramatically since this PR was initially drafted.

If you have a concrete need for it you are welcome to pick it up!

beyoung commented 1 year ago

@beyoung I don't have a use case for this feature myself so I haven't attempted to finish it up, and the underlying profile generation code has changed dramatically since this PR was initially drafted. If you have a concrete need for it you are welcome to pick it up!

I used to want to call fitparse-rs code from python through pyo3 and it would be better to return a good defined structrue. I know there's example about serializer fit to json , but the performace is bad when decode large json in python. Finally I found that I can return pyo3 PyObject and receive the object as dict from python side. now everything works great. And finally thanks your awesome work on this project. 😊

stadelmanma commented 1 year ago

@beyoung sounds great. If there are ways to streamline the process of calling the code’s api from Python I’m all ears. I’ve written c-extensions for Python but never rust bindings before.

beyoung commented 1 year ago

@beyoung sounds great. If there are ways to streamline the process of calling the code’s api from Python I’m all ears. I’ve written c-extensions for Python but never rust bindings before.

Do u mean something like the example/streaming.rs in the repo ? It's super easy to write rust bindings for python when you use packages like pyo3 andmaturin.

stadelmanma commented 1 year ago

@beyoung I don’t have any concrete ideas in particular. But given how I’ve structured the library and how Python works returning some kind of iterable structure probably makes sense. Since from there they can process stuff without needing to pass over the data multiple times.

robinkrahl commented 9 months ago

I’m currently trying to clean up some of my unpublished projects, so I need to get rid of Git dependencies. Therefore I’d like to revisit this PR/feature request and try to get it merged. I’ll update this PR and rebase it onto the current master branch soon.

Initially, my plan was to have a complete implementation in this PR. But maybe it would make more sense to split this up into several smaller PRs. What do you think? Also, would you be fine with an initial solution that does not cover every edge case? I think having message structs with proper typing for 95 % of the fields would already be a big improvement.

stadelmanma commented 9 months ago

I’m currently trying to clean up some of my unpublished projects, so I need to get rid of Git dependencies. Therefore I’d like to revisit this PR/feature request and try to get it merged. I’ll update this PR and rebase it onto the current master branch soon.

Sounds great!

Initially, my plan was to have a complete implementation in this PR. But maybe it would make more sense to split this up into several smaller PRs. What do you think?

It might make the most sense to have a long running feature branch you do incremental PRs into. Then we merge the feature branch all at once when it’s ready. That would keep code review in manageable chunks but not add incomplete bits to the main library until we’re happy with it.

Also, would you be fine with an initial solution that does not cover every edge case? I think having message structs with proper typing for 95 % of the fields would already be a big improvement.

I’d likely be fine with that, it’s usually very hard to cover all the bases in the first go around. Although I do think it depends on how we handle the edge cases, i.e. just less ergonomic to use vs some fields are simply unwritable vs unexpected panics. For example, it would likely be fine if the first pass is not able to handle writing out a data field that requires multiple rounds of subfield/component resolution (maybe with an appropriate error state to avoid generating invalid files). But I’d be less keen on it just bailing out with a hard panic out of the blue if that makes sense.

robinkrahl commented 9 months ago

I’ve updated the PR and rebased it onto the current master branch.

The main changes from the initial implementation are:

robinkrahl commented 9 months ago

It might make the most sense to have a long running feature branch you do incremental PRs into.

Sounds good to me. Just let me know which branch to base the PR against.

Although I do think it depends on how we handle the edge cases, i.e. just less ergonomic to use vs some fields are simply unwritable vs unexpected panics.

My plan would be to provide an implementation that is at least as stable and ergonomic as the current FitDataRecord path, and then gradually improve ergonomics.

The initial implementation in this PR has all fields as Option<Value>. I would then start to try replacing Value with more appropriate types depending on the field. E. g. fields with FieldDataType::UInt8 should use u8 (or some wrapper with improved error handling). Cases that are not handled yet would just mean that users have to look at their data or the profile definition and then manually convert the Value to the correct data type – the same thing they already have to do today. I don’t think it makes sense to spend weeks or months to handle every theoretical case before releasing these changes that greatly improve overall usability.

stadelmanma commented 9 months ago

We can use this branch, feature/13-add-data-types-for-messages. I agree once something is fairly stable and usable we can try rolling it out with some appropriate docs that it's in an "experimental" mode and the API is subject to change.

robinkrahl commented 9 months ago

Okay, I’ve rebased the PR onto the feature branch.

stadelmanma commented 9 months ago

I think this looks good. I left a few questions, mostly just for my curiosity/ improving my knowledge.

robinkrahl commented 9 months ago

Thank you for the review!