Mec-iS closed this 4 months ago
Attention: Patch coverage is 54.16667%, with 33 lines in your changes missing coverage. Please review.
Project coverage is 45.49%. Comparing base (80a93c1) to head (3fcd722).
| Files | Patch % | Lines |
|---|---|---|
| src/linalg/basic/matrix.rs | 57.35% | 29 Missing :warning: |
| src/algorithm/neighbour/bbd_tree.rs | 0.00% | 2 Missing :warning: |
| src/error/mod.rs | 0.00% | 2 Missing :warning: |
@morenol we need to fix Wasm tests
Releasing a new minor for these changes is not completely justified; we need to add some more implementations to justify a minor release.

Also: in accord with the library principles, it is probably better to panic early at initialization than to return a `Result`. smartcore is meant to be as stable as possible after compilation, and using a `Result` may just move problems into runtime. So there is the option of just allowing a controlled `panic!(..)` instead of returning a `Fail`.
To which function are you referring?
I get that if a config or something early is flawed, the app should exit with useful feedback. Elsewhere, the user can "opt in" to a panic by using `unwrap()`.

Here is a link that might be useful: https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html. I see the line as: if "dangerous/harmful", then panic; otherwise `Err`. An `Err` is a valid answer given the input, whereas a panic is not generally expected behavior. Furthermore, "fail early" is not what we are talking about; the question is more about what the lib returns at the time a mistake is discovered.
As a user of the library, I would expect to be able to send commands over time. A user (or service) might "get it wrong". Does that mean the user cannot "try again"?

In the event I run this in Tokio, I would not want the thread to panic (very bad and tough to navigate), but rather to be given the answer. That can include an `Err` ("could not process because the data was wrong"): a match branch I can recover from.
Runtime errors in Python are the reason I moved to Rust. Returning an `Err` is not a runtime error like we experience in Python; it's only a decision on what to return: app crash/panic, or `Err`.

More generally about runtime errors: given Rust's compile-time checks, they don't equate to Python's (they are incomparable).
Right now the lib is fragile because it doesn't have strong typing through and through (likely mostly on the result side of the process). It will get better over time, as the app stops instantiating bad state (like what we have with `DenseMatrix` in 0.4). Until then, let the user know with an `Err`. For instance, getting results is a fragile process.
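To illustrate the trade-off being discussed, here is a minimal, hypothetical sketch (the types and error variant are illustrative, not smartcore's actual API) of a fallible constructor that returns an `Err` instead of panicking, so a long-running caller can recover and retry:

```rust
// Hypothetical sketch: a shape-checked constructor that returns Err
// instead of panicking, so the caller decides how to handle bad input.
#[derive(Debug)]
struct DenseMatrix {
    rows: usize,
    cols: usize,
    values: Vec<f64>,
}

#[derive(Debug, PartialEq)]
enum MatrixError {
    ShapeMismatch { expected: usize, got: usize },
}

impl DenseMatrix {
    fn new(rows: usize, cols: usize, values: Vec<f64>) -> Result<Self, MatrixError> {
        if values.len() != rows * cols {
            // An Err is a valid answer for bad input; the caller can retry,
            // log, or opt in to a panic via `unwrap()`.
            return Err(MatrixError::ShapeMismatch {
                expected: rows * cols,
                got: values.len(),
            });
        }
        Ok(DenseMatrix { rows, cols, values })
    }
}

fn main() {
    // Bad input: recoverable, no thread panic.
    assert!(DenseMatrix::new(2, 2, vec![1.0, 2.0, 3.0]).is_err());
    // Good input: proceeds normally after the failed attempt.
    assert!(DenseMatrix::new(2, 2, vec![1.0, 2.0, 3.0, 4.0]).is_ok());
}
```

In a Tokio context, the first call would be matched and recovered from rather than tearing down the task.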
What do you think?
I also think that a `Result` is better than a panic for this case. I also agree that we could delay the crates.io release until we have more things to add to a breaking-change release.
I agree there is more that can be done for the next major release. I have some ideas for how to add to the changes for 0.4. Much of the ideas come from what Axum has done for `http`.
I tried to use the coefficients and intercept to output probabilities (not binary classes) and was surprised by how fragile the process was. e.g., `get_row(0)` and `get((0,0))` require a lot of "under the hood" knowledge, not to mention the gymnastics required to extract values from `&X`.
One way to go from `a -> b -> c` without the user being aware is to create what I call "transient types". Using the transient types, you build implementations of `std::convert::From`. This way, the end user can do something like:
```rust
let coefficients: Vec<_> = lr.coefficients().into();
let intercept: f64 = lr.intercept().into();
// or
let coefficients: Vec<_> = lr.coefficients().iter().collect();
let intercept: f64 = lr.intercept(); // casts to f64
```
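To make the idea concrete, here is a minimal, hypothetical sketch of a transient type (`Coefficients`, `LinearRegression`, and their methods are stand-ins, not smartcore's real types): the library returns a thin wrapper, and a `std::convert::From` impl lets the user land on a plain std type without touching internals:

```rust
// Hypothetical "transient type": a thin wrapper over internal storage
// that converts itself into a plain std type via std::convert::From.
struct Coefficients(Vec<f64>);

impl From<Coefficients> for Vec<f64> {
    fn from(c: Coefficients) -> Vec<f64> {
        c.0
    }
}

// Stand-in for a fitted model; `coefficients()` returns the transient type.
struct LinearRegression {
    coef: Vec<f64>,
}

impl LinearRegression {
    fn coefficients(&self) -> Coefficients {
        Coefficients(self.coef.clone())
    }
}

fn main() {
    let lr = LinearRegression { coef: vec![0.5, -1.25] };
    // The end user goes a -> b -> c in one line, unaware of the internals:
    let coefficients: Vec<f64> = lr.coefficients().into();
    assert_eq!(coefficients, vec![0.5, -1.25]);
}
```

The wrapper is free to change its internal representation later without breaking the `.into()` call sites.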
There are other "best practices" for increasing the usability of each method, such as the `AsRef` and `Borrow` traits; I wonder whether they might be worth looking through.
Augment what is included in a report, e.g., the F statistic. I'm not at all familiar with the subtle differences between ML and traditional stat modeling; I come from the latter. The incremental effort to report "common sense" checks that we should be interested in is likely good fodder for discussion. It might be a feature, or the user choosing to "see" more of what has already been computed in the default features.
Move away from building your own underlying data structures for hosting and executing common math operations; e.g., use `ndarray`. We are already using argmin; we should do the same for the underlying data-hosting structs.
Implement the builder pattern, like what we experience in many Rust crates, e.g., building an HTTP request. I see what we are doing here in a similar vein. This will augment the capacity to build a "default" for whatever analysis, one that is "more often" what you want. It would also require augmenting the documentation to fully explain the additional options (accelerated by pointing users to generically available descriptions of the concepts on established stats/ML sites?).
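As a sketch of what that could look like (the `LinearRegressionParams` type and its fields here are hypothetical, chosen only to illustrate the pattern): sensible defaults with chainable, opt-in overrides, in the style of HTTP request builders:

```rust
// Hypothetical builder-style params: Default gives the "more often what
// you want" configuration; chained methods override individual options.
struct LinearRegressionParams {
    fit_intercept: bool,
    max_iter: usize,
}

impl Default for LinearRegressionParams {
    fn default() -> Self {
        LinearRegressionParams { fit_intercept: true, max_iter: 100 }
    }
}

impl LinearRegressionParams {
    // Consuming-and-returning `self` keeps the calls chainable.
    fn fit_intercept(mut self, yes: bool) -> Self {
        self.fit_intercept = yes;
        self
    }
    fn max_iter(mut self, n: usize) -> Self {
        self.max_iter = n;
        self
    }
}

fn main() {
    // The default is a complete, usable configuration:
    let default_params = LinearRegressionParams::default();
    assert!(default_params.fit_intercept);

    // Overrides are opt-in and read like a sentence:
    let custom = LinearRegressionParams::default()
        .fit_intercept(false)
        .max_iter(500);
    assert_eq!(custom.max_iter, 500);
    assert!(!custom.fit_intercept);
}
```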
All in all, it might be useful to "draw a line" between public modules that emit `Result` and internal private methods that only `assert` to convey a critical dependence, but that can be made "sound" through internal, actively monitored assertions. In general: outside "monad", inside "pure".
For the next major after this, or the one after that :)), I would start thinking more about performance. I hit performance issues at a 50k x 100 sample size. Part of this would include a review of when we borrow vs. take ownership; ironically, I wonder if there is too much effort to avoid copies. My main point is this: it's still too early to worry about performance, in my view. Just note that we have room to improve when the time comes. Unlike Python, it will actually make the code both better and more readable, not more convoluted :))
Finally, I've surveyed several packages to accomplish what I need for building stat models. It's a fragmented space with many "half-way" efforts. I like what you all have accomplished. I believe the core strength is assembly: `data -> design -> results`. I see Axum in smartcore: they assemble the best while focusing on a productive, type-safe Rust user experience.
I hope this input is helpful. I spent many years working in Rust, and I'm happy to help however you all like.
Thanks for your ideas. I am glad you found smartcore interesting and that you plan to contribute more!
I believe the library can be very useful in different domains, and I am now focusing on using the library in applications to see what it can do. My idea is to keep the library different from the others on multiple dimensions. There is already a lot of effort going into making things work for "larger", high-level use cases; side-stepping that approach can be very interesting in the long term for domains that are maybe not in the "big data" scenario. polars and linfa are already doing great there, and we don't want to overlap too much with their use cases; please see the CONTRIBUTING.
I am currently taking feedback mostly from embedded developers and audio technologists, focusing on real-time processing as an objective, so the use cases I have in mind are in the context of low-level, small devices and highly portable, high-frequency computations. For example, this field of applications probably needs something like the latest research on dimensionality reduction techniques more than support for billions of features, or else making ML algorithms more useful on tabular datasets. In this context, user experience is a bit of a second-order problem, but as long as resource usage stays at these levels (in terms of time, but, maybe more importantly for small devices, also space complexity) or improves, I am in general OK with better APIs and more features.
It would be great to have more idiomatic Rust. I think let's start with the transient types and the builder pattern; after that, we probably want to set up good profiling (we have a bench repo, but it is very limited and difficult to run; it would be nice to have something like tarpaulin for profiling that runs in CI/CD). Vlad, who created the library, is also of the opinion that we should drop our custom traits system on arrays, but this is something we can see later, as long as we provide the bindings to `ndarray`.
It would also be nice to have finally fixed `predict_proba` (#211), but I don't know the functionality well enough to know whether it is a problem with the numerical stability of the library or with the implementation itself.
Welcome, @EdmundsEcho!
To better understand how the design intent was implemented, may I ask:

1. What are the benefits of using a trait `Array2d` instead of a concrete type that works optimally with the solver? e.g., why not use `DenseMatrix` and convert whatever users want to this standard?
2. Are all of the methods in the `Array`-related traits used internally? Or are they there to support pre-processing?
3. How do you see the line between pre-processing / post-processing and smartcore?
4. Why not use argmin?
Using a trait allows alternative implementations (`SparseMatrix`, for example). We may want a (`no_std`) feature/fork at some point in the future. Using `ndarray` in the internals would make everything dependent on that, and I don't know if it's worth it, as we would just overlap other libraries' use cases and working patterns.

@morenol please take a look
I think that we need to run `cargo fmt`.
Collecting changes for version 0.4.0

**New expected behaviour**

see #256

**Change logs**

* `Result` type added to `DenseMatrix` constructor by @EdmundsEcho
* `DenseMatrix` constructor: the constructor now returns a `Result`