EdmundsEcho opened this issue 1 year ago
The error happens when running the optimization, calling the `get` trait method for `DenseMatrix`. The panic originates in `optimization/first_order/lbfgs.rs`:

```rust
fn optimize<'a, X: Array1<T>, LS: LineSearchMethod<T>>(
    &self,
    f: &F<'_, T, X>,
    df: &'a DF<'_, X>,
    x0: &X,
    ls: &'a LS,
) -> OptimizerResult<T, X> {
    let mut state = self.init_state(x0);
    df(&mut state.x_df, x0); // <<<
    // ...
}
```
Thanks for reporting this. I will also take some time to look into it; there should be a test that covers LR. Please add a test for this if you find the problem.
Hi @EdmundsEcho, I don't know how you could even make your code run. Please check this: are you using smartcore `>=0.3.0`? If not, please use the latest version, as it is the only supported API. See `src/linear/logistic_regression.rs` line 858 for examples of how to use the method.

There are two major problems with your method call `let lr = LogisticRegression::fit(x, y, Default::default())?;`:

1. `fit` methods require data passed by reference, so the right way of calling is `let lr = LogisticRegression::fit(&x, &y, Default::default()).unwrap();` (note the ampersands); this is how the method is used in every example and test. Your code does not compile:
```text
error[E0308]: arguments to this function are incorrect
   --> src/linear/logistic_regression.rs:917:18
    |
917 |     let lr = LogisticRegression::fit(x, y, Default::default()).unwrap();
    |              ^^^^^^^^^^^^^^^^^^^^^^^
    |
note: expected reference, found struct `matrix::DenseMatrix`
   --> src/linear/logistic_regression.rs:917:42
    |
917 |     let lr = LogisticRegression::fit(x, y, Default::default()).unwrap();
    |                                      ^
    = note: expected reference `&_`
               found struct `matrix::DenseMatrix<f32>`
note: expected reference, found struct `Vec`
   --> src/linear/logistic_regression.rs:917:45
    |
917 |     let lr = LogisticRegression::fit(x, y, Default::default()).unwrap();
    |                                         ^
    = note: expected reference `&_`
               found struct `Vec<i32>`
note: associated function defined here
   --> src/linear/logistic_regression.rs:420:12
    |
420 |     pub fn fit(
    |            ^^^
421 |         x: &X,
    |         -----
422 |         y: &Y,
    |         -----
423 |         parameters: LogisticRegressionParameters<TX>,
    |         --------------------------------------------
help: consider borrowing here
    |
917 |     let lr = LogisticRegression::fit(&x, y, Default::default()).unwrap();
    |                                      ~~
help: consider borrowing here
    |
917 |     let lr = LogisticRegression::fit(x, &y, Default::default()).unwrap();
    |                                         ~~
```

For more information about this error, try `rustc --explain E0308`.
The compiler correctly asks for references to be used, and I don't know how you made this code run, as every compiler I tried (on x86) returned this error when passing values instead of references.
2. The `?` syntax is shorthand for a `match` expression used for error propagation. All the methods in the library are implemented to use `unwrap()`. Again, your code will not compile using `?` with `smartcore` methods because of how the traits are implemented.
Remember always to pass by reference, as `smartcore` is meant to take one matrix value passed to methods by reference so as to avoid copying in memory; use slices when you need parts of matrices; take a peek at the examples and documentation for usage. There are also [Jupyter notebooks you can read on GitHub](https://github.com/smartcorelib/smartcore-jupyter/blob/main/notebooks/03.00-Logistic-Regression.ipynb).
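The pass-by-reference point can be sketched in isolation. This is a toy stand-in (the `Matrix` and `Model` types below are hypothetical, not smartcore's API); it only illustrates that a `fit` taking borrows leaves the caller owning the data, with no copy made:

```rust
// Hypothetical stand-in types to illustrate the borrowing pattern;
// these are NOT smartcore's actual types or signatures.
#[derive(Debug)]
pub struct Matrix {
    pub nrows: usize,
    pub ncols: usize,
    pub values: Vec<f64>,
}

pub struct Model {
    pub n_features: usize,
}

impl Model {
    // Takes borrows, so the caller keeps ownership of `x` and `y`
    // and no data is copied into `fit`.
    pub fn fit(x: &Matrix, y: &[u32]) -> Model {
        assert_eq!(x.nrows, y.len(), "row count must match target length");
        Model { n_features: x.ncols }
    }
}

fn main() {
    let x = Matrix { nrows: 2, ncols: 3, values: vec![0.0; 6] };
    let y: Vec<u32> = vec![0, 1];

    let model = Model::fit(&x, &y); // note the ampersands

    // `x` and `y` are still usable here because `fit` only borrowed them.
    println!("{} {}", model.n_features, x.values.len());
}
```

After `fit` returns, `x` and `y` remain valid, which is exactly why the borrowed signature avoids both the move error and the memory copy.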
I will take a deeper look into this because it is always possible that there is some error buried in the API calls, but for now I am closing this issue as resolved.
Thanks for using the library.
Thank you for the follow-up. Here is the code (it compiles :))
```rust
pub fn run_logit(&self) -> Result<LogitFindings> {
    let x: &DenseMatrix<f64> = &self.inner.x;
    let y: &Vec<u32> = &self.inner.y.iter().map(|f| *f as u32).collect();

    println!("y vec height: {:?}", y.len());
    println!("x matrix shape: {:?}", x.shape());

    let lr = LogisticRegression::fit(x, y, Default::default())?;
    // let y_hat = lr.predict(&x)?;
    // todo: add the result to the dataframe and return findings
    Ok(LogitFindings {})
}
```
```toml
[dependencies]
# ...
smartcore = "0.3.1"
```
Seems to be an edge case that we are not handling. I see that in the `get` method of `DenseMatrix`, we check the index and panic if it exceeds the max value. For some reason, your panic is happening in another place.
Where it starts pointing to an index out of bounds is in the `DenseMatrix` implementation of `Array::get`, after pos `[0, 93]` (row, column), where I have 94 predictors. This is confusing to me because the target is much larger than the `[col * self.nrows + row]` computation would suggest. Also, the shape of the matrix clearly indicates we have more than one row...

I saw the checks. It makes me wonder if there is an underlying data structure ("under the hood") that somehow is not being built large enough. I am constructing the `DenseMatrix` by way of `ndarray` and directly from polars.

Here is as far as I can pinpoint:
```rust
// optimization/first_order/lbfgs.rs
impl<T: FloatNumber + RealNumber> FirstOrderOptimizer<T> for LBFGS {
    ///
    fn optimize<'a, X: Array1<T>, LS: LineSearchMethod<T>>(
        &self,
        f: &F<'_, T, X>,
        df: &'a DF<'_, X>,
        x0: &X,
        ls: &'a LS,
    ) -> OptimizerResult<T, X> {
        let mut state = self.init_state(x0);
        df(&mut state.x_df, x0); // <<< does not get past here
        let g_converged = state.x_df.norm(std::f64::INFINITY) < self.g_atol;
        let mut converged = g_converged;
        let stopped = false;
        // ...
    }
}
```
... where `df` is a function closure:

```rust
let df = |g: &mut Vec<TX>, w: &Vec<TX>| {
    objective.df(g, w)
};
```
The `Array` trait implementation for `DenseMatrix` clearly demonstrates the issue: `self.values.len()` returns a value that is too small (missing 1 col × nrows). So for a matrix of shape (111, 3), the length returns 222. I'm not sure how a `DenseMatrix` with shape (111, 3) only ends up with 222 values.
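To make the failure mode concrete, here is a minimal sketch of the arithmetic (my own reproduction, not smartcore's code): with column-major indexing `col * nrows + row`, a matrix that claims shape (111, 3) but whose backing `Vec` only holds 222 values goes out of bounds the moment the third column is touched.

```rust
// Column-major flat index, as in `[col * self.nrows + row]`.
fn flat_index(row: usize, col: usize, nrows: usize) -> usize {
    col * nrows + row
}

fn main() {
    let nrows = 111;
    let values = vec![0.0f64; 222]; // a claimed shape of (111, 3) needs 333 values

    // Columns 0 and 1 stay inside the buffer:
    assert!(flat_index(110, 1, nrows) < values.len()); // 1 * 111 + 110 = 221

    // The very first element of column 2 already points past the end:
    let i = flat_index(0, 2, nrows); // 2 * 111 + 0 = 222
    assert!(i >= values.len()); // values[222] would panic: index out of bounds
}
```

This matches the observed behavior: iteration over the first two columns works, so the general tests pass, and the panic only fires once an access reaches the column that the undersized buffer never contained.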
Thanks, with the full example it makes more sense. I will look into it.
@morenol

> Seems to be an edge case that we are not handling. I see that in the get method of DenseMatrix, we check the index and panic if it exceeds the max value. For some reason, your panic is happening in another place.

If it is like this, wouldn't we see one of the tests involving that method fail? That method is used in many other places, like iterators, and everything seems to work OK.
@EdmundsEcho I have added your code to a test and everything works fine -> https://github.com/smartcorelib/smartcore/pull/257

You can see your code at this line, and the pipelines are all good on every system/architecture we test for (Linux, Windows, macOS, Wasm). Our pipeline tests macOS with `aarch64-apple-darwin` and all the tests pass.
Some checks: are you sure you are calling smartcore's `LogisticRegression` method? The fact that your code compiles using the `?` syntax makes me think that you are using another library's method with the same name, as none of smartcore's methods support `?`; we always use `unwrap()` as we don't want to propagate errors. The problem may be in which traits you use in the module.

Sorry, I cannot replicate your problem. If you can tell me more about how to replicate it, I would be glad to look into it. Please suggest another test that could spot the problem.
I think that we don't have exactly the same data in our test cases, right? We added a random test, but it is not necessarily the same data; we would need to be sure. I would like to have an example of a `DenseMatrix` with these characteristics:

> So in a matrix (111, 3), the length returns 222. I'm not sure how the DenseMatrix with shape (111,3), only shows up with 222 values
Thank you for generating the test. It was convincing and enabled me to work "upstream" as needed.

It happens when `smartcore` instantiates the host for the data. In my case it was using `DenseMatrix`. I was able to instantiate the memory with a `cols` value that was bigger than what I actually had in the data. So, for instance, `smartcore` was operating on my array as if it had 300 values, when in fact I only had 200. I don't know if you have other instantiations that rely on the user "getting it right".

Just to make the point, here is the adjusted code. The ultimate solution would involve changing the return type to a `Result`, because instantiating a `DenseMatrix` relying on user input to get the row/col count right (i.e., to match `values.len()`) is fallible. (PS: I hate to be the guy who demonstrates how daft a user can be.)
```rust
// src/linalg/basic/matrix.rs
impl<T: Debug + Display + Copy + Sized> DenseMatrix<T> {
    /// Create new instance of `DenseMatrix` without copying data.
    /// `values` should be in column-major order.
    pub fn new(nrows: usize, ncols: usize, values: Vec<T>, column_major: bool) -> Self {
        assert!(
            nrows * ncols == values.len(),
            "Failed to instantiate DenseMatrix; data does not match shape"
        );
        DenseMatrix {
            ncols,
            nrows,
            values,
            column_major,
        }
    }

    // ...
}
```
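The `Result`-returning variant mentioned above could look something like this. It is a self-contained sketch: the `try_new` name and the `ShapeError` type are hypothetical, and smartcore's actual error enum may differ.

```rust
use std::fmt;

// Hypothetical error type for a shape/length mismatch; smartcore's
// real error enum may look different.
#[derive(Debug, PartialEq)]
pub struct ShapeError {
    pub expected: usize,
    pub got: usize,
}

impl fmt::Display for ShapeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected {} values for the given shape, got {}", self.expected, self.got)
    }
}

// Simplified local stand-in for the matrix type, to keep the sketch runnable.
#[derive(Debug)]
pub struct DenseMatrix<T> {
    nrows: usize,
    ncols: usize,
    values: Vec<T>,
    column_major: bool,
}

impl<T> DenseMatrix<T> {
    // Fallible constructor: rejects a shape that disagrees with `values.len()`
    // instead of panicking later in `get`.
    pub fn try_new(nrows: usize, ncols: usize, values: Vec<T>, column_major: bool)
        -> Result<Self, ShapeError>
    {
        if nrows * ncols != values.len() {
            return Err(ShapeError { expected: nrows * ncols, got: values.len() });
        }
        Ok(DenseMatrix { nrows, ncols, values, column_major })
    }
}

fn main() {
    // The mismatch from this thread: shape (111, 3) backed by only 222 values.
    let err = DenseMatrix::try_new(111, 3, vec![0.0f64; 222], true).unwrap_err();
    assert_eq!(err, ShapeError { expected: 333, got: 222 });

    // A consistent shape succeeds.
    assert!(DenseMatrix::try_new(111, 2, vec![0.0f64; 222], true).is_ok());
}
```

With this shape of API, the mistaken `cols` value would surface as an `Err` at construction time rather than as an index-out-of-bounds panic deep inside the optimizer.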
It was difficult to isolate this issue until now because I was using smartcore's `shape` to report the shape of the data. I thought this was a direct observation of how the data was, in reality, instantiated. However, that is not the case: `smartcore` uses the inputs that I, the user, sent it to report the shape; they are just numbers that don't actually reflect the instantiated memory. Thus, when I thought I was looking at the real shape of the data, I was looking at the very mistake I was trying to identify, which made it impossible to find this way.
It might be worth being more explicit about what you mean in each of these scenarios, using examples. It would help to know whether the subsequent analysis (e.g., solving the regression) is faster one way or the other, so we know which is preferred, and secondly to make sure we send the input in the correct, intended way.
`smartcore` has a good enum of errors that includes `input` errors. I'll work on an update to see what you all think. I suspect making the inputs more robust was already on the radar.
Thanks for the clarification!
I'm submitting a bug report.

Current Behaviour:
When I call the `fit` method, I get an index out of bounds error.

Expected Behaviour:
It should compute the regression without throwing an error.

Steps to reproduce:
These are the input shapes:

Snapshot:

Environment:
* rustc version: rustc 1.68.2 (9eb3afe9e 2023-03-27)
* cargo version: cargo 1.68.2 (6feb7c9cf 2023-03-26)
* OS details: MacOS 13.3 M1

### Do you want to work on this issue?
Yes, I'm trying to debug it now. That said, I'm new to this lib and so want to make sure I'm not missing something.