kaleidicassociates / lubeck

High level linear algebra library for Dlang
http://lubeck.libmir.org/
Boost Software License 1.0

add meson build, circle ci, docs, and lubeck2 #32

Closed: 9il closed this 4 years ago

jmh530 commented 4 years ago

I have a few issues with the lubeck2.d file.

I noticed that lubeck2.d has centerColumns and normalizeColumns functions. In my opinion, these should be dimension-agnostic, so that they can be applied along any dimension, most obviously rows.

Looking at normalizeColumns, the standard deviation calculation assumes that the mean is zero. This is not mentioned anywhere in the documentation, so it would be easy to miss without reading the source. This reminds me of the scale function in R, which has similar behavior to these two functions but built into one. The benefit is that the mean only needs to be calculated once.
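
To make the zero-mean assumption concrete, here is a sketch of the two formulas. The symbols are notation introduced here and are not taken from lubeck2.d: x_i are the entries of one column, n is its length, and d is the divisor chosen by the estimator (n - 1 for the sample version, n for the population version).

```latex
\begin{align}
s_{\text{general}} &= \sqrt{\frac{1}{d}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
  \qquad \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \\
s_{\text{zero-mean}} &= \sqrt{\frac{1}{d}\sum_{i=1}^{n}x_i^2} \\
\sum_{i=1}^{n}x_i^2 &= \sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2 + n\,\bar{x}^2
\end{align}
```

The identity in the last line shows that the two forms agree only when the column mean is zero. On a column that has not been centered first, the zero-mean form overstates the quantity under the square root by n x̄² / d.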

I don't see a reason why DevianceEstimator does not include a population option for completeness. This becomes relevant if you extend this to skewness and kurtosis, where the degrees-of-freedom corrections are less well known.

This functionality also relates to an earlier mir issue about sweeping arrays. I wrote up a sweep function toward the end of it that may have had more generic behavior than what is here (so it can apply any function by dimension). I haven't looked at it in a long time, but it may be worth a second look.

9il commented 4 years ago

Well, both versions of Lubeck are far from ideal. Furthermore, lubeck2 prefers to store vectors in columns, which is common for math languages but is weird for C-like languages. I would write the whole project completely differently. However, this isn't a priority for now. mir-optim and mir-integral use mir-blas and mir-lapack directly for performance reasons. Otherwise, PRs and a redesign are welcome.

jmh530 commented 4 years ago

@9il What is the purpose of the addition of lubeck2? Is it meant to replace the original lubeck.d file?

I don't think it would be easy for me to change the storage order on this. I would probably just make changes along the lines that I had mentioned.

9il commented 4 years ago

> @9il What is the purpose of the addition of lubeck2? Is it meant to replace the original lubeck.d file?

It is @nogc, while the first one uses the GC a lot. lubeck was my first attempt to write an LA library, while lubeck2 was written by Thomas Webster and reviewed by me a bit.

jmh530 commented 4 years ago

> It is @nogc, while the first one uses the GC a lot. lubeck was my first attempt to write an LA library, while lubeck2 was written by Thomas Webster and reviewed by me a bit.

@9il Not sure if you were aware, but the following is easy to do with numir. I also have a function in there called deviationsPow that I used to build up some additional functionality. Since mean has now been added to mir-algorithm, do you think I should move some of that functionality into mir-algorithm?

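```d
/+dub.sdl:
dependency "mir-algorithm" version="~>3.7.28"
dependency "numir" version="~>2.0.2"
+/

import mir.ndslice;
import numir : center;

void main() {
    // 3x2 matrix: [[1, 2], [3, 4], [5, 6]]
    auto x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0].sliced(3, 2);
    // Center the slice as a whole.
    auto y = x.center;
    // Center each slice along dimension 0 (each row) separately; map is lazy.
    auto z = x.byDim!0.map!(a => a.center);
}
```

9il commented 4 years ago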

Sure, a new statistical API is welcome. It is strongly preferred that new code avoid allocations where possible. Many things like the mean and moments can be computed in a single pass over the data.
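
As an illustration of the single-pass, allocation-free style described above, here is a minimal sketch using Welford's online update for the mean and variance. It is plain D with no mir dependency; `RunningMoments` and `moments` are hypothetical names made up for this example and are not part of mir-algorithm, numir, or lubeck.

```d
/// Hypothetical accumulator: running mean and sum of squared deviations.
struct RunningMoments
{
    size_t n;        // number of values seen so far
    double mean = 0; // running mean
    double m2 = 0;   // running sum of squared deviations from the mean

    /// Welford's update: each value is read exactly once.
    void put(double x) @nogc nothrow pure @safe
    {
        ++n;
        immutable delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    /// Sample variance (divisor n - 1) by default, population (divisor n) on request.
    double variance(bool population = false) const @nogc nothrow pure @safe
    {
        // Not enough data for the requested estimator.
        if (n < (population ? 1 : 2))
            return double.nan;
        return m2 / (population ? n : n - 1);
    }
}

/// Hypothetical helper: one pass over the data, no allocation.
RunningMoments moments(const(double)[] data) @nogc nothrow pure @safe
{
    RunningMoments acc;
    foreach (x; data)
        acc.put(x);
    return acc;
}
```

Calling `moments(data)` on a `double[]` returns the accumulator, from which `mean`, `variance()`, and `variance(true)` can be read. Both estimators come out of the same pass, so the mean is never recomputed.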