lessthanoptimal / ejml

A fast and easy to use linear algebra library written in Java for dense, sparse, real, and complex matrices.
https://ejml.org

Mask benchmark #131

Closed: FlorentinD closed this 3 years ago

FlorentinD commented 3 years ago

Based on #119. Introduces benchmarks for the masked vxm and mxv operations.
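
For context, the benchmarks follow the standard JMH pattern. Below is a minimal sketch using the plain (unmasked) sparse multiply from `RandomMatrices_DSCC` / `CommonOps_DSCC`, just to illustrate the structure; the actual masked vxm/mxv benchmarks in this PR use the masked kernels and differ in detail.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.ejml.data.DMatrixSparseCSC;
import org.ejml.sparse.csc.CommonOps_DSCC;
import org.ejml.sparse.csc.RandomMatrices_DSCC;
import org.openjdk.jmh.annotations.*;

// Minimal JMH sketch only; the real benchmarks exercise the masked vxm/mxv kernels.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class MaskedMultBenchmarkSketch {
    @Param({"10000"})
    int dimension;

    DMatrixSparseCSC A, v, result;

    @Setup(Level.Trial)
    public void setup() {
        Random rand = new Random(42);
        int nonZeros = dimension * 4;
        // generating the random matrices is the part discussed below as the dominant cost
        A = RandomMatrices_DSCC.rectangle(dimension, dimension, nonZeros, rand);
        v = RandomMatrices_DSCC.rectangle(1, dimension, dimension / 10, rand);
        result = new DMatrixSparseCSC(1, dimension);
    }

    @Benchmark
    public DMatrixSparseCSC vxm() {
        // plain sparse vector-times-matrix; the masked variant additionally takes a mask
        CommonOps_DSCC.mult(v, A, result);
        return result;
    }
}
```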

FlorentinD commented 3 years ago

@lessthanoptimal What do you think of caching the generated random matrix and loading it from disk? For my personal benchmarks I realized that generating the random matrix takes most of the time, so caching it could reduce the runtime by quite a bit.
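
The idea, as a rough sketch (the helper and file name are placeholders; it relies on Java serialization, i.e. the same mechanism the deprecated binary loading discussed below uses, so it only illustrates the caching itself):

```java
import java.io.*;
import java.util.Random;

import org.ejml.data.DMatrixSparseCSC;
import org.ejml.sparse.csc.RandomMatrices_DSCC;

// Hypothetical cache helper: generate the random matrix once, reuse it across benchmark runs.
public class CachedRandomMatrix {
    public static DMatrixSparseCSC getOrCreate(File cache, int dim, int nonZeros, long seed)
            throws IOException, ClassNotFoundException {
        if (cache.exists()) {
            // load the previously generated matrix instead of regenerating it
            try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(cache))) {
                return (DMatrixSparseCSC) in.readObject();
            }
        }
        // first run: generate and store for later runs
        DMatrixSparseCSC matrix = RandomMatrices_DSCC.rectangle(dim, dim, nonZeros, new Random(seed));
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(cache))) {
            out.writeObject(matrix);
        }
        return matrix;
    }
}
```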

lessthanoptimal commented 3 years ago

go for it!

lessthanoptimal commented 3 years ago

Any idea why GitHub doesn't seem to think the last PR is merged? I'm seeing changes for both. I'll check back in a little bit and see if it fixed itself.

FlorentinD commented 3 years ago

I needed to rebase the PR; it should be fixed now.

FlorentinD commented 3 years ago

Just tried the caching approach, but there seems to be no measurable benefit. I assume the matrices are too small to make a difference. Also, I saw you deprecated the binary loading anyway, and I highly doubt loading via CSV is faster than generating the random matrix.

If it's of any interest at some point, the branch is at https://github.com/FlorentinD/ejml/tree/cacheCSCMatrices.

lessthanoptimal commented 3 years ago

Could add a new binary file format if that really is a lot faster.

It's actually interesting why its use is strongly advised against. Apparently Java's built-in binary serialization can be used to launch attacks, like executing arbitrary code or something like that.

FlorentinD commented 3 years ago

As I said, for these smaller benchmarks it doesn't seem to be worth it. In my benchmarks the matrices were an order of magnitude larger, so at the moment I don't think it's worth it.

Didn't know that about loading binary objects, but it sounds interesting.

lessthanoptimal commented 3 years ago

I was thinking of the other benchmarks where you saw an improvement.

FlorentinD commented 3 years ago

If you plan to run some bigger benchmarks on sparse matrices, it is probably worth it.

ennerf commented 3 years ago

@lessthanoptimal FYI, I ran into a similar issue where I needed to load large sparse datasets for tests/benchmarks, and ended up implementing DMatrixSparseCSC serialization with MATLAB's .mat file format (Mat5EjmlTest). The binary storage format uses CSC as well, so it can be implemented very efficiently.

Probably not worth it for small datasets though.
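
For anyone curious, a round trip would look roughly like the sketch below. The `Mat5Ejml` class/package name and its `asArray`/`convert` calls are assumptions based on the test mentioned above; the `Mat5` read/write calls follow MFL's documented API, but treat the exact signatures as approximate.

```java
import org.ejml.data.DMatrixSparseCSC;
import us.hebi.matlab.mat.format.Mat5;   // MFL core entry point
import us.hebi.matlab.mat.ejml.Mat5Ejml; // class/package name assumed from the referenced test

// Sketch only: writes a CSC matrix into a MATLAB .mat file and reads it back.
public class SparseMatRoundTrip {
    public static void main(String[] args) throws Exception {
        DMatrixSparseCSC matrix = new DMatrixSparseCSC(3, 3, 2);
        matrix.set(0, 0, 1.0);
        matrix.set(2, 1, -2.5);

        // write the CSC matrix as variable "A" in a .mat file
        Mat5.writeToFile(
                Mat5.newMatFile().addArray("A", Mat5Ejml.asArray(matrix)),
                "A.mat");

        // read it back into an EJML sparse matrix
        DMatrixSparseCSC loaded = Mat5Ejml.convert(
                Mat5.readFromFile("A.mat").getArray("A"),
                new DMatrixSparseCSC(1, 1, 0));
        System.out.println(loaded.get(2, 1));
    }
}
```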

lessthanoptimal commented 3 years ago

@ennerf Any chance you could donate that .mat serialization to EJML?

lessthanoptimal commented 3 years ago

Actually, it looks like it's its own project! I'm wondering if we talked about this before. Well, I'll provide a link to it from EJML's website.

ennerf commented 3 years ago

@lessthanoptimal we briefly talked about this during the work on sparse solvers a few years ago

I think it's probably better to keep it as a separate project for the time being. I'm not opposed to moving parts into EJML proper, but I'd prefer to first think about a better plugin mechanism to make the integration more seamless. Let me know in case that is something you'd potentially like to pursue.

Thanks for the link!

lessthanoptimal commented 3 years ago

@FlorentinD if you look on the front page you'll see the link, as well as on the now updated MatrixIO page.

How about adding MatrixIO.loadMatlab() and then using reflection to call your function? If the class doesn't exist, it would print an error message saying how the dependency can be added; the error would link to a website that explains how to add the dependency.
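
Something like this sketch is what that could look like. All names below are illustrative; no such method exists in MatrixIO yet, and the helper class/method looked up via reflection is a placeholder for whatever ends up providing the .mat loader.

```java
import java.io.File;
import java.lang.reflect.Method;

import org.ejml.data.DMatrixSparseCSC;

// Sketch of a reflection-based MatrixIO.loadMatlab(): the optional .mat support is looked up
// at runtime so EJML itself does not need a hard dependency on it.
public class MatlabLoaderSketch {
    public static DMatrixSparseCSC loadMatlab(String fileName, String variableName) {
        try {
            Class<?> helper = Class.forName("us.hebi.matlab.mat.ejml.Mat5Ejml"); // illustrative name
            Method load = helper.getMethod("loadSparseCsc", File.class, String.class); // illustrative signature
            return (DMatrixSparseCSC) load.invoke(null, new File(fileName), variableName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                    "MAT file support requires an optional dependency. " +
                    "See https://ejml.org for how to add it to your build.", e);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Failed to load the MAT file via the optional dependency", e);
        }
    }
}
```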