sparsemat / sprs

sparse linear algebra library for rust
Apache License 2.0
386 stars 45 forks source link

Check if needs to be sorted #228

Closed mulimoen closed 3 years ago

mulimoen commented 3 years ago

The current behaviour is to always copy the indices and data to a new vector, sort it, then copy back. This PR instead checks if the vector/matrix needs sorting before doing this.

This PR does introduce a regression in the the csvec_additive_inverse and csvec_neg benchmarks, which is because of malloc being called more often for some reason. I think we are dealing with a (not so?) clever optimiser, and this should not be a regresson in code used elsewhere.

The additional commit adds the Copy clone to derive which should produce the same code, and adds the must_use attribute so one does not accidentally drop the result of a(n expensive) calculation.

vbarrielle commented 3 years ago

Checking before sorting looks like a good idea, but I'd like to see more generic benchmarks results, the ones you noticed probably show there are some missed opportunities in these methods, if the implementation is correct the structure of the result of an algorithm should be trusted, as here: https://github.com/vbarrielle/sprs/blob/master/src/sparse/smmp.rs#L436-L445

I'm skeptic about the #[must_use] attribute on CsMatBase, I fear it would make the API cumbersome, and I don't see in which situation one would forget to use a matrix that's needed. If such a matrix is unused, I'm pretty sure the optimizer is good enough to remove pure computations creating it.

mulimoen commented 3 years ago

My primary concern is the unnecessary data movement when a user creates an sprs datastructure, with no way to avoid this. Either we make this cheaper, as in here, or we open up for using unsafe to create the structures. I'll cook up some benchmarks to show the improvements.

mulimoen commented 3 years ago

I've reverted the must_use

vbarrielle commented 3 years ago

My primary concern is the unnecessary data movement when a user creates an sprs datastructure, with no way to avoid this. Either we make this cheaper, as in here, or we open up for using unsafe to create the structures. I'll cook up some benchmarks to show the improvements.

Yes this is a concern I completely understand. This PR looks like a good step to reduce unnecessary work, but having unsafe versions of constructor would be useful too.

mulimoen commented 3 years ago

Added a benchmark for sparse vector creation. A vector with 9000 non-zero elements will now take about 1402 ns/iter (+/- 51) vs 5750 ns/iter (+/-327) for the state before this PR.

A smaller amount of non-zeros (9) gives the following: 67 ns/iter (+/- 7) vs 144 ns/iter (+/- 6)

vbarrielle commented 3 years ago

That's a very nice improvement indeed! Everything looks good to me.

mulimoen commented 3 years ago

Thanks!