Closed chetmurthy closed 2 years ago
There does indeed seem to be a bug here. Complex numbers should be emitted as a pair. To fix this we would need to print numbers without using the Display
trait
I printed thus:
writeln!(writer, "{} {} {} {}", row.index() + 1, col.index() + 1, val.re, val.im)?;
If this format is acceptable, I can send you a PR. [I would have done so already, but wasn't even sure this was a bug.]
It would be great if you could make a PR for this!
Looking at the reading code, it appears that it doesn't deal with complex values at all. Also, um, not that I have a right to complain (sprs is a lovely, lovely package after all) but the reading code is pretty complex. Maybe it might be worth thinking about how to rewrite it to be simpler (and at the same time, extend with support for complex numbers) ?
Obviously not for this PR though.
I have another question, before I can start writing code: "what is the correct Rust-y way to implement the type-based printing?" I think I have a solution, and want to run it by you to see what you think. It feels a little contrived, but still, Rust-y. So:
fmt::Display
to format the values in a matrix. The trait implementations for Complex<>
are just wrong.WrappedAgain<T>
, and the trait DisplayForMediaMarket
.
pub struct WrappedAgain<T> {
pub it : T
}
pub trait DisplayForMediaMarket {
fn mm_fmt(&self, f: &mut Formatter<'_>) -> fmt::Result;
}
// string conversions
impl DisplayForMediaMarket for WrappedAgain< Complex
and similarly for other types (`Complex<f32>`, `i32`, `i64`, etc).
6. But I don't want to then go and make a mess of `write_matrix_market`. So I need to wrap this type in another one, on which the `fmt::Display` trait will be implemented. Call this type `WrappedValue`:
pub struct WrappedValue
7. And now we're set: we can take a `Complex64`, wrap it twice, and then Display it, getting out the format we desire:
let n = Complex64::new(1.0, 2.0) ;
println!("{}", hello_cargo::WrappedValue { it : hello_cargo::WrappedAgain { it : n } }) ;
with output
1 2
8. Actually, no, I'm not completely set. I don't know how to express in the signature of the function, that we require on the type `N`, that the type `WrappedAgain<N>` have the trait `DisplayForMediaMarket`.
Everything else, I've tested. But I don't know how to write this little bit. I'll start reading more about generics in Rust, but if you know the answer (or a better way of doing all this), I'd appreciate your advice.
You can change the trait bound to take DisplayForMediaMarket
instead of Display
. This way you do not have to wrap the type
Two thoughts:
WrappedAgain
(the inner wrapper). writeln!(writer, "{} {} {}", row.index() + 1, col.index() + 1, val)?;
yes? Where val
is replaced with WrappedValue { it : val }
, yes? So that the writeln!
macro can invoke Display
(fmt
) on the value ?
I think such a trait could also include a read functionality to simplify read_matrix_market
You are allowed to implement your own trait for foreign types, but not a foreign trait for a foreign type (orphan rule).
writeln!(writer, "{} {} {}", row.index() + 1, col.index() + 1, val)?;
could be
writeln!(writer, "{} {} {}", row.index() + 1, col.index() + 1, val.mm_display())?;
If the trait returns something printable
The return-value of mm_display()
would itself need to have the Display
trait, right? And I'd want to avoid unnecessary copying, so maybe mm_display()
should return a WrappedValue<T>
(where T
is Complex64
, etc) ?
If, OTOH, mm_display()
constructed a String
, then we wouldn't need the extra type and stuff. But if we're going to be writing out billion-line MatrixMarket files (and my use-case is one where the matrix has 4B nonzero entries, sigh) then I'd think that we should avoid the overhead.
I hope I don't appear to be inordinately disputative: I'm just learning, so I want to do this well, and that's why I'm asking these questions.
Had to play a bit around with how to implement this, I would do this as you did in https://github.com/sparsemat/sprs/issues/301#issuecomment-1161269752, but collapsed in just one helper struct
pub struct Displayable<T>(T);
pub trait MatrixMarketDisplay
where
Self: Sized,
{
fn mm_display(&self) -> Displayable<&Self> {
Displayable(self)
}
}
with the added trait bound for write_matrix_market
N: 'a + PrimitiveKind + MatrixMarketDisplay,
for<'n> Displayable<&'n N>: std::fmt::Display,
Then implement Displayable
and MatrixMarketDisplay
for f32, f64, Complex<..>
@chetmurthy Let me know if you want some more pointers on how to implement this
First, I really do appreciate your advice and guidance on this. And I hope that I can contribute more than this tiny fix to your project, so that your investment will not be wasted.
I thought I would have to ask for your advice again, but then I had an idea, and that idea worked. So I'm proceeding.
I have the fix working. I will write tests that read and write complex-value MM format files (from sprs/data). I'm still pretty new to pyo3, so I would prefer not to write tests that take MM format write-output and read it with scipy, but if you think that that would be a good thing to do, I can learn how: do you think this would be a good thing to do?
Using the example you've opened this issue with would be a good test. If both writing and reading Complex<f64>
to/from the expected output works there is no point comparing with scipy
(which is difficult and error-prone).
Another question. The current design is that the data_type (from the MM header) is used to decide which code to run, and that code then uses a cast from the trait on the actual value type to convert to whatever it's supposed to be.
I've added a trait for parsing, so that's no longer necessary. Two issues:
Many tests clearly assume that a real-valued MM file can be parsed into an int-valued CsMat. I'm going to leave that as-is, but if you have thoughts on what sort of compatibility should be supported, I'm happy to write code for it.
if the (file-header) data_type is Complex, then the N::num_kind() had better be Complex, and vice versa. I've added code to check that and a test that will raise an error.
So right now (as it was before, I think) aside from Complex, if a file's values can be parsed into some type N, then they will be, and data_type will be ignored.
I wrote a small matrix out in MatrixMarket format using sprs::io::... and the format I got had values of the format "1+2i", whereas with scipy, I get two columns, one with real, the other with imag. I attach sample out from each writer.
and from scipy:
I am completely unfamiliar with this format, so maybe both should work? But I thought I should report it. I was able to copy your writer and modify it to produce the format scipy expects, but .... well, maybe there's a bug here?