sparsemat / sprs

sparse linear algebra library for rust
Apache License 2.0
416 stars 47 forks source link

Writing a Complex64-valued matrix in MatrixMarket format seems incompatible with scipy #301

Closed chetmurthy closed 2 years ago

chetmurthy commented 2 years ago

I wrote a small matrix out in MatrixMarket format using sprs::io::... and the format I got had values of the format "1+2i", whereas with scipy, I get two columns, one with real, the other with imag. I attach sample out from each writer.

%%MatrixMarket matrix coordinate complex general
% written by sprs
2 2 4
1 1 1+0i
1 2 0-2i
2 1 0+2i
2 2 1+0i

and from scipy:

%%MatrixMarket matrix coordinate complex hermitian
%
2 2 3
1 1 1.000000000000000e+00 0.000000000000000e+00
2 1 0.000000000000000e+00 2.000000000000000e+00
2 2 1.000000000000000e+00 0.000000000000000e+00

I am completely unfamiliar with this format, so maybe both should work? But I thought I should report it. I was able to copy your writer and modify it to produce the format scipy expects, but .... well, maybe there's a bug here?

mulimoen commented 2 years ago

There does indeed seem to be a bug here. Complex numbers should be emitted as a pair. To fix this we would need to print numbers without using the Display trait

chetmurthy commented 2 years ago

I printed thus:

        writeln!(writer, "{} {} {} {}", row.index() + 1, col.index() + 1, val.re, val.im)?;

If this format is acceptable, I can send you a PR. [I would have done so already, but wasn't even sure this was a bug.]

mulimoen commented 2 years ago

It would be great if you could make a PR for this!

chetmurthy commented 2 years ago

Looking at the reading code, it appears that it doesn't deal with complex values at all. Also, um, not that I have a right to complain (sprs is a lovely, lovely package after all) but the reading code is pretty complex. Maybe it might be worth thinking about how to rewrite it to be simpler (and at the same time, extend with support for complex numbers) ?

Obviously not for this PR though.

chetmurthy commented 2 years ago

I have another question, before I can start writing code: "what is the correct Rust-y way to implement the type-based printing?" I think I have a solution, and want to run it by you to see what you think. It feels a little contrived, but still, Rust-y. So:

  1. we know we cannot use fmt::Display to format the values in a matrix. The trait implementations for Complex<> are just wrong.
  2. We also know we cannot override those trait implementations.
  3. So we must define a new trait. But one cannot do so on an existing type from another crate.
  4. So we must define a new type, with a new trait. Call the type WrappedAgain<T>, and the trait DisplayForMediaMarket.
  5. Then we can do this:
    
    pub struct WrappedAgain<T> {
    pub it : T
    }
    pub trait DisplayForMediaMarket {
    fn mm_fmt(&self, f: &mut Formatter<'_>) -> fmt::Result;
    }

// string conversions impl DisplayForMediaMarket for WrappedAgain< Complex > { fn mmfmt(&self, f: &mut fmt::Formatter<'>) -> fmt::Result { write!(f, "{} {}", self.it.re, self.it.im) } }

and similarly for other types (`Complex<f32>`, `i32`, `i64`, etc).

6. But I don't want to then go and make a mess of `write_matrix_market`.  So I need to wrap this type in another one, on which the `fmt::Display` trait will be implemented.  Call this type `WrappedValue`:

pub struct WrappedValue { pub it : T } impl fmt::Display for WrappedValue where T: DisplayForMediaMarket, { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { self.it.mm_fmt(f) } }

7.  And now we're set: we can take a `Complex64`, wrap it twice, and then Display it, getting out the format we desire:
let n = Complex64::new(1.0, 2.0) ;
println!("{}", hello_cargo::WrappedValue { it : hello_cargo::WrappedAgain { it : n } }) ;
with output

1 2


8.  Actually, no, I'm not completely set.  I don't know how to express in the signature of the function, that we require on the type `N`, that the type `WrappedAgain<N>` have the trait `DisplayForMediaMarket`.

Everything else, I've tested.  But I don't know how to write this little bit.  I'll start reading more about generics in Rust, but if you know the answer (or a better way of doing all this), I'd appreciate your advice.
mulimoen commented 2 years ago

You can change the trait bound to take DisplayForMediaMarket instead of Display. This way you do not have to wrap the type

chetmurthy commented 2 years ago

Two thoughts:

  1. Oh hm, I thought one couldn't implement traits on a type outside of the crate where the type was defined. Could swear I got that error-message. OK, so this removed the need for the type WrappedAgain (the inner wrapper).
  2. But I still need the outer wrapper, in order to be able to retain the line
        writeln!(writer, "{} {} {}", row.index() + 1, col.index() + 1, val)?;

    yes? Where val is replaced with WrappedValue { it : val }, yes? So that the writeln! macro can invoke Display (fmt) on the value ?

mulimoen commented 2 years ago

I think such a trait could also include a read functionality to simplify read_matrix_market

mulimoen commented 2 years ago

You are allowed to implement your own trait for foreign types, but not a foreign trait for a foreign type (orphan rule).

writeln!(writer, "{} {} {}", row.index() + 1, col.index() + 1, val)?; could be writeln!(writer, "{} {} {}", row.index() + 1, col.index() + 1, val.mm_display())?; If the trait returns something printable

chetmurthy commented 2 years ago

The return-value of mm_display() would itself need to have the Display trait, right? And I'd want to avoid unnecessary copying, so maybe mm_display() should return a WrappedValue<T> (where T is Complex64, etc) ?

If, OTOH, mm_display() constructed a String, then we wouldn't need the extra type and stuff. But if we're going to be writing out billion-line MatrixMarket files (and my use-case is one where the matrix has 4B nonzero entries, sigh) then I'd think that we should avoid the overhead.

chetmurthy commented 2 years ago

I hope I don't appear to be inordinately disputative: I'm just learning, so I want to do this well, and that's why I'm asking these questions.

mulimoen commented 2 years ago

Had to play a bit around with how to implement this, I would do this as you did in https://github.com/sparsemat/sprs/issues/301#issuecomment-1161269752, but collapsed in just one helper struct

pub struct Displayable<T>(T);

pub trait MatrixMarketDisplay
where
    Self: Sized,
{
    fn mm_display(&self) -> Displayable<&Self> {
        Displayable(self)
    }
}

with the added trait bound for write_matrix_market

    N: 'a + PrimitiveKind + MatrixMarketDisplay,
    for<'n> Displayable<&'n N>: std::fmt::Display,

Then implement Displayable and MatrixMarketDisplay for f32, f64, Complex<..>

mulimoen commented 2 years ago

@chetmurthy Let me know if you want some more pointers on how to implement this

chetmurthy commented 2 years ago

First, I really do appreciate your advice and guidance on this. And I hope that I can contribute more than this tiny fix to your project, so that your investment will not be wasted.

I thought I would have to ask for your advice again, but then I had an idea, and that idea worked. So I'm proceeding.

chetmurthy commented 2 years ago

I have the fix working. I will write tests that read and write complex-value MM format files (from sprs/data). I'm still pretty new to pyo3, so I would prefer not to write tests that take MM format write-output and read it with scipy, but if you think that that would be a good thing to do, I can learn how: do you think this would be a good thing to do?

mulimoen commented 2 years ago

Using the example you've opened this issue with would be a good test. If both writing and reading Complex<f64> to/from the expected output works there is no point comparing with scipy(which is difficult and error-prone).

chetmurthy commented 2 years ago

Another question. The current design is that the data_type (from the MM header) is used to decide which code to run, and that code then uses a cast from the trait on the actual value type to convert to whatever it's supposed to be.

I've added a trait for parsing, so that's no longer necessary. Two issues:

  1. Many tests clearly assume that a real-valued MM file can be parsed into an int-valued CsMat. I'm going to leave that as-is, but if you have thoughts on what sort of compatibility should be supported, I'm happy to write code for it.

  2. if the (file-header) data_type is Complex, then the N::num_kind() had better be Complex, and vice versa. I've added code to check that and a test that will raise an error.

So right now (as it was before, I think) aside from Complex, if a file's values can be parsed into some type N, then they will be, and data_type will be ignored.