potocpav / npy-rs

NumPy file format (de-)serialization in Rust
30 stars 7 forks source link

Nested records #7

Open milibopp opened 6 years ago

milibopp commented 6 years ago

Numpy also has nested records. I wonder whether it makes sense to merge the NpyRecord and Serializable trait completely to support this feature. This would probably also clean up the crate's design.

potocpav commented 6 years ago

Taking your PR #8 into account, there is no reason I can see for not doing this - NpyRecord and Serializable are identical after merging DType with RecordDType. Can't believe I haven't picked up on this earlier.

milibopp commented 6 years ago

Oh, wait, this is not done yet and has to be re-opened. My pull request #9 merely restructured the API to make this possible, the feature itself has not been implemented yet.

I have not had time yet to follow up on that (may happen next month).

potocpav commented 6 years ago

Ah, my bad. It seemed easy enough to implement so I just did it (commit 87eefdee). Added some tests, hopefully it's all right.

Unfortunately, arrays of nested Records do not work since the shape information is not present in DType::Record: dtype: [('a', [('b', '>f4')], (2,))]

Perhaps we need to (again) rethink the DType struct to encompass everything?

milibopp commented 6 years ago

Ah, thanks for finishing up. I kind of stopped in the middle of it and got distracted by more urgent matters.

I did not think of that case, good that you point that out.

Probably it should look like this then:

struct DType {
    shape: Vec<u64>,
    inner_dtype: InnerDType
}

enum InnerDType {
    Plain(String),
    Record(Vec<Field>)
}

struct Field { /* like before */ }

Does that cover everything? I am not so sure, I like the name InnerDType here… but the structure should fit.