scijs / ndarray

📈 Multidimensional arrays for JavaScript
MIT License

Masked arrays for typed arrays #22

Closed: letmaik closed this issue 9 years ago

letmaik commented 9 years ago

In some cases it would be nice to have masked arrays, similar to numpy's (or better). This request is about typed arrays; I guess if you have undefined values in standard arrays, then ndarray leaves these alone, right?

I think this is a huge task, but I still wanted to open an issue to see if anyone else is interested as well.

mikolalysenko commented 9 years ago

This should be quite easy to support using the general data type accessors (and it was part of my plan in designing those features to support cases like this).

If you implement a masked array storage that has the .get/.set/.length properties as required, then you should be able to implement a masked array on top of it.

letmaik commented 9 years ago

I see, so let's assume I use a special value as "nodata" in a typed array, and for each .get it would check this and either return the value or undefined. Do you think performance would suffer a lot here, e.g. for -ops operations? I imagine that ndarray applies optimizations for typed arrays (or just the fact that it uses them directly with [] syntax) and these would probably not be possible with this generic array storage.
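To make the idea concrete, here is a minimal sketch of such a storage. The class name `MaskedStorage` and the `nodata` parameter are hypothetical and not part of ndarray; the only thing taken from this thread is that a generic storage needs `.get`, `.set`, and `.length`.

```javascript
// Hypothetical generic storage: wraps a typed array and treats one
// sentinel value ("nodata") as masked. Names are illustrative only.
class MaskedStorage {
  constructor(data, nodata) {
    this.data = data;          // underlying typed array
    this.nodata = nodata;      // sentinel that marks a masked element
    this.length = data.length; // required by the generic storage contract
  }

  // Return undefined for masked elements, the stored value otherwise.
  get(i) {
    const v = this.data[i];
    return v === this.nodata ? undefined : v;
  }

  // Writing undefined masks the element by storing the sentinel.
  set(i, v) {
    this.data[i] = v === undefined ? this.nodata : v;
    return v;
  }
}

const storage = new MaskedStorage(new Float32Array([1, -9999, 3]), -9999);
console.log(storage.get(0), storage.get(1), storage.get(2)); // 1 undefined 3
```

An object like this could then be passed as the data argument when constructing an ndarray, in place of the raw typed array.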

mikolalysenko commented 9 years ago

ndarray has two modes. For typed arrays it uses the [] syntax for accessing elements, and for general storage it uses .get/.set. cwise implements the same features.
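The difference between the two modes can be sketched as follows. This is an illustration only, not ndarray's actual implementation; the helper `makeAccessors` is a made-up name.

```javascript
// Illustrative only: choose an access strategy based on the storage type.
// A generic storage exposes get/set; arrays and typed arrays are indexed.
function makeAccessors(storage) {
  if (typeof storage.get === 'function' && typeof storage.set === 'function') {
    // Generic mode: every access is a method call.
    return {
      read: (i) => storage.get(i),
      write: (i, v) => storage.set(i, v),
    };
  }
  // Array mode: direct indexing with [].
  return {
    read: (i) => storage[i],
    write: (i, v) => { storage[i] = v; },
  };
}

const typed = makeAccessors(new Float64Array([1.5, 2.5]));
const generic = makeAccessors({
  length: 2,
  get: (i) => i * 10, // toy storage that computes its values
  set: () => {},
});

console.log(typed.read(1), generic.read(1)); // 2.5 10
```

The extra method call per element in generic mode is where the possible overhead comes from.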

In terms of overhead, it might be a bit slower, but it should at least work with the ecosystem. To see how this works in practice, look at these modules:

https://github.com/scijs/ndarray-hash
https://github.com/scijs/ndarray-bit

letmaik commented 9 years ago

OK, as far as I can see there is no special handling for undefined/null. So if my masked array returned such a value and you did things like aggregations (sum, argmin, ...), then these would fail, right? Does that mean you have to implement special nodata-aware versions of all these operations? This is what I meant by a huge task, as numpy has such special operations for masked arrays.
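A small plain-JavaScript illustration of the failure mode, using naive loops rather than the actual ndarray-ops code: an undefined element turns a sum into NaN, and an argmin that starts on a masked element never moves off it.

```javascript
// Values as a masked storage might return them: index 0 is masked.
const values = [undefined, 5, 2];

// Naive sum: 0 + undefined is NaN, and NaN propagates through the rest.
let sum = 0;
for (const v of values) sum += v;

// Naive argmin: any comparison against undefined is false, so the
// candidate index stays on the masked element.
let argmin = 0;
for (let i = 1; i < values.length; i++) {
  if (values[i] < values[argmin]) argmin = i;
}

console.log(sum, argmin); // NaN 0
```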

mikolalysenko commented 9 years ago

Ah, I see. I think you could modify ndarray-ops to ignore undefined values. I was thinking that you wanted to implement some data structure for just masking out components or skipping nans.

letmaik commented 9 years ago

Well, both :) I have typed arrays with nodata values that are encoded as numbers outside a specified "valid range". So for ndarray I would have to implement a storage like you said, which would be easy. But then of course it would be good if the rest of the ndarray ecosystem still worked as expected (where it makes sense). However, adding checks for === undefined everywhere (at least for native arrays and generic storages) is probably not a good idea, as you don't want that slowdown for non-masked ndarrays, right? So the alternative would be to provide special functions like numpy's nanargmin etc.
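A rough sketch of what such special functions could look like, written against the `.get`/`.length` storage contract discussed above. The names `rangeMaskedStorage`, `maskedSum`, and `maskedArgmin` are hypothetical; none of them exist in ndarray or ndarray-ops.

```javascript
// Hypothetical read-only storage: values outside [min, max] count as nodata.
function rangeMaskedStorage(data, min, max) {
  return {
    length: data.length,
    get(i) {
      const v = data[i];
      return v < min || v > max ? undefined : v;
    },
  };
}

// Sum that skips masked (undefined) elements.
function maskedSum(storage) {
  let total = 0;
  for (let i = 0; i < storage.length; i++) {
    const v = storage.get(i);
    if (v !== undefined) total += v;
  }
  return total;
}

// Index of the smallest unmasked element, or -1 if everything is masked.
function maskedArgmin(storage) {
  let best = -1;
  let bestValue;
  for (let i = 0; i < storage.length; i++) {
    const v = storage.get(i);
    if (v !== undefined && (best === -1 || v < bestValue)) {
      best = i;
      bestValue = v;
    }
  }
  return best;
}

const masked = rangeMaskedStorage(new Int16Array([-32768, 5, 2, 32767]), 0, 100);
console.log(maskedSum(masked), maskedArgmin(masked)); // 7 2
```

Keeping these in separate functions means the regular, non-masked code paths pay no cost for the extra undefined checks.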

mikolalysenko commented 9 years ago

Yeah, I think that would be the right way to go. It should be pretty easy to bootstrap some of these; you could just fork ndarray-ops and add the necessary features.

letmaik commented 9 years ago

OK, I'll close this since it is more or less easily doable via extensions/separate packages.