Open mpizenberg opened 6 years ago
I'm sure you've thought about this quite a bit, but the two type parameters on JsTypedArray
still seems a bit awkward...I'd be interested to see what the API would be like either going the fully-generic route or fully-specialized route. In the fully-generic route JsTypedArray
would have only a single type parameter (Float
or Int
) and the underlying data type would be hidden; that would mean things like equals
would have to handle arrays of different underlying types, but that seems doable (although a performance hit). In the fully-specialized route there would just be separate JsFloat64Array
, JsFloat32Array
, JsUint8Array
etc. each with their own set of operations (including ways to convert between them). A lot of duplication, but very explicit!
Why did you choose to add the Js
prefixes? If this is meant as a prototype for what typed array support in Elm would look like, then it shouldn't be explicitly tied to a particular compile target...better to figure out how typed arrays should behave in general, and just currently implement that using JavaScript's typed arrays.
For JsFloat64Array.decode
etc., does "throws an error" mean that your program will actually crash if the value being decoded is not a Float64Array
? Is there no way to catch this and just fail decoding instead?
Thanks for the feedback @ianmackenzie , these are quite interesting questions!
I'd be interested to see what the API would be like either going the fully-generic route or fully-specialized route.
Actually, we have discussed this a little bit in issue #1 with Francisco. He was more in favor of the fully-generic route I think. I went for the double type parameter for the type safety and to keep a visual clue about the type used. A function like map (\n -> n + 1)
will not return the same array if used on an Uint8Array
or Int8Array
for example. Also, in the generic approach, a function like map2 : (Int -> Int -> Int) -> TypedArray Int -> TypedArray Int -> TypedArray Int
could take two different typed arrays and wouldn't know which underlying type to use for the returned array.
The fully specialized route would mean that we get rid of the JsTypedArray
module.It could become an internal non-exposed module used with care by each specialized module. This would mean that there wouldn't be to much code duplication, but a lot more boilerplate, that's for sure!
I'm not sure how current fromTypedArray
functions could be defined though If a shared type does not exist. Currently for JsUint8Array
for example: JsUint8Array.fromTypedArray : JsTypedArray a b -> JsTypedArray Uint8 Int
.
Another issue with the fully specialized approach is the ability to extend the API. If someone wants to create a new function for their use case, they would have to write it for every type since they can't use the Native
trick.
Why did you choose to add the Js prefixes?
This is in reference to how Array
will work in 0.19. Array
is coded as an elm module relying on a thin wrapper of JavaScript Array
API called JsArray
: https://github.com/elm-lang/core/blob/master/src/Elm/JsArray.elm
I figured this module could have the same role regarding any data structure that would like to be based on typed arrays. For example, I'm trying to make an elm-tensor
module on top of this, that would be a type for multi-dimentional arrays for scientific computing, based on JsTypedArray
.
Due to performance requirements on operations with two matrices for my tests with elm-tensor
(additions, multiplications, ...), I added functions to work on two typed arrays, (map2
, foldl2
, ...) and "indexed" functions. Meaning elm-js-typed-array
is not as thin as I initially thought.
I'm genuinely interested in what other persons think of this too. Should it be kept as thin and equivalent to JS TypedArray API possible or more versatile?
For
JsFloat64Array.decode
etc., does "throws an error" mean that your program will actually crash if the value being decoded is not a Float64Array?
Yes, and I was wondering how to do it differently ... Current implementation is as follow:
decode : Decoder (JsTypedArray Float64 Float)
decode =
Decode.map fromValue Decode.value
fromValue : Decode.Value -> JsTypedArray Float64 Float
fromValue =
Native.JsFloat64Array.fromValue
And Native.JsFloat64Array.fromValue
is implemented like this:
function fromValue(value) {
if (!(value instanceof Float64Array)) {
throw "This is not a Float64Array";
}
return value;
}
I couldn't figure another way of decoding using the Json.Decode
API. Any better idea welcomed :)
I'm not sure how current fromTypedArray functions could be defined though If a shared type does not exist. Currently for JsUint8Array for example: JsUint8Array.fromTypedArray : JsTypedArray a b -> JsTypedArray Uint8 Int.
Yeah, in the fully-specialized case you'd have to just have a bunch of specialized conversions like JSUint8Array.fromJsInt8Array
. Verbose but explicit...
This is in reference to how Array will work in 0.19. Array is coded as an elm module relying on a thin wrapper of JavaScript Array API called JsArray: https://github.com/elm-lang/core/blob/master/src/Elm/JsArray.elm
I figured this module could have the same role regarding any data structure that would like to be based on typed arrays. For example, I'm trying to make an elm-tensor module on top of this, that would be a type for multi-dimentional arrays for scientific computing, based on JsTypedArray.
I think this probably affects a lot of other design decisions - if this is mean only as a base for higher-level modules, then following JavaScript closely certainly makes sense. Does that imply that this package would only be used from other Elm kernel code? (I assume JsArray
will not be visible to non-kernel code).
Also, in the generic approach, a function like map2 : (Int -> Int -> Int) -> TypedArray Int -> TypedArray Int -> TypedArray Int could take two different typed arrays and wouldn't know which underlying type to use for the returned array.
Good point - hadn't thought of that. Fully generic is probably not the way to go.
I couldn't figure another way of decoding using the Json.Decode API. Any better idea welcomed :)
Off the top of my head, how about having Native.JsFloat64Array.fromValue
return a Maybe Float64Array
instead of a Float64Array
? Should be relatively straightforward by constructing the JS object with __ctor__
field etc....then fromValue
could be something like
fromValue : Decode.Value -> JsTypedArray Float64 Float
fromValue =
Native.JsFloat64Array.fromValue
|> Decode.andThen
(\maybeArray ->
case maybeArray of
Just array ->
Decode.succeed array
Nothing ->
Decode.fail "Value was not a Float64Array"
)
Off the top of my head, how about having Native.JsFloat64Array.fromValue return a Maybe Float64Array instead of a Float64Array?
Ah good idea ^^. Corrected in d4802fd.
I think this probably affects a lot of other design decisions - if this is mean only as a base for higher-level modules, then following JavaScript closely certainly makes sense. Does that imply that this package would only be used from other Elm kernel code? (I assume JsArray will not be visible to non-kernel code).
Yes I also assume that JsArray
will stay private. In this discussion on elm-dev about NumElm, Evan seemed to be considering to open the module if needed though.
My intention is indeed to make this a thin wrapper around JS TypedArray. I'm pretty sure we are far from any integration into Core. So my purpose is to keep it easy to use for exploration for Web APIs requiring typed arrays.
Regarding performances, the main issue is the update of values of an array, that must create a new array to respect immutability. Taking this into account, the functions that cannot be removed are map
, reverse
, sort
. Functions replaceWithConstant
, reverseSort
and indexedMap
could be reimplemented on top but with a huge performance hit so I'd keep them anyway I think.
Providing that unsafeGetAt
is performant, transformation functions from two arrays (map2
and indexedMap2
) and reductions functions of two arrays (foldl2
, foldr2
, foldlr
) could actually be reimplemented with a small performance price I think from their indexed...
equivalent on one array. Cost would mainly be due to function calls. I have noticed in benchmarks the cost that can bring those function calls but I haven't specifically benchmarked the difference between a native implementation and another on top of an indexed...
one for these. I shall probably do it, to estimate real cost of removing those from the package.
Yeah, in the fully-specialized case you'd have to just have a bunch of specialized conversions like JSUint8Array.fromJsInt8Array. Verbose but explicit...
Actually, there could be an alternative that would be to introduce a JsTypedArray
type playing the role of an interchange data type. It would appear only in two functions in each module:
-- JsUint8Array module
toTypedArray : JsUint8Array -> JsTypedArray
fromTypedArray : JsTypedArray -> JsUint8Array
Considering that there would be 9 different types, 2*9=18
is a lot more manageable than 8*8=64
functions ^^. The toTypedArray
function would be an identity function hidden in a Native call.
The only two remaining drawbacks are the boilerplate (included documentation to maintain) and the difficulty to extend for users who'd like to write new functions (like map2
if it is removed from package).
For now I still prefer the version with two type parameters but I will try to prepare a summary of good/bad points when I'll prepare a post on discourse to ask for feedback more broadly, so don't hesitate to make other remarks on things that bother you, or that you like with one or the other approach.
It occurred to me that another thing to look at for inspiration is Python's buffer protocol, since it's been pretty successful at enabling stuff like NumPy: https://docs.python.org/3/c-api/buffer.html
I didn't know that python buffer API already had builtin multidimentional aspect (shape, strides, ...). I thought it was only a numpy overlay. The JavaScript ArrayBuffer
API is focus on linear representation, with an additional offset
and length
to enable slicing of buffers in constant time without copy.
I think the multidimentional aspect will only be useful for math/media-like manipulation, that's why I'm building elm-tensor on top of this. If it's not the case, I guess we shall see when trying to build those examples use cases #7 .
You have not asked for this, but i hope i can comment on the JsDataView
API. I think it would be better if the API follows more closely the JSON.Decoder
API. So instead of the get*
functions you have a JsDataView.Decoder a
type and the following functions
offset: Int -> Decoder a -> Decoder a
decodeInt32 : Decoder Int
decodeInt16 : Decode Int
decodeInt8 : Decode Int
decodeUInt8 : Decode Int
...
decodeFloat : Decoder Float
....
along with the standard map, map2, ..., andThen, fail, succeed, oneOf, list, ...
functions. With a function like decodeJsDataView: Decoder a -> JSDataView -> Result String a
you can run the decoder. This function should automatically check whether we are decoding something out of bounds.
You have not asked for this, but i hope i can comment on the JsDataView API.
Of course @LaurinStrelow all feedback is welcomed!
I think it would be better if the API follows more closely the JSON.Decoder API.
You are absolutely right on the fact that getters, with offset (and endianness that I didn't add yet for this first draft) don't fit very well with the other ways elm has to bring data from foreign environments i.e. decoders on JS values.
In issue #6, @ianmackenzie also proposed an alternative approach, the parser approach. Knowing that data is stored sequentially in a buffer array, and not organized like in JS objects with attributes, I think this is even better than the decoder approach. Using decoders (and getters!) you will always have to provide the offset, which can be pretty anoying for reading sequential data.
There are two reasons why I chose (for now) to go with getters.
Point 2 (minimalist native package) was derived from the successful approach of Skinney/elm-array-exploration which splits its array implementation in two parts. First a minimal wrapper of JavaScript array in native code (Native/JsArray.js
and Array/JsArray.elm
), and second a pure elm implementation on top of it (Array/Hamt.elm
).
My main concern about this separation is that the native code has to be minimalist, but also provide enough functionalities to compensate for the main issue of elm data structures, immutability (which is it's main advantage if we see the bigger picture). And I believe the only way to get it right is by trial and error. My ongoing experience with elm-tensor was exactly that. It's making me iterate a lot over TypedArray API, especially to provide creation or modification of arrays avoiding multiple passes.
The part of the DataView API that I couldn't figure out how to do in a minimalist way was the setters. In JavaScript, you can easily write on a buffer by modifying it. In elm, doing so using minimalist setters, in an immutable way, is a performance chasm. That's the reason why for now, I put setters in a MutableJsDataView
module.
The parser approach, while being IMO superior for reading sequential data from buffer don't offer any solution for the writing of data. The Decoder/Encoder approach however, provides a coherent approach for writing and reading data.
Anyway, at some point, you have to read the data using native code, calling JS getUint8 etc.
. I think you can just consider the corresponding elm functions as minimalist wrappers around those. Ideally, what would be very valuable would be if you could try to implement a decoder approach on top of this and report on the limitations you encountered.
Let me know what you think, thanks again!
Thank your for your answer.
Anyway, at some point, you have to read the data using native code, calling JS getUint8 etc.. I think you can just consider the corresponding elm functions as minimalist wrappers around those. Ideally, what would be very valuable would be if you could try to implement a decoder approach on top of this and report on the limitations you encountered.
I have not tried to implement this, but I think it should be possible to implement Decoders on top of the get*
functions. Somehow I think that in the best case the library should just expose one (familiar) way to decode the data. So the Decoder and the Parser approach should both be good but we should have just one of both. (With both approaches you should be able to rebuild the get*
functions with a line of code.)
Using decoders (and getters!) you will always have to provide the offset, which can be pretty anoying for reading sequential data.
This should depend on the data you are trying to decode. If your data is not sequential (and may require a lot of jumps to different positions), you may need a lot of extra afford to decode it. I am not familiar with the binary data formats so this might be no problem.
The part of the DataView API that I couldn't figure out how to do in a minimalist way was the setters. In JavaScript, you can easily write on a buffer by modifying it. In elm, doing so using minimalist setters, in an immutable way, is a performance chasm. That's the reason why for now, I put setters in a MutableJsDataView module.
I think you have a really performance critical use case in mind. The average elm developer should not bother an additional copy of their data. I think the most algorithms can be implemented with initialise
and map
in a performant way. However there are certain algorithms which cannot be implemented in the fastest way in Elm. To work around this, I think, it is the best to implement this specific algorithm as a native function. I understand if this defeats your purpose but the setter functions should really be avoided.
Currently, the API of this package is split in 4 modules:
JsArrayBuffer
for the underlying JavaScriptArrayBuffer
API.JsTypedArray
defines the typeJsTypedArray a b
and provides polymorphic functions operating on typed arrays.JsUint8Array
provides functions to create arrays of typeJsTypedArray Uint8 Int
JsFloat64Array
provides functions to create arrays of typeJsTypedArray Float64 Float
The purpose of this issue is to discuss this API and its structure. For now, only two creation modules are provided (
JsUint8Array
andJsFloat64Array
). If everything turns ok, I'll add the other creation modules:JsInt8Array
JsUint8ClampedArray
JsInt16Array
JsUint16Array
JsInt32Array
JsUint32Array
JsFloat32Array
The full current API documentation is available here: doc-2573a08.txt
As an overview of the API, here is a list of the functions provided.
JsUint8Array
andJsFloat64Array
modules:zeros
,repeat
,initialize
fromBuffer
,fromArray
,fromList
,fromTypedArray
decode
JsTypedArray
module:JsTypedArray a b
,Uint8
,Float64
toList
,toArray
,encode
length
,getAt
,unsafeGetAt
,buffer
,bufferOffset
all
,any
,findIndex
,filter
, and theirindexed...
equivalentequal
extract
,append
replaceWithConstant
,map
,map2
,reverse
,sort
,reverseSort
,indexedMap
,indexedMap2
join
,foldl
,foldr
,foldl2
,foldr2
,foldlr
, and theirindexed...
equivalent