Idea
Similar to what's described in https://github.com/stdlib-js/google-summer-of-code/issues/43, a need exists to expand array data type support beyond numeric data types. One such data type is a string data type. A dedicated string data type would enable better interoperation between JavaScript and C. This is especially important for supporting ndarrays having a string data type, as much of the ndarray iteration machinery is written in C.
Accordingly, the goal of this project is to add a dedicated string typed array, called StringArray, that supports variable-length strings. This new array type should follow the precedent set by @stdlib/array/complex64, which provides a typed array dedicated to single-precision complex floating-point numbers; namely, StringArray should support standard typed array methods and provide accessors for getting and setting array elements.
Note, however, that a StringArray should be a true typed array. It should not wrap a "generic" array; instead, it should be backed by fixed-length memory, similar to how @stdlib/array/complex64 is backed by a Float32Array. One possibility is backing StringArray instances with Node.js Buffer objects, which are themselves Uint8Array instances.
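To make the idea of fixed-length backing memory concrete, here is a minimal sketch. It assumes a packed layout in which every element is UTF-8 encoded into one contiguous Uint8Array (a Node.js Buffer could fill the same role), and an Int32Array of offsets marks where each element starts and ends. The class name `PackedStringArray` and this layout are hypothetical illustrations, not a proposed stdlib API.

```typescript
// Hypothetical packed layout (illustration only, not stdlib's API):
// element i occupies bytes [offsets[i], offsets[i + 1]) of one Uint8Array.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

class PackedStringArray {
  readonly length: number;
  private readonly bytes: Uint8Array;   // contiguous UTF-8 data for all elements
  private readonly offsets: Int32Array; // length + 1 element boundaries

  constructor(values: string[]) {
    const encoded = values.map((v) => encoder.encode(v));
    this.length = values.length;
    this.offsets = new Int32Array(values.length + 1);

    // First pass: compute the starting byte offset of each element.
    let total = 0;
    for (let i = 0; i < encoded.length; i++) {
      this.offsets[i] = total;
      total += encoded[i].length;
    }
    this.offsets[values.length] = total;

    // Second pass: copy each encoded element into the single backing buffer.
    this.bytes = new Uint8Array(total);
    for (let i = 0; i < encoded.length; i++) {
      this.bytes.set(encoded[i], this.offsets[i]);
    }
  }

  // Accessor for reading an element, in the get(i) style of Complex64Array.
  get(i: number): string | undefined {
    if (!Number.isInteger(i) || i < 0 || i >= this.length) {
      return undefined;
    }
    return decoder.decode(this.bytes.subarray(this.offsets[i], this.offsets[i + 1]));
  }

  // Total number of bytes in the backing buffer.
  get byteLength(): number {
    return this.bytes.byteLength;
  }
}
```

Because the data and offsets are two plain typed arrays, they could in principle be handed to C code without per-element conversion. The cost is that replacing an element with a longer string requires shifting every subsequent byte, which is exactly the design question raised next.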
There are, however, design considerations, chief among them how to handle setting array elements. In particular, what should happen when a user attempts to update a StringArray element with a longer string? Should that trigger a new memory allocation and data copy, or should each element have a fixed allocation that allows it to grow up to some maximum size?
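The fixed-allocation option can be sketched as follows, assuming each element owns a slot of `maxBytes` bytes in one Uint8Array and a separate Int32Array records how many bytes of each slot are in use. The class `FixedSlotStringArray` is hypothetical. Its `set(value, i)` argument order mirrors Complex64Array, and it rejects oversized writes instead of reallocating.

```typescript
// Hypothetical fixed-slot layout (illustration only, not stdlib's API):
// element i lives in bytes [i * maxBytes, i * maxBytes + lengths[i]).
const enc = new TextEncoder();
const dec = new TextDecoder();

class FixedSlotStringArray {
  readonly length: number;
  readonly maxBytes: number;
  private readonly bytes: Uint8Array;   // allocated once, never resized
  private readonly lengths: Int32Array; // bytes in use per slot; 0 means ''

  constructor(length: number, maxBytes: number) {
    this.length = length;
    this.maxBytes = maxBytes;
    this.bytes = new Uint8Array(length * maxBytes);
    this.lengths = new Int32Array(length);
  }

  private check(i: number): void {
    if (!Number.isInteger(i) || i < 0 || i >= this.length) {
      throw new RangeError(`index out of bounds: ${i}`);
    }
  }

  get(i: number): string {
    this.check(i);
    const start = i * this.maxBytes;
    return dec.decode(this.bytes.subarray(start, start + this.lengths[i]));
  }

  set(value: string, i: number): void {
    this.check(i);
    const encoded = enc.encode(value);
    if (encoded.length > this.maxBytes) {
      // A growable design would instead allocate a larger buffer and copy data.
      throw new RangeError(`value needs ${encoded.length} bytes; slot holds ${this.maxBytes}`);
    }
    this.bytes.set(encoded, i * this.maxBytes);
    this.lengths[i] = encoded.length;
  }
}
```

This makes updates cheap and predictable, but every element pays for `maxBytes` of memory even when it is short, and the maximum is measured in UTF-8 bytes rather than characters.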
As part of this project, a new StringArray will not only be added but also integrated throughout stdlib. This entails adding support for StringArray instances wherever arrays are accepted or used, following the precedent established by @stdlib/array/complex64 and other custom array types in stdlib, including in ndarray APIs.
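Custom array types such as Complex64Array expose `get` and `set` methods rather than supporting numeric indexing, so routines that accept arrays need to detect and use those accessors. Below is a minimal sketch of that pattern applied to strings. The names `AccessorArray`, `isAccessorArray`, `getElement`, and `joinAll` are hypothetical and do not correspond to existing stdlib packages.

```typescript
// Hypothetical accessor protocol: any object with get()/set() methods.
interface AccessorArray<T> {
  readonly length: number;
  get(i: number): T | undefined;
  set(value: T, i: number): void;
}

// A routine may receive either a plain indexed array or an accessor array.
type Collection<T> = ArrayLike<T> | AccessorArray<T>;

// Treat any object exposing get and set functions as an accessor array.
function isAccessorArray<T>(x: Collection<T>): x is AccessorArray<T> {
  return typeof (x as AccessorArray<T>).get === 'function' &&
    typeof (x as AccessorArray<T>).set === 'function';
}

// Read element i regardless of how the collection stores its data.
function getElement<T>(x: Collection<T>, i: number): T | undefined {
  return isAccessorArray(x) ? x.get(i) : (x as ArrayLike<T>)[i];
}

// Example consumer: join string elements with a separator.
function joinAll(x: Collection<string>, sep: string): string {
  const out: string[] = [];
  for (let i = 0; i < x.length; i++) {
    out.push(getElement(x, i) ?? '');
  }
  return out.join(sep);
}
```

With this branching in place, a routine works unchanged for both `['a', 'b']` and a StringArray-like object. Repeating the same check across every array-consuming package is much of the integration work described above.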
Expected outcomes
The expected outcomes of this idea should be (1) creation of a new @stdlib/array/string package exposing a new typed array constructor, (2) support for StringArray instances throughout @stdlib/array/*, (3) support for StringArray instances as backing arrays for ndarrays (which may involve working with various C APIs), and (4) any other integration opportunities.
Status
While no work has been done to create a new @stdlib/array/string package, there exists prior art for adding custom typed arrays to stdlib; namely, Complex64Array and Complex128Array.
Involved software
No special software is needed for initial work. Once work has progressed to ndarray support, access to a C compiler will be needed, as documented in the project development guide.
Technology
JavaScript, C, nodejs, native addons
Other technology
n/a
Difficulty
Intermediate/Advanced
Difficulty justification
This project is ambitious, as there are many design considerations which need to be addressed in order to ensure performance and allow for efficient JS/C interoperation.
Beyond creating the StringArray class itself, a further challenge will be finding all the code throughout the project that must be updated so that StringArray instances are supported throughout stdlib on an equal footing with other array data types.
Prerequisite knowledge
Familiarity and comfort with JavaScript are highly recommended, as this project will require considerable programming in JavaScript. Some familiarity with C would also be helpful, especially for integrating string arrays with ndarrays.
Project length
350 hours, as the project will likely involve a decent amount of R&D.
Potential mentors
@kgryte @Planeshifter