Closed rxbryan closed 5 months ago
@rxbryan this looks good, thanks for the proposal! I am not so familiar with ndarray APIs, so no detailed suggestions; generic ones include explaining the implementation of one of the APIs, the necessary files you may need to add, changes, etc. Link the [Idea] issue in Related issues section. Rest all is good.
Thanks for the review @Pranavchiku.
Link the [Idea] issue in Related issues section. Rest all is good.
I don't think there's an issue related to this project. Should I create one?
@rxbryan thanks for opening this draft proposal!
I think overall the project looks very good. I will ask you to add all the APIs in a list so that we can create a tracking issue like this one in the future. What I am seeing from this proposal is that you are not leaving room for review cycles, especially at the end of the project. It would be nice to leave room for these interactions, and if you are interested in writing a blog post documenting your journey, we can add it towards the final month of this proposal (please note that it is completely optional and it is up to you :) ).
Building on the previous comments, I have a few comments of my own:
- Any mutation APIs (e.g., `push`, `shift`, `unshift`, `pop`) are non-starters. ndarrays have fixed memory. So I am a bit curious why you included these in your proposal. Furthermore, how are these APIs supposed to work in a multi-dimensional context?
- Various APIs (e.g., `every`, `some`, etc) return a boolean. But this is a bit limiting, as we'd likely want to support operating over specific axes, resulting in an array of lower rank. Your proposal seems to propose flattening the input array. While useful in some contexts, this is limited. Furthermore, ndarray APIs should not return scalars, but zero-dimensional ndarrays. Additionally, if you are planning to support dimensionality reduction, do note that we have yet to implement a `bool` dtype. So how will you support?
- Various APIs (e.g., `indexOf`, `findIndex`, etc) return indices. We only support up to `int32`/`uint32` dtypes, but ndarrays are allowed to have more than 2**32-1 elements. How are you planning to support?
- For things like `fill`, you've specified integer indices. This likely does not make sense, as we're more likely to support slicing semantics. In fact, it is not clear why we'd want to support index arguments at all, given that views can be defined in userland.
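The index dtype concern can be made concrete with plain typed arrays. The following is only a minimal sketch, not stdlib code: `idx` is a hypothetical linear index one past the `uint32` range, and all variable names are illustrative.

```javascript
// Largest values representable by the two index dtypes mentioned above:
var MAX_INT32 = Math.pow( 2, 31 ) - 1;
var MAX_UINT32 = Math.pow( 2, 32 ) - 1;

// A hypothetical linear index just past the uint32 range:
var idx = MAX_UINT32 + 1;

// Storing the index in 32-bit typed arrays silently wraps around:
var asInt32 = new Int32Array( [ idx ] )[ 0 ];
var asUint32 = new Uint32Array( [ idx ] )[ 0 ];

// A float64 (an ordinary JavaScript number) stores the index exactly:
var asFloat64 = new Float64Array( [ idx ] )[ 0 ];

console.log( asInt32, asUint32, asFloat64 );
```

A float64 value stays exact for integers up to 2**53-1, so it is one possible interim representation; whether that is acceptable for index-returning ndarray APIs is a design decision for the project.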
- Any mutation APIs (e.g., `push`, `shift`, `unshift`, `pop`) are non-starters. ndarrays have fixed memory. So I am a bit curious why you included these in your proposal. Furthermore, how are these APIs supposed to work in a multi-dimensional context?
I assumed here that the users of these APIs (`splice`, `push`, `shift`, `unshift`, `pop`) would be more interested in manipulating the dimensions of the ndarray (i.e., the shape of the ndarray). We could use slice to extract a specific "sub-ndarray" from the n-dimensional array, while using assign and slice-assign to copy to a new ndarray.
We could then assign a placeholder to the original elements we want to effectively `shift` or `pop`. We could do the reverse for `shift` and `push`, adding a sub-array.
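To illustrate the view-based idea in plain JavaScript, here is a rough sketch in which "popping" along the first dimension is expressed as two views over the same fixed-size buffer, so no memory is resized. The helpers `view`, `get`, and `popRow` are hypothetical and are not stdlib APIs; the sketch assumes a two-dimensional, row-major layout.

```javascript
// Hypothetical minimal "view" over a fixed-size buffer:
function view( data, shape, strides, offset ) {
	return {
		'data': data,
		'shape': shape,
		'strides': strides,
		'offset': offset
	};
}

// Read element (i, j) of a two-dimensional view:
function get( v, i, j ) {
	return v.data[ v.offset + ( i*v.strides[ 0 ] ) + ( j*v.strides[ 1 ] ) ];
}

// "Pop" along the first dimension by returning two views sharing one buffer:
function popRow( v ) {
	var M = v.shape[ 0 ];
	var N = v.shape[ 1 ];
	return {
		// Everything except the last row:
		'rest': view( v.data, [ M-1, N ], v.strides, v.offset ),

		// The last row, as a 1 x N view:
		'popped': view( v.data, [ 1, N ], v.strides, v.offset + ( ( M-1 )*v.strides[ 0 ] ) )
	};
}

var buf = new Float64Array( [ 1, 2, 3, 4, 5, 6 ] );
var x = view( buf, [ 3, 2 ], [ 2, 1 ], 0 );
var out = popRow( x );

console.log( out.rest.shape, get( out.popped, 0, 0 ), get( out.popped, 0, 1 ) );
```

Because both results are views, the underlying buffer keeps its original length, which is consistent with the fixed-memory constraint raised in the review.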
- Various APIs (e.g., `every`, `some`, etc) return a boolean. But this is a bit limiting, as we'd likely want to support operating over specific axes, resulting in an array of lower rank. Your proposal seems to propose flattening the input array. While useful in some contexts, this is limited.
Yes, the proposal assumes that the input array will be viewed as a flat, single-dimensional array for `includes`, `every`, and `some`.
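For contrast with the flattened behaviour, the following sketch shows what an `every`-style test along one axis of a two-dimensional array could look like, producing a result of lower rank. It is an illustration only: `everyAlongAxis` is a hypothetical helper, the data is assumed to be contiguous and row-major, and a `Uint8Array` of 0/1 values stands in for the `bool` dtype that, per the review, does not exist yet.

```javascript
// Test whether every element along `axis` satisfies `predicate` (2-D, row-major):
function everyAlongAxis( data, shape, axis, predicate ) {
	var M = shape[ 0 ]; // number of rows
	var N = shape[ 1 ]; // number of columns
	var len = ( axis === 0 ) ? N : M; // length of the output
	var n = ( axis === 0 ) ? M : N;   // number of elements reduced per output element
	var out = new Uint8Array( len );
	var idx;
	var i;
	var j;
	for ( i = 0; i < len; i++ ) {
		out[ i ] = 1;
		for ( j = 0; j < n; j++ ) {
			// Map (output position, reduced position) to a linear index:
			idx = ( axis === 0 ) ? ( j*N ) + i : ( i*N ) + j;
			if ( !predicate( data[ idx ] ) ) {
				out[ i ] = 0;
				break;
			}
		}
	}
	return out;
}

function isPositive( v ) {
	return v > 0;
}

// A 2 x 3 array: [ [ 1, 2, 3 ], [ 0, 5, 6 ] ]
var x = new Float64Array( [ 1, 2, 3, 0, 5, 6 ] );

var perRow = everyAlongAxis( x, [ 2, 3 ], 1, isPositive );
var perColumn = everyAlongAxis( x, [ 2, 3 ], 0, isPositive );

console.log( perRow, perColumn );
```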
We only support up to int32/uint32 dtypes
My intention here was to implement the APIs within the existing state of the ndarray API, which now seems a bit short-sighted.
For things like fill, you've specified integer indices. This likely does not make sense, as we're more likely to support slicing semantics. In fact, it is not clear why we'd want to support index arguments at all, given that views can be defined in userland.
I would have to agree that slicing semantics would be more appropriate. I think the proposal suffers a lot from conforming to the JavaScript specification of the APIs.
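As a rough sketch of what a slice-based `fill` might look like, the example below fills a rectangular region of a two-dimensional array described by one slice per dimension. Everything here is an assumption made for illustration: `fillSlice2d` is a hypothetical helper, the slices are plain `{ start, stop, step }` objects standing in for real slice objects, `stop` is exclusive, steps are positive, and the data is contiguous and row-major.

```javascript
// Fill the region selected by one slice per dimension (2-D, row-major):
function fillSlice2d( data, shape, slices, value ) {
	var N = shape[ 1 ]; // number of columns
	var r = slices[ 0 ]; // row slice
	var c = slices[ 1 ]; // column slice
	var i;
	var j;
	for ( i = r.start; i < r.stop; i += r.step ) {
		for ( j = c.start; j < c.stop; j += c.step ) {
			data[ ( i*N ) + j ] = value;
		}
	}
	return data;
}

// A 3 x 4 array of zeros:
var x = new Float64Array( 12 );

// Fill rows 0 and 2, columns 1 and 2:
var rows = { 'start': 0, 'stop': 3, 'step': 2 };
var cols = { 'start': 1, 'stop': 3, 'step': 1 };
fillSlice2d( x, [ 3, 4 ], [ rows, cols ], 5 );

console.log( Array.from( x ) );
```

Compared with integer start and end indices, a slice per dimension selects a region without first flattening the array.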
- In general, based on your API documentation, I am not sure you've actually studied our existing ndarray packages. Many of the API proposals seem LLM generated and not grounded in how we'd design APIs.
The API documentation was derived from https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array and the JSDoc from existing APIs in `ndarray/base`, not any LLM. It wasn't the final implementation of the APIs, nor was any attempt made to follow the project design principles. It was just meant to visualize my idea of the APIs. I am quite sorry if it looks plagiarised!
I have to admit that this proposal was over-enthusiastic. The attempt to cover all of the JavaScript built-in prototype methods has broadened the scope of this proposal more than it should have. Implementing this proposal as it is would require implementing this project idea https://github.com/stdlib-js/google-summer-of-code/issues/43 and Uint64 dtype support. With just one day left before the end of project submission, I don't think this project can be updated reasonably. I'm willing to close the proposal if it doesn't quite suit the community's standards.
The project idea is not intended to match exact ECMAScript-defined semantics for `Array` and `TypedArray` objects. Instead, the goal is to provide "analogous" APIs which are conceptually similar to the methods on those objects. The current example is `Array.prototype.slice` and `@stdlib/ndarray/slice`. Those ideas are conceptually similar, but the latter is modified to match the semantics and constraints of multi-dimensional arrays. That should be similarly done for other proposed APIs. As a start, you could examine the APIs afforded by NumPy to better get a sense of multi-dimensional array design principles.
Full name
Bryan Elee Atonye
University status
Yes
University name
University of Port Harcourt
University program
Mathematics and Computer Science
Expected graduation
July
Short biography
My name is Bryan Elee. I am in my final year pursuing a degree in Mathematics and Computer Science. I recently completed my final exam and project defense, hence I'm awaiting graduation in the next couple of months. I possess a strong foundation in various programming languages, with over 5 years of programming experience with C/C++, Python, and JavaScript, honed through academic studies, practical projects and internships. I previously participated in Google Summer of Code 2022 under the Metacall Organization and in the Summer of Bitcoin program last year, working under the Ledger Organization. These experiences solidified my ability to work effectively within open-source communities and collaborate with experienced developers. I'm interested in machine learning, especially the field of reinforcement learning. I did some work on reinforcement learning last year and I am very excited about the possibilities offered by this technology.
Specific Achievements:
Summer of Bitcoin: Developed "Resigner," a general-purpose hot signing service in Python for Ledger Organization. This project demonstrates my ability to tackle complex tasks (Miniscript language, cryptographic functionalities) and deliver real-world solutions. Project Link
Google Summer of Code: Refactored the Metacall core library to a plugin architecture. This experience showcases my proficiency in C/C++ development. Project Documentation:
Timezone
UTC +1
Contact details
rxbryn@gmail.com
Platform
Linux
Editor
Sublime Text
Programming experience
I began programming before university, in 2018. I started out writing shell scripts, moved on to C/C++, then the Python programming language, JavaScript and NodeJS. I am self-taught in the above languages; I was usually motivated by some project I was developing. I have worked on a couple of projects, but I am most proud of Resigner.
Resigner is an easy-to-program hot signing service for miniscript policies. The Resigner countersigns transactions according to rules (spending conditions) set in advance in the configuration file, for example “no more than 1 million satoshis per day”, before the transaction is broadcast to the bitcoin network. It provides the following features:
It acts as a trusted third party in multiparty transactions enforcing previously agreed conditions
JavaScript experience
I have about 3 years of experience writing JavaScript programs. I have two published npm packages, http-date and http-preconditions. I also have some experience doing backend web development using NodeJS and Express. I have contributed JavaScript to a few open source projects such as
My favourite feature in JavaScript would be function prototypes. While this pattern has fallen out of favour, being replaced by the class syntax, the prototype pattern provides an interesting approach for dynamic inheritance of object properties and behaviour.
My least favourite feature in JavaScript is the event loop. While the event loop is responsible for the asynchronous behaviour in JavaScript, it also makes writing true multithreaded JavaScript applications very difficult. Any attempt at optimising JavaScript code requires a deep understanding of the nature of the event loop and how it affects the specific code being optimised. This experience is not readily available.
Node.js experience
My experience with NodeJS is quite extensive. I have some experience modifying NodeJS source code and compiling the library for embedding purposes. Some of my experience developing node native addons and embedding NodeJS comes from contributing to the development of the node loader in metacall `core`. This draft PR contains a lot of my work in embedding NodeJS. It was used as the base for implementing the feature for exporting classes and objects from NodeJS to metacall. I also have some experience developing web applications using NodeJS and Express.js. I have also published some npm packages, as I have elaborated on in the JavaScript section.
C/Fortran experience
The C programming language is the first language I learnt, the second being C++. It is the language in which I have clocked the most years of experience. I used C extensively while participating in the Summer of Code 2022 under metacall, and I also worked on some personal projects using C. Some of my contributions to open source projects using C include https://github.com/metacall/core/pull/289 https://github.com/metacall/core/pull/270 https://github.com/metacall/core/pull/287 https://github.com/metacall/core/pull/298. Some of these merged PRs include C++ code, but they still demonstrate the requisite skill.
Interest in stdlib
My interest in Stdlib is twofold.
Version control
Yes
Contributions to stdlib
Merged contributions
- refactor: update `blas/ext/base/sapxsumpw` to follow current project conventions
- refactor: update `blas/ext/base/scusumors` to follow current project conventions
- refactor: update `blas/ext/base/scusumpw` to follow current project conventions
- refactor: update `blas/ext/base/sapx` to follow current project conventions
Goals
The goal of this project is to achieve API parity between the stdlib native ndarray and the built-in JavaScript Array. Of all the existing JavaScript array methods, only the at and slice methods exist in ndarray.
Each of the APIs is a standalone package in either the @stdlib/ndarray/base or @stdlib/ndarray directory.
Each package would have this file structure
The following APIs will be implemented during the course of this project:
ndarray slice semantics for representing indices
APIs taking an index or multiple indices will utilise slice semantics. We shall use the slice API as it is; hence APIs such as `fill`, `copywithin`, `splice`, etc. shall take a slice object, an array of slice objects, or a multislice object.
Dimensionality Reduction
For APIs where it is suitable to support operating over specific axes, we will use the approach used by NumPy. A `null` axis (the default) performs the operation over all the dimensions of the input ndarray. If the axis is an array of integers, the reduction is performed over multiple axes, instead of a single axis or all the axes as before. For example, given a three-dimensional ndarray, axis = 0 represents reducing along the depth, 1 represents reducing along the rows, and 2 represents reducing along the columns.
Accessors
ndarray APIs taking a callback, such as unary, implement optimised accessors for dimensions up to 10d. We shall use this approach while implementing the APIs requiring callbacks.
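The axis convention described under Dimensionality Reduction can be sketched with a sum reduction over a contiguous, row-major buffer. This is only an illustration under stated assumptions: `sumOverAxes` is a hypothetical helper rather than a stdlib or NumPy API, `axes` is either `null` (reduce over everything) or an array of integers, and the result is returned as a plain object with `data` and `shape` fields. Reducing over all axes yields an empty shape, i.e., a zero-dimensional result rather than a scalar.

```javascript
// Sum `data` (row-major, with the given `shape`) over the specified axes:
function sumOverAxes( data, shape, axes ) {
	var ostrides = [];
	var outShape = [];
	var ndims = shape.length;
	var sub = [];
	var len = 1;
	var out;
	var o;
	var d;
	var i;

	// Compute output strides from the last dimension to the first; reduced
	// dimensions get a stride of zero so that their elements accumulate:
	for ( d = ndims-1; d >= 0; d-- ) {
		sub[ d ] = 0;
		if ( axes === null || axes.indexOf( d ) >= 0 ) {
			ostrides[ d ] = 0;
		} else {
			ostrides[ d ] = len;
			len *= shape[ d ];
			outShape.unshift( shape[ d ] );
		}
	}
	out = new Float64Array( len );
	for ( i = 0; i < data.length; i++ ) {
		// Resolve the output index for the current subscripts:
		o = 0;
		for ( d = 0; d < ndims; d++ ) {
			o += sub[ d ] * ostrides[ d ];
		}
		out[ o ] += data[ i ];

		// Advance the subscripts in row-major order:
		for ( d = ndims-1; d >= 0; d-- ) {
			sub[ d ] += 1;
			if ( sub[ d ] < shape[ d ] ) {
				break;
			}
			sub[ d ] = 0;
		}
	}
	return {
		'data': out,
		'shape': outShape
	};
}

// A 2 x 2 x 2 array containing the values 1 through 8:
var x = new Float64Array( [ 1, 2, 3, 4, 5, 6, 7, 8 ] );
var shape = [ 2, 2, 2 ];

var all = sumOverAxes( x, shape, null );       // reduce over every dimension
var depth = sumOverAxes( x, shape, [ 0 ] );    // reduce along axis 0 only
var rest = sumOverAxes( x, shape, [ 1, 2 ] );  // reduce along axes 1 and 2

console.log( all, depth, rest );
```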
APIs
APIs that take a callback
Why this project?
Ndarrays are foundational to working with the stdlib library. They provide an efficient way to work with multi-dimensional numerical data. This project is a high priority for stdlib for the aforementioned reason. It adds APIs that would be utilised in every package in the library. Knowledge of working with multi-dimensional numerical data is a highly valuable skill for data science and machine learning, career paths I intend on pursuing. A significant portion of data science and machine learning involves working with numerical data, often organized in multi-dimensional structures like matrices and tensors. These structures represent complex relationships between features and observations. Understanding how to manipulate, analyze, and interpret this data is very important; this project hence affords me first-hand experience with the ndarray object. I also stand to gain knowledge of optimal techniques and patterns for iterating over multi-dimensional arrays, and possibly other optimisation techniques that might be used during the course of the project.
Qualifications
I have completed the course work for a degree in Mathematics and Computer Science. The relevant courses to this project would be Linear Algebra, Numerical Analysis, and Data Structures and Algorithms.
I am also acquainted with the book Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne. It helped develop my understanding of both data structures and algorithms. I am also quite familiar with the ECMAScript specification. The definitions and implementations of the APIs will be informed by it.
Prior art
The at and slice methods exist in ndarray. Various ndarray APIs have also been implemented. They will inform and guide our implementation of the project.
Commitment
As stated in the background section, I recently completed my final exam and project defense. Hence I'm free from any major commitments and will be able to give ~40 hr/week to this project.
Schedule
Assuming a 12 week schedule,
Each of the APIs to be implemented is standalone, and will not be considered implemented without its benchmarks, tests, documentation and examples. So rather than having a week for documentation, tests and so on, I intend to submit PRs for at least 3 APIs per week.
Week 1 - Week 3: start coding `findLast`, `includes`, `join`, `reduceRight`, `toreversed`, `tosorted`, `toSpliced`, `values`
Week 4 - Week 6 (midterm): implement `filter`, `find`, `forEach`, `includes`, `splice`, `copywithin`, `concat`, `sort`, `reverse`
Week 7 - Week 9: implement the remaining APIs, `map`, `reduce`, `some`, `join`, `toString`
Week 10 - Week 12: because of the complex nature of the project, I intend to leave the last two weeks for review because I’m expecting a lot of reviews before we can get this code merged
Related issues
No response
Checklist
[RFC]:
and succinctly describes your proposal.