stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib
23 stars 5 forks source link

[RFC]: achieve ndarray API parity with built-in JavaScript arrays #48

Closed headlessNode closed 2 months ago

headlessNode commented 3 months ago

Full name

Muhammad Haris

University status

Graduated

University name

Air University

University program

Mechanical Engineering

Expected graduation

Graduated 2022

Short biography

I am a Mechanical Engineering graduate and embarked on a journey of self-learning programming around June 2023. Since then, I have delved into various programming languages, including C++ and Assembly Language, while my primary focus has been on web development. I am Proficient in HTML, CSS, and JavaScript, and continuously expanding my skill set with the goal of learning React and back-end technologies in the future. Through personal projects, open source contributions and dedicated study, I aim to transition into a career in web development by the end of 2024.

Timezone

PKT (UTC + 5)

Contact details

email: harriskhan047@outlook.com, github: headlessNode

Platform

Linux

Editor

My preferred code editor is Visual Studio Code. I use it primarily because it's widely used and recommended in the programming community. While I haven't explored other options extensively, I find VSCode intuitive and feature-rich, which suits my needs for coding projects. However, I'm open to exploring other editors in the future if the need arises for a specific project.

Programming experience

I embarked on my programming journey with The Odin Project, an open-source platform tailored for self-learners interested in web development. Through this project-based learning program, I tackled various projects of differing complexities to gain practical experience.

Below are several projects that mark milestones in my learning journey:

These projects instilled in me the importance of writing clean, maintainable code and adhering to best practices in software development. Additionally, I've completed several other projects like to deepen my understanding of fundamental data structures like Trees, Hash-maps, and Linked Lists. For more projects, you can visit my GitHub profile.

JavaScript experience

In terms of JavaScript experience, I've gained practical knowledge through hands-on projects, as showcased above. I've completed projects that cover fundamental data structures like Hashmaps, LinkedLists, Binary Search Trees, and Graphs, which have deepened my understanding of JavaScript's versatility.

One of my favorite features of JavaScript is its asynchronous nature, which enhances application responsiveness. This feature is particularly powerful when handling tasks like fetching data or processing user input without freezing the UI. It provides a smoother user experience by executing tasks concurrently.

In terms of JavaScript's weaker points, one aspect that can be challenging is its reliance on callbacks. Callbacks can lead to famously dreaded, callback hell or deeply nested code structures, making code difficult to read and maintain. This issue not only affects code readability but also raises security concerns, as callbacks can give external entities control over the program's execution flow when fetching data from external APIs. However, with the introduction of Promises and Async/await syntax, this issue has been mitigated to some extent, providing cleaner and more manageable asynchronous code.

Node.js experience

I have some experience with Node.js primarily in the context of setting up build tools and development environments for front-end projects. I've used Node.js to install and configure libraries such as ESLint, Prettier, Webpack and more. Libraries which are essential for maintaining code quality and optimizing the development workflow. While my experience with Node.js is currently more focused on front-end development, I'm eager to expand my skills and explore its back-end capabilities as I progress in my learning journey.

C/Fortran experience

I don't have specific experience with C or Fortran but I do have some familiarity with C++. During my bachelor's degree, I utilized C++ for projects that involved solving differential equations. Additionally, I've primarily used C++ for tackling challenge questions on platforms like LeetCode. While my experience in C++ is more focused on algorithmic problem-solving, I'm open to exploring other programming languages and deepening my understanding of low-level programming concepts.

Interest in stdlib

What interests me about stdlib is the aspect of ndarrays. While I haven't yet had the opportunity to use stdlib for my projects, the concept of ndarrays immediately caught my attention. I have experience with simple 2-D arrays but I wasn't familiar with ndarrays until I saw this project achieve ndarray API parity with built-in JavaScript arrays on stdlib page. As I further researched this domain, I found the implementations of ndarrays and various methods associated with it really interesting, particularly the methods for efficiently iterating through the ndarrays of various shapes and strides.

Moreover, aside the technical aspects, I appreciate the welcoming and warm open-source community surrounding stdlib. I've had the pleasure of engaging in discussions about the project on the Element channel, and I found the community to be incredibly helpful and engaging.

Version control

Yes

Contributions to stdlib

Open pull requests:

Merged pull requests:

Goals

ndarrays (n-dimensional arrays), offer data structures that are crucial for many data processing tasks. The proposed project aims to enhance the functionality and usability of ndarrays, by achieving parity with built-in JavaScript arrays with focus on efficient ndarray operations for contiguous and non-contiguous data.

  1. Achieving Parity with Built-in JavaScript Arrays:

      1.1. forEach                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns an output array of the same shape.                          Reduction: Does not perform reduction; applies a function to each element.                          Suitability: Suitable for ndarrays; iterates over each element without changing the array's structure.       1.2. map:                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns an output array of the same shape.                          Reduction: Does not perform reduction; creates a new array with transformed elements.                          Suitability: Suitable; transforms each element into a new value, maintaining the array's structure.       1.3. reduce:                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns a reduced array (single value or array with fewer dimensions).                          Reduction: Performs reduction by applying a function to each element and accumulating the result.                          Suitability: Suitable for ndarrays; reduces the array to a single value or a lower-dimensional array.       1.4. reduceRight:                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns a reduced array (single value or array with fewer dimensions).                          Reduction: Performs reduction by applying a function to from right to left and accumulating the result.                          Suitability: Suitable for ndarrays.       1.5. filter                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns a flattened array containing all filtered elements.                          Reduction: Can be considered a form of reduction.                          Suitability: Suitable; selects elements based on a condition.       1.6. indexOf                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns an array of indices indicating the position of the element, -1 if not found.                          Reduction: Can be considered a form of reduction; returns an array of indices.                          Suitability: Suitable for ndarrays; allows searching for position of elements.       1.7. lastIndexOf                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns an array of indices, the position of the last occurrence, -1 if not found.                          Reduction: Can be considered a form of reduction; returns an array of indices.                          Suitability: Suitable for ndarrays; allows searching for elements from the end.       1.8. find                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns a single value (element satisfying the condition).                          Reduction: Does not perform reduction; returns the first matching element.                          Suitability: Suitable for ndarrays; finds the first element that satisfies a condition.       1.9. findIndex                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns an array of indices indicating position of element if found.                          Reduction: Can be considered a form of reduction.                          Suitability: Suitable for ndarrays.       1.10. findLast                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns a single value (element satisfying the condition).                          Reduction: Does not perform reduction; returns the last matching element.                          Suitability: Suitable for ndarrays.       1.11. findLastIndex                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns an array of indices of last occurrence indicating position of element if found.                          Reduction: Can be considered a form of reduction.                          Suitability: Suitable for ndarrays.       1.12. sort                          Mutation: Mutates the input ndarray by sorting the elements.                          Shape Consistency: Returns an array of the same shape.                          Reduction: Does not perform reduction; rearranges elements in sorted order.                          Suitability: Suitable for ndarrays; sorts elements along one or more dimensions.       1.13. toSorted                          Mutation: Mutates the input ndarray by sorting the elements.                          Shape Consistency: Returns an array of the same shape.                          Reduction: Does not perform reduction.                          Suitability: Suitable for ndarrays.       1.14. toReversed                          Mutation: Mutates the input ndarray.                          Shape Consistency: Returns an array of the same shape.                          Reduction: Does not perform reduction.                          Suitability: Suitable for ndarrays.       1.15. fill                          Mutation: Mutates the input ndarray by filling it with a specified value, overwriting existing elements.                          Shape Consistency: Does not change the shape of the array; the size and dimensions remain the same.                          Reduction: Not a reduction operation; it modifies the existing array by replacing its elements.                          Suitability: Suitable for ndarrays.       1.16. flat                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns a flattened array; changes the shape of the array to one dimension.                          Reduction: Not a reduction operation.                          Suitability: Suitable for ndarrays; simplifies processing or manipulation.       1.17. flatMap                          Mutation: Does not mutate the input ndarray.                          Shape Consistency: Returns a flattened array; changes the shape of the array to one dimension.                          Reduction: Not a reduction operation.                          Suitability: Suitable for ndarrays.

  1. Performance Optimization:
  1. Error Handling and Documentation:

Why this project?

Since I have only experience working with simple 2-D arrays, the opportunity to work on ndarrays within the stdlib framework really excites me. The prospect of contributing to a project that focuses on enhancing the functionality and efficiency of multidimensional arrays in JavaScript is highly appealing to me.

Firstly, the ndarray module plays a crucial role in numerical computing and data analysis tasks. By improving the ndarray API and optimizing its performance, the developers can work more efficiently with complex data structures with ease.

Secondly, I am intrigued by the technical challenges associated with achieving ndarray API parity with built-in JavaScript arrays. This involves implementing various methods and functionalities to ensure that ndarrays offer the same level of flexibility as native arrays while also optimizing their performance.

Furthermore, I am excited about contributing to an active and vibrant community of developers allowing me to learn from others, share my insights, and collaborate on solving problems together. The opportunity to make meaningful contributions to a project with real-world applications is both fulfilling and motivating.

Qualifications

As a self-taught programmer with a background in Mechanical Engineering, I have sharpened my skills through hands-on projects and self-directed learning. While I may not have formal qualifications in specific areas like statistics, I have a solid foundation in Numerical Methods and Differential Equations, which I acquired during my studies in Mechanical Engineering. Also my experience and approach to learning make me well-suited to execute on my proposal for the following reasons:

  1. Programming Skills:
  1. Self-Directed Learning:
  1. Adaptability and Resourcefulness:

I am committed to bridging any knowledge gaps through self-study and collaboration with domain experts. I am passionate about expanding my skill set and contributing meaningfully to projects.

Prior art

During my research to understand ndarrays and achieving ndarray API parity with built-in JavaScript methods, I came across NumPy's implementation of ndarrays. It offers an ndarray object along with a set of array manipulation functionalities like ndarray.reshape to reshape the array to a new shape, ndarray.flatten to flatten the array into a 1-D array, ndarray.sort to sort the elements of the array and many others.

While there isn't direct implementation of ndarray API parity with built-in JavaScript, exploring how these NumPy handles array manipulation, iteration, and performance optimization aided me in formulating a rough concept of how we can achieve our goals.

Commitment

Given my current status as a full-time learner in programming and web development, I don't have any other commitments, I can commit to full-time working hours (35 - 40 hours/week) on the project.

Schedule

Assuming a 12 week schedule,

Notes:

Related issues

Checklist

kgryte commented 3 months ago

A few comments:

  1. indexOf, includes, lastIndexOf, every, some, and find should be capable of operating over a list of specified dimensions. In which case, you won't return a single value, but an array of reduced rank.
  2. filter should return a flattened array, as otherwise you'd need to support ragged dimensions, which we cannot do.
  3. In order to support return an array of booleans, we need to add support for bool dtypes in ndarrays, which we do not currently support. Related: https://github.com/stdlib-js/google-summer-of-code/issues/43
  4. You've omitted various other built-in array methods (e.g., concat, copyWithin, fill, findIndex, findLast, findLastIndex, flat, flatMap, join, reduceRight, toReversed, toSorted, with). Can you comment on why you omitted those?
  5. I suggest updating your timeline with when you expect to open various PRs. Currently, the language is fairly vague (e.g., "begin experimenting", "prototyping", etc).
headlessNode commented 3 months ago

@kgryte Thank you for your detailed response.

  1. I've modified the indexOf. However, I think that some, every, find and includes will not return an array but a single boolean indicating whether the condition holds true or otherwise, Please guide me in this matter.

  2. I've modified the specifications for filter method.

  3. related to point 1.

  4. I didn't include concat, copyWithin, join, and with because I'm unsure of how they would be implemented for ndarrays. As for the remaining methods, I've now added them. The omission of those was due to their similarities to the ones already included, but I recognize their importance and have included them in the proposal. Apologies for the oversight, and thank you for bringing it to my attention.

  5. Timeline updated.

Again, thank you for providing the feedback. Your suggestions have been really helpful in coming up with the project proposal and refining it.

kgryte commented 3 months ago

Re: 1. You're assuming that the operation will be across the entire array. Meaning, if we're provided a 10D array or a 1D array, it is the same. You're proposal suggests that in both cases the result of, e.g., every will be the equivalent of flattening any input array to 1D and then checking each element.

What I am saying is that, while that is fine for a default, we're also likely to want to perform row-wise or column-wise operations. Consider the matrix

x = [
  0 1 0
  1 0 1
  0 0 0
]

We should be able to call every( x, axis ), where axis specifies the dimension over which to perform the reduction. E.g., if say, axis=0 means across rows, then the result should be

y = [
  true true true
]

and if, say, axis=1 means across columns, then the result should be

y = [
  true
  true
  false
]

See the difference? In the first case, y should be 1x3 and in the second case 3x1, assuming that keepdims is true, as in NumPy.

headlessNode commented 3 months ago

@kgryte Thank you for the detailed explanation. I could research how NumPy achieves the similar by the np.all, np.any, np.where etc methods. But then, as you said, we'd need add the support for bool dtypes. I've gone through the #43 issue, and it looks like it requires considerable time and effort.

I see two possible paths forward. The first option would involve incorporating the bool dtype support project into this one, potentially sacrificing some of the array API methods that are not directly related to this issue. However, this approach might affect the scope and timeline of the project.

Alternatively, we could focus on implementing methods that do not depend on returning an array of booleans for now. After GSoC, we could revisit the issue and work on adding support for bool dtypes, enabling us to implement methods that depend on them.

I lean towards this second option due to it being more focused and achievable project scope.

I'm open to any suggestions or alternative approaches you may have regarding this matter. So please let me know your thoughts.

kgryte commented 3 months ago

Pursuing the second option seems sensible.

headlessNode commented 3 months ago

Great, I'll update the proposal accordingly. Thank you for the guidance and support.

headlessNode commented 3 months ago

Hi @kgryte, before submitting the final proposal to the GSoC website, I wanted to confirm with you, if it would be appropriate for me to include what each package of the implementation will include?

For instance, for the implementation of forEach, we will have distinct modules for dimensions up to 10d. For each dimension, we'll include simple nested iteration, accessor, and blocked iteration implementations. And, for dimensions exceeding 10d, we'll include nd and nd_accessor implementations.

kgryte commented 3 months ago

@headlessNode Yes, that seems reasonable. Note, however, this structure may not be necessary for all the APIs in this proposal. For APIs needing element-wise iteration (e.g., forEach, map), this should suffice. For others, such as reduction APIs, what those implementations will look like is TBD.

headlessNode commented 3 months ago

@kgryte Thanks for the confirmation. And yes, in the final proposal, I already added an implementation overview for the forEach method. I explicitly mentioned that this implementation overview may not encompass each of the proposed methods.