stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: Implement a broader range of statistical distributions #77

Closed Rejoan-Sardar closed 7 months ago

Rejoan-Sardar commented 8 months ago

Full name

Rejoan Sardar

University status

Yes

University name

Lovely Professional University, Punjab

University program

BTech. Computer Science and Engineering

Expected graduation

2026

Short biography

I am Rejoan Sardar, a 2nd year undergraduate Computer Science and Engineering student at Lovely Professional University. My programming journey began with Python during high school, and I have continuously expanded my skill set since then. Python gave me a strong foundation that made it easier to explore other languages such as C/C++, Java, and JavaScript. I like coding for fun and have worked on various small projects, which can be found on my GitHub profile.

Timezone

IST (UTC + 5:30)

Contact details

Email: rejoansardar4@gmail.com
GitHub: @Rejoan-Sardar

Platform

Windows

Editor

VS Code stands out as my favorite due to its seamless integration of powerful features, intuitive interface, and extensive customization options. Its robust set of tools, including IntelliSense for code completion and debugging capabilities, greatly enhance my productivity and streamline my development workflow.

Programming experience

I have been developing projects and participating in various competitions since my first year:

a. NodeRepl: Online Code Editor Implementation in Node.js. Developers can write, edit, and execute Node.js code directly in-browser, collaborating with others in real-time via socket.io. Kubernetes ensures reliable container orchestration and scalability. Smooth handling of HTTP requests enhances user experience. Bounties incentivize community contributions and reward developers.

b. Atom-REPL: Empowering JavaScript Development with Live REPL. Atom-REPL is a promising addition to the JavaScript coding ecosystem, offering a live Read-Eval-Print-Loop (REPL) directly within the familiar environment of the Atom text editor. Inspired by the functionality of platforms like CoderPad, it aims to streamline the coding experience for JavaScript developers, particularly for quick exercises and experimentation. Still actively in development, Atom-REPL holds the potential for further enhancements and features, making it worth keeping an eye on for future updates and improvements.

c. XKCDDisplay: JupyterLab Integration for Random XKCD Comics. Users can fetch and display random XKCD comics, navigate through them with metadata, and enjoy offline viewing. Customization options allow adjusting display settings, bookmarking favorites, and sharing via social media. Additionally, users can search for comics based on keywords, utilize keyboard shortcuts, and access accessibility features. Robust error handling ensures smooth operation, while integration with the JupyterLab ecosystem enhances overall user experience.

JavaScript experience

While contributing to stdlib-js, I worked on JavaScript and C implementations of mathematical functions such as boxcox1p, xlog1py, asinh, atanh, asind, acscd, odd, and even. This work extended the standard library for both JavaScript and Node.js environments and aimed to improve the efficiency and functionality of these operations for users across different platforms. Despite its flexibility, JavaScript's dynamic typing can pose challenges without tools such as TypeScript to enforce type safety. While JSDoc aids in documentation, it cannot enforce strict typing, so I use TypeScript to maintain type safety and improve overall code reliability. I have worked on various small projects in JavaScript and Node.js, which can be found on my GitHub profile.

Node.js experience

Same as my JavaScript experience.

C/Fortran experience

My proficiency in C programming and problem-solving skills have been refined through my extensive engagement in competitive programming. I excel in crafting efficient algorithms and implementing them effectively in C to solve a variety of challenges. With a solid foundation in data structures and algorithms, coupled with my ability to optimize code for performance, I consistently deliver robust solutions in competitive programming contests. My track record showcases my capability to tackle complex problems, demonstrating my dedication to mastering C programming and problem-solving at a competitive level.

Interest in stdlib

I have been contributing to open source for quite a long time, notably within the stdlib organization. There, I have refined C implementations of mathematical functions with the aim of improving computational precision and efficiency. Through code review and optimization work, I have shown my commitment to collaborative development and to advancing the state of the art in computational mathematics. Engaging with the open source community has given me valuable insights and opportunities for learning and growth, reinforcing the importance of shared knowledge and collective advancement in programming. My contributions include creating PRs, helping other team members, reporting bugs in the project, and resolving them.

Version control

Yes

Contributions to stdlib

At present I have 22 PRs merged and 5 issues closed. Here are some of my open source contributions: All PRs which are merged

Status Opened

All PRs which are open

Issues Opened

All issues which are currently open

Resolved Issue

Goals

This project aims to incorporate every distribution available in SciPy stats into a comprehensive JavaScript library. It involves developing APIs to calculate the PDF, CDF, quantiles, and other essential distribution properties. Additionally, the library will provide functionality to generate random variates from any of the implemented distributions, ensuring a robust statistical toolkit within the JavaScript ecosystem.

The default method used by SciPy to sample from an arbitrary distribution requires integrating the PDF and then numerically inverting the CDF. This generic approach is too slow to be relied on for practical purposes, so custom methods for sampling random variates need to be implemented for the distributions found in SciPy. In my own testing and benchmarking, the distributions already implemented in the standard library (stdlib) ran approximately 10,000 times faster than the corresponding default methods in SciPy. This performance gain could greatly benefit applications requiring high computational efficiency, making stdlib distributions an attractive choice for scientific and engineering tasks.
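As a rough illustration of the kind of per-distribution API described above, the following is a minimal, self-contained sketch for a single distribution. The exponential distribution is chosen only because its formulas are short; the function names `pdf`, `cdf`, `quantile`, and `random` and their signatures are hypothetical and are not taken from stdlib or SciPy.

```javascript
'use strict';

// Hypothetical sketch of a per-distribution API (not the stdlib API).
// Distribution: exponential with rate parameter `lambda` > 0.

// Probability density function evaluated at `x`.
function pdf( x, lambda ) {
	if ( x < 0 ) {
		return 0;
	}
	return lambda * Math.exp( -lambda * x );
}

// Cumulative distribution function evaluated at `x`.
function cdf( x, lambda ) {
	if ( x < 0 ) {
		return 0;
	}
	return 1 - Math.exp( -lambda * x );
}

// Quantile function (inverse CDF) for a probability `p` in [0,1].
function quantile( p, lambda ) {
	if ( p < 0 || p > 1 ) {
		return NaN;
	}
	// log1p keeps precision when `p` is close to zero.
	return -Math.log1p( -p ) / lambda;
}

// Draw one variate by inverse transform sampling.
// `rand` is any function returning a uniform number in [0,1).
function random( lambda, rand ) {
	return quantile( rand(), lambda );
}
```

Passing the uniform generator in as an argument, for example `random( 2.0, Math.random )`, is one possible design choice here; it keeps the sketch deterministic under test and leaves the choice of generator to the caller.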

Why this project?

This project matters because it focuses on making random variate sampling from probability distributions efficient. My analysis shows that the default method employed by SciPy for sampling involves computationally intensive steps: integrating the probability density function (PDF) and numerically inverting the cumulative distribution function (CDF). This approach is too slow for practical applications. To address this limitation, custom sampling methods need to be developed for the distributions found in SciPy.

In my testing and benchmarking, the distributions already implemented in the standard library (stdlib) executed approximately 10,000 times faster than their counterparts in SciPy. This improvement in computational efficiency makes stdlib distributions an appealing option for scientific and engineering tasks where rapid computation is crucial.

Therefore, by optimizing random variate sampling methods through this project, we can significantly improve the performance of scientific computations across a wide range of applications.
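To make the cost difference concrete, here is a minimal sketch of the generic strategy described above (integrate the PDF numerically, then invert the resulting CDF by root finding) next to a distribution-specific closed form. It is an illustration of the idea only, not SciPy's actual implementation; the trapezoidal rule, the bisection search, and all function names are assumptions made for this example.

```javascript
'use strict';

// Density of an exponential distribution with rate `lambda`.
// (Illustrative choice; any density with a known quantile would do.)
function pdf( x, lambda ) {
	return ( x < 0 ) ? 0 : lambda * Math.exp( -lambda * x );
}

// Generic CDF: integrate the PDF over [0, x] with the composite
// trapezoidal rule using `steps` subintervals.
function numericCdf( x, lambda, steps ) {
	var h = x / steps;
	var sum = 0.5 * ( pdf( 0, lambda ) + pdf( x, lambda ) );
	var i;
	for ( i = 1; i < steps; i++ ) {
		sum += pdf( i * h, lambda );
	}
	return sum * h;
}

// Generic quantile: invert the numeric CDF by bisection on [0, upper].
function numericQuantile( p, lambda, upper, steps ) {
	var lo = 0;
	var hi = upper;
	var mid;
	var i;
	for ( i = 0; i < 60; i++ ) {
		mid = 0.5 * ( lo + hi );
		if ( numericCdf( mid, lambda, steps ) < p ) {
			lo = mid;
		} else {
			hi = mid;
		}
	}
	return 0.5 * ( lo + hi );
}

// Distribution-specific quantile: a single closed-form expression.
function closedFormQuantile( p, lambda ) {
	return -Math.log1p( -p ) / lambda;
}
```

With `steps = 1000`, one call to `numericQuantile` evaluates the PDF roughly 60,000 times (60 bisection iterations, each integrating over 1,001 points), whereas `closedFormQuantile` needs a single logarithm. The exact ratio depends on tolerances and on the distribution, so this sketch should not be read as reproducing the benchmark figure quoted above.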

Implementation

Plan of action: In my study of SciPy implementations, I've identified some functionality gaps in stdlib that must be filled in order to implement all distributions found in SciPy. Some distributions pose challenges due to dependencies on BLAS and LAPACK functionality, such as matrix operations like transpose, dot product, cross product, determinant calculation, and many others. To address this, I've analyzed each distribution's dependencies, and I will also create a dependency graph. Distributions relying on functionality already present in stdlib are prioritized in the simple and intermediate sections of my plan, ensuring completion before the midterm evaluation. Some of those dependent on BLAS and LAPACK functions are included in the advanced distribution section. As stdlib's math/base/special and the BLAS and LAPACK project cover some of this functionality, I'll leverage them initially. If additional functionality is needed, I'll implement it after discussing with my mentor and adjust my timeline accordingly.

2.3.1 Foundational Implementation - Simple Distributions and Random Variate Generation APIs: In the initial phase of the project, we prioritize the implementation of simple distributions, which serve as foundational components of statistical analysis. These distributions have straightforward mathematical properties and are commonly used in various fields. By focusing on simple distributions first, we establish a solid base for the library, ensuring accessibility and usability for a wide range of users. Through careful implementation and testing, we aim to deliver reliable and efficient functionality that prepares the way for more complex distributions in subsequent phases of development. This includes:

2.3.3 Implementation of Advanced Distributions: Once the simple and intermediate distributions are implemented, we proceed to incorporate advanced distributions. This involves completing the implementation of all continuous and discrete distributions, along with selected multivariate distributions, to offer users a robust and versatile library for statistical analyses and modeling tasks.
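The dependency-driven ordering mentioned in the plan of action (analyze each distribution's prerequisites, build a dependency graph, and schedule work so that prerequisites come first) can be sketched with a depth-first topological sort. The node names and edges below are hypothetical placeholders chosen for illustration; they are not the actual dependency analysis from the proposal.

```javascript
'use strict';

// Hypothetical dependency map: each key lists the prerequisites it needs.
// Names and edges are illustrative only.
var deps = {
	'gamma-function': [],
	'dot-product': [],
	'matrix-determinant': [ 'dot-product' ],
	'simple-distribution': [ 'gamma-function' ],
	'multivariate-distribution': [ 'matrix-determinant', 'gamma-function' ]
};

// Return the nodes in an order where every prerequisite precedes the
// items that depend on it (depth-first topological sort).
function buildOrder( graph ) {
	var state = {}; // 1 = visit in progress, 2 = finished
	var order = [];

	function visit( node ) {
		var list;
		var i;
		if ( state[ node ] === 2 ) {
			return;
		}
		if ( state[ node ] === 1 ) {
			throw new Error( 'cyclic dependency at: ' + node );
		}
		state[ node ] = 1;
		list = graph[ node ] || [];
		for ( i = 0; i < list.length; i++ ) {
			visit( list[ i ] );
		}
		state[ node ] = 2;
		order.push( node );
	}

	Object.keys( graph ).forEach( function each( key ) {
		visit( key );
	});
	return order;
}

var order = buildOrder( deps );
console.log( order.join( ' -> ' ) );
```

An ordering of this kind makes it easy to see which items have no unmet prerequisites and can be scheduled early, and which ones should wait until the linear algebra pieces are available. The cycle check guards against a dependency map that cannot be scheduled at all.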

Qualifications

Prior art

Commitment

I'm committed to dedicating 4 to 5 hours daily to the project, totaling 30-35 hours per week, with the flexibility to increase my hours as needed. With no conflicting obligations, I can adjust my availability to align with my mentor's time zone. My summer break spans from May 31 to July 31, allowing ample time for project completion. Expect regular progress updates and prompt communication for mentor assistance. Additionally, I'll maintain bi-weekly blog updates for reference.

Schedule

Assuming a 12 week schedule,

Notes:

Related issues

2

Checklist

kgryte commented 8 months ago

Thank you for sharing a draft proposal @Rejoan-Sardar.

One question I have is whether in your study of SciPy implementations you've found any prerequisite functionality which is missing in stdlib but would need to be present in order to execute on your proposed tasks. If so, what is your plan for addressing and how will you accommodate potential delays (including review cycles) in your proposed timeline?

Rejoan-Sardar commented 8 months ago

@kgryte, in my study of SciPy implementations, I've identified some functionality gaps in stdlib that must be filled in order to implement all distributions found in SciPy. Some distributions pose challenges due to dependencies on BLAS and LAPACK functionality, such as matrix operations like transpose, dot product, cross product, determinant calculation, and many others. To address this, I've analyzed each distribution's dependencies, and I will also create a dependency graph. Distributions relying on functionality already present in stdlib are prioritized in the simple and intermediate sections of my plan, ensuring completion before the midterm evaluation. Some of those dependent on BLAS and LAPACK functions are included in the advanced distribution section. As stdlib's math/base/special and the BLAS and LAPACK project cover some of this functionality, I'll leverage them initially. If additional functionality is needed, I'll implement it after discussing with my mentor and adjust my timeline accordingly.

Rejoan-Sardar commented 8 months ago

@kgryte, any suggestions for improving this proposal before submitting it to the GSoC website? Additionally, I have a question: since I've already submitted a proposal on the GSoC website, should I mention that this proposal will be my second preference?

kgryte commented 8 months ago

Yes, you should specify which proposal is your highest preference.