Closed Rejoan-Sardar closed 7 months ago
Thank you for sharing a draft proposal @Rejoan-Sardar.
One question I have is whether in your study of SciPy implementations you've found any prerequisite functionality which is missing in stdlib but would need to be present in order to execute on your proposed tasks. If so, what is your plan for addressing and how will you accommodate potential delays (including review cycles) in your proposed timeline?
@kgryte, In my study of SciPy implementations, I've identified some functionality gaps in stdlib that are essential for implementing all distributions found in SciPy. Some distributions pose challenges due to dependencies on BLAS and LAPACK functionality, such as matrix operations like transpose, dot product, cross product, determinant calculation, and many others. To address this, I've analyzed each distribution's dependencies, and I will also create a dependency graph. For distributions relying on functionality already present in stdlib, I've prioritized them in the simple and intermediate sections of my plan, ensuring completion before the midterm evaluation. Some of those that depend on BLAS and LAPACK functions are included in the advanced distribution section. As stdlib's math/base/special and the BLAS and LAPACK project cover some of this functionality, I'll leverage them initially. If additional functionality is needed, I'll implement it after discussing with my mentor and adjust my timeline accordingly.
@kgryte any suggestions for improving this proposal before submitting it to the GSoC website? Additionally, I have a question: since I've already submitted a proposal on the GSoC website, should I mention that this proposal is my second preference?
Yes, you should specify which proposal is your highest preference.
Full name
Rejoan Sardar
University status
Yes
University name
Lovely Professional University, Punjab
University program
BTech. Computer Science and Engineering
Expected graduation
2026
Short biography
I am Rejoan Sardar, a 2nd year undergraduate Computer Science and Engineering student at Lovely Professional University. My programming journey began with Python during high school, and I have continuously expanded my skill set since then. Python gave me a strong foundation that made it easier to explore other languages such as C/C++, Java, and JavaScript. I like coding for fun and have worked on various small projects, which can be found on my GitHub profile.
Timezone
IST (UTC + 5:30)
Contact details
Email: rejoansardar4@gmail.com, GitHub: @Rejoan-Sardar
Platform
Windows
Editor
VS Code stands out as my favorite due to its seamless integration of powerful features, intuitive interface, and extensive customization options. Its robust set of tools, including IntelliSense for code completion and debugging capabilities, greatly enhance my productivity and streamline my development workflow.
Programming experience
I have been developing projects and participating in various competitions since my first year:
a. NodeRepl: Online Code Editor Implementation in Node.js. Developers can write, edit, and execute Node.js code directly in-browser, collaborating with others in real-time via socket.io. Kubernetes ensures reliable container orchestration and scalability. Smooth handling of HTTP requests enhances user experience. Bounties incentivize community contributions and reward developers.
b. Atom-REPL: Empowering JavaScript Development with Live REPL. Atom-REPL is a promising addition to the JavaScript coding ecosystem, offering a Live Read-Eval-Print-Loop (REPL) directly within the familiar environment of the Atom text editor. Inspired by the functionality of platforms like CoderPad, it aims to streamline the coding experience for JavaScript developers, particularly for quick exercises and experimentation. Atom-REPL is still in active development, with further enhancements and features planned.
c. XKCDDisplay: JupyterLab Integration for Random XKCD Comics. Users can fetch and display random XKCD comics, navigate through them with metadata, and enjoy offline viewing. Customization options allow adjusting display settings, bookmarking favorites, and sharing via social media. Additionally, users can search for comics based on keywords, utilize keyboard shortcuts, and access accessibility features. Robust error handling ensures smooth operation, while integration with the JupyterLab ecosystem enhances overall user experience.
JavaScript experience
While contributing to stdlib-js development, I worked on JavaScript and C implementations of essential mathematical functions such as boxcox1p, xlog1py, asinh, atanh, asind, acscd, odd, and even. These contributions extended the standard library for both browser JavaScript and Node.js environments and aimed to improve the efficiency and functionality of these mathematical operations. JavaScript's dynamic typing is flexible but can pose challenges without tools such as TypeScript to enforce type safety. While JSDoc aids in documentation, it cannot enforce strict typing on its own, so I use TypeScript to maintain type safety and improve overall code reliability. I have also worked on various small projects in JavaScript and Node.js, which can be found on my GitHub profile.
Node.js experience
Same as JavaScript.
C/Fortran experience
My proficiency in C programming and problem-solving skills have been refined through my extensive engagement in competitive programming. I excel in crafting efficient algorithms and implementing them effectively in C to solve a variety of challenges. With a solid foundation in data structures and algorithms, coupled with my ability to optimize code for performance, I consistently deliver robust solutions in competitive programming contests. My track record showcases my capability to tackle complex problems, demonstrating my dedication to mastering C programming and problem-solving at a competitive level.
Interest in stdlib
I have been contributing to open source for quite a long time, most notably within the stdlib organization. There, I have refined C implementations of key mathematical functions with the aim of improving computational precision and efficiency. Through code review and optimization work, I have shown my commitment to collaborative development and to advancing the state of the art in computational mathematics. Working within the open source community has given me valuable opportunities for learning and growth, and has reinforced the importance of shared knowledge and collective advancement in programming. My contributions involve creating PRs, helping other team members, filing issues for bugs present in the project, and resolving them.
Version control
Yes
Contributions to stdlib
At present, I have 22 PRs merged and 5 issues closed. Here are some of my open source contributions: All PRs which are merged
Status Opened
All PRs which are opened
Issues Opened All issues which are currently opened
Resolved Issue
Goals
This project aims to bring every distribution available in SciPy stats into a comprehensive JavaScript library. It involves developing APIs to calculate the PDF, CDF, quantiles, and other essential distribution properties. Additionally, the library will provide functionality to generate random variates from any of the implemented distributions, ensuring a robust statistical toolkit within the JavaScript ecosystem. The default method used by SciPy to sample from an arbitrary distribution requires integrating the PDF and then numerically inverting the CDF. This generic default is too slow to be relied on for practical purposes, so custom methods for sampling random variates need to be implemented for the distributions in SciPy. In my own testing and benchmarking, the distributions already implemented in the standard library (stdlib) were approximately 10,000 times faster than the corresponding default methods in SciPy. This performance gain could greatly benefit applications requiring high computational efficiency, making stdlib distributions an attractive choice for various scientific and engineering tasks.
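To illustrate why the generic approach is expensive, here is a minimal sketch for an exponential distribution. It is a simplified stand-in written for this proposal and is neither SciPy's nor stdlib's actual code. The names `LAMBDA`, `cdfNumeric`, `quantileNumeric`, `quantileClosedForm`, and `sample` are illustrative assumptions, as are the step and iteration counts.

```javascript
'use strict';

// Rate parameter of an exponential distribution (arbitrary illustrative value).
const LAMBDA = 2.0;

// Probability density function of the exponential distribution.
function pdf( x ) {
    return ( x < 0 ) ? 0 : LAMBDA * Math.exp( -LAMBDA * x );
}

// Generic path, step 1: approximate the CDF by integrating the PDF
// from 0 to x with the trapezoidal rule.
function cdfNumeric( x, steps ) {
    const h = x / steps;
    let sum = 0.5 * ( pdf( 0 ) + pdf( x ) );
    for ( let i = 1; i < steps; i++ ) {
        sum += pdf( i * h );
    }
    return sum * h;
}

// Generic path, step 2: invert the numeric CDF by bisection.
// Each iteration re-integrates the PDF, which is what makes this slow.
function quantileNumeric( p ) {
    let lo = 0;
    let hi = 1;
    while ( cdfNumeric( hi, 200 ) < p ) {
        hi *= 2; // expand the bracket until it contains the quantile
    }
    for ( let i = 0; i < 60; i++ ) {
        const mid = 0.5 * ( lo + hi );
        if ( cdfNumeric( mid, 200 ) < p ) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * ( lo + hi );
}

// Custom path: the exponential quantile has a closed form.
function quantileClosedForm( p ) {
    return -Math.log( 1 - p ) / LAMBDA;
}

// Inverse transform sampling: push a uniform variate through a quantile function.
function sample( quantile, rand = Math.random ) {
    return quantile( rand() );
}

console.log( quantileNumeric( 0.5 ), quantileClosedForm( 0.5 ) );
console.log( sample( quantileClosedForm ) );
```

With these settings, a single draw through `quantileNumeric` costs thousands of PDF evaluations, while `quantileClosedForm` costs one logarithm. That gap is the motivation for per-distribution sampling methods.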
Why this project?
This project holds immense significance due to its focus on enhancing the efficiency of random variate sampling methods from probability distributions. My analysis reveals that the default method employed by SciPy for sampling involves computationally intensive processes, such as integrating the Probability Density Function (PDF) and numerically inverting the Cumulative Distribution Function (CDF). However, this approach proves to be inefficient for practical applications due to its slow execution speed. To address this limitation, custom methods for sampling random variates from distributions in SciPy need to be developed.
Through extensive testing and benchmarking, I have discovered that distributions already implemented in the standard library (stdlib) offer a remarkable performance advantage over their counterparts in SciPy. Specifically, my findings indicate that stdlib distributions exhibit execution speeds approximately 10,000 times faster than those in SciPy. This substantial improvement in computational efficiency makes stdlib distributions an exceptionally appealing option for various scientific and engineering tasks where rapid computation is crucial.
Therefore, by optimizing random variate sampling methods through this project, we can significantly enhance the computational performance of scientific computations, thereby providing tangible benefits to a wide range of applications.
Implementation
Plan of action: In my study of SciPy implementations, I've identified some functionality gaps in stdlib that are essential for implementing all distributions found in SciPy. Some distributions pose challenges due to dependencies on BLAS and LAPACK functionality, such as matrix operations like transpose, dot product, cross product, determinant calculation, and many others. To address this, I've analyzed each distribution's dependencies, and I will also create a dependency graph. For distributions relying on functionality already present in stdlib, I've prioritized them in the simple and intermediate sections of my plan, ensuring completion before the midterm evaluation. Some of those that depend on BLAS and LAPACK functions are included in the advanced distribution section. As stdlib's math/base/special and the BLAS and LAPACK project cover some of this functionality, I'll leverage them initially. If additional functionality is needed, I'll implement it after discussing with my mentor and adjust my timeline accordingly.
2.3.1 Foundational Implementation: Simple Distributions and Random Variate Generation APIs
In the initial phase of the project, we prioritize the implementation of simple distributions, which serve as foundational components of statistical analysis. These distributions have straightforward mathematical properties and are commonly used in various fields. By focusing on simple distributions first, we establish a solid groundwork for the library, ensuring accessibility and usability for a wide range of users. Through careful implementation and testing, we aim to deliver reliable and efficient functionality that prepares the way for more complex distributions in subsequent phases of development. This includes: Log Gamma (loggamma, the log gamma continuous random variable).
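As a rough sketch of what a "simple" distribution involves, the following implements the PDF, CDF, and quantile function of the Gompertz distribution (scheduled for the community bonding period) in SciPy's one-parameter form, where the shape `c > 0` and the support is `x >= 0`. The function names, argument order, and NaN handling are assumptions made for illustration and are not an existing stdlib API.

```javascript
'use strict';

// Returns true if the shape parameter is invalid.
function badShape( c ) {
    return Number.isNaN( c ) || c <= 0;
}

// PDF: f(x; c) = c * exp(x) * exp(-c * (exp(x) - 1)) for x >= 0.
function gompertzPdf( x, c ) {
    if ( Number.isNaN( x ) || badShape( c ) ) {
        return NaN;
    }
    if ( x < 0 ) {
        return 0;
    }
    // Math.expm1 computes exp(x) - 1 accurately for small x.
    return c * Math.exp( x - ( c * Math.expm1( x ) ) );
}

// CDF: F(x; c) = 1 - exp(-c * (exp(x) - 1)) for x >= 0.
function gompertzCdf( x, c ) {
    if ( Number.isNaN( x ) || badShape( c ) ) {
        return NaN;
    }
    if ( x < 0 ) {
        return 0;
    }
    return -Math.expm1( -c * Math.expm1( x ) );
}

// Quantile: Q(p; c) = ln(1 - ln(1 - p) / c) for 0 <= p <= 1.
function gompertzQuantile( p, c ) {
    if ( Number.isNaN( p ) || badShape( c ) || p < 0 || p > 1 ) {
        return NaN;
    }
    return Math.log1p( -Math.log1p( -p ) / c );
}

// Round trip: the CDF of the quantile should recover the probability.
const q = gompertzQuantile( 0.3, 1.5 );
console.log( q, gompertzCdf( q, 1.5 ) );
```

Because the quantile function is available in closed form, random variates can be generated by evaluating it at a uniform variate, with no numerical integration or root finding.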
2.3.2 Implementation of Intermediate Distributions and Random Variate Generation APIs
During the implementation of intermediate distributions, we focus on incorporating distributions with a moderate level of complexity and specialized applications. These distributions serve as essential tools in statistical analysis and modeling, providing insights into various real-world phenomena. Each distribution requires careful consideration of its mathematical properties and algorithms to ensure accurate and efficient implementation. Through this phase, we aim to expand the capabilities of our library to handle a broader range of statistical tasks and support more advanced data analysis techniques. This includes:
2.3.3 Implementation of Advanced Distributions
Once the simple and intermediate distributions are implemented, we proceed to incorporate advanced distributions. This involves completing the implementation of all continuous and discrete distributions, along with selected multivariate distributions, to offer users a robust and versatile library for statistical analyses and modeling tasks.
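The dependency graph mentioned in the plan of action could be turned into an implementation order along the following lines. This is a sketch under assumptions: the graph entries below are hypothetical placeholders rather than the result of the actual dependency analysis, and `buildOrder` is an illustrative helper name.

```javascript
'use strict';

// Hypothetical dependency graph: each key lists what it needs first.
// These entries are placeholders, not real analysis results.
const exampleGraph = {
    'dist/simple-a': [ 'special/fn-a' ],
    'dist/intermediate-b': [ 'special/fn-a', 'special/fn-b' ],
    'dist/multivariate-c': [ 'linalg/op-x', 'dist/simple-a' ],
    'linalg/op-x': [ 'blas/op-y' ]
};

// Depth-first topological sort: every dependency appears before
// anything that depends on it. Throws if the graph has a cycle.
function buildOrder( graph ) {
    const order = [];
    const state = new Map(); // 1 = in progress, 2 = finished

    function visit( node ) {
        if ( state.get( node ) === 2 ) {
            return;
        }
        if ( state.get( node ) === 1 ) {
            throw new Error( 'cyclic dependency involving ' + node );
        }
        state.set( node, 1 );
        // Nodes that never appear as keys are treated as leaves.
        for ( const dep of ( graph[ node ] || [] ) ) {
            visit( dep );
        }
        state.set( node, 2 );
        order.push( node );
    }

    for ( const node of Object.keys( graph ) ) {
        visit( node );
    }
    return order;
}

console.log( buildOrder( exampleGraph ) );
```

Distributions whose dependencies all already exist in stdlib would surface early in such an order, which matches the plan of scheduling them before the midterm evaluation.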
Qualifications
Experience with Implementation: I am able to implement statistical distributions such as Dgamma and Log Logistic, translating mathematical concepts into functional code (I will open a PR for these). I am familiar with the implementation process, including parameter estimation and the PDF, CDF, mean, median, mode, quantile, and random variate generation functions for each distribution, which together ensure an accurate representation of distributions in the library.
Knowledge of SciPy: When contributing to stdlib, I enhanced numerous distribution namespaces. To improve these contributions, I studied how similar functionality is implemented in SciPy and gained valuable insights from the process.
Proficiency in C Implementation: This project builds on groundwork I have already established in stdlib, where I have successfully implemented 20 functions. Each implemented function extends the project's capabilities and shows that I can take on complex tasks and deliver tangible results.
Prior art
Commitment
I'm committed to dedicating 4 to 5 hours daily to the project, totaling 30-35 hours per week, with the flexibility to increase my hours as needed. With no conflicting obligations, I can adjust my availability to align with my mentor's time zone. My summer break spans from May 31 to July 31, allowing ample time for project completion. Expect regular progress updates and prompt communication for mentor assistance. Additionally, I'll maintain bi-weekly blog updates for reference.
Schedule
Assuming a 12 week schedule,
Up Till May 1:
Proposal accepted or rejected
Deliverables: Studying SciPy's statistical capabilities and exploring jStat's diverse distribution implementations to enrich our understanding and optimize the statistical library.
Community Bonding Period:
Gompertz, Fold Cauchy, Half Cauchy, Half Normal, and Half Logistic distributions.
Week 1 - Week 3:
Coding officially begins
Bradford, Argus, Planck, Rademacher, Gibrat, Boltzmann, Pearson Type III, and Exponential.
Deliverables: Completed all simple distributions with PDF, CDF, and random variate generation functions, validated through thorough testing and supported by mentor guidance and feedback.
Week 4 - Week 6:
Maxwell (Maxwell distribution), Zipf (Zipf distribution), CrystalBall (CrystalBall distribution), Von Mises (Von Mises distribution).
Deliverables: Prepared for midterm evaluation, having completed implementation of all distributions discussed in the simple and intermediate distributions sections with documentation.
Week 7 - Week 8:
fatiguelife, foldcauchy, foldnorm, genlogistic, gennorm, and others. Additionally, I will ensure that all continuous distributions present in scipy.stats are fully implemented.
Deliverables: Full implementation of all continuous distributions in scipy.stats.
Week 9 - Week 10:
Week 11 - Week 12:
Buffer Period
I will thoroughly address any backlog items that may have accumulated throughout the project timeline. This will involve carefully reviewing the project's progress with the guidance of mentors.
I will prioritize the creation of comprehensive documentation that encapsulates the entire project journey, including its objectives, methodologies, findings, and outcomes.
Finally, the project submission will be completed in the last week.
Post-GSoC:
Notes:
Related issues
2