stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: Implement a broader range of statistical distributions #68

Closed AhmedKhaled590 closed 2 months ago

AhmedKhaled590 commented 3 months ago

Full name

Ahmed Khaled Mahmoud

University status

No

University name

Graduated from Cairo University

University program

Computer Engineering

Expected graduation

JUL 2023

Short biography

I am a Software Engineer based in Egypt. I hold a Bachelor's degree in Computer Engineering from Cairo University, where I graduated with a GPA of 3.3 out of 4, receiving recognition for my excellent performance in the graduation project.

Throughout my career, I have gained extensive experience in various technical areas. I have worked as a Netsuite Technical Consultant at eDigits Consulting, where I progressed to the role of Senior Netsuite Technical Consultant. During this time, I led multiple projects and developed expertise in JavaScript, HTML, CSS, Node.js, Git, SuiteScript 2.0, and SuiteApps.

Some of my notable projects include my graduation project DesignOder, a web tool aimed at automating the conversion of wireframes into fully functioning web pages using computer vision, natural language processing, and machine learning. Additionally, I contributed to the development of CMPLR Website (one of the projects I worked on during my time at university), a ReactJS-based clone of the Tumblr website.

In my part-time role at Atom BI, I worked as a Junior Software Engineer, focusing on developing a business intelligence tool using Go, JavaScript, and PostgreSQL.

I have completed several certifications covering web development, backend development and databases to continually enhance my skills and stay updated with industry trends.

Timezone

Egypt (GMT+2). In some years, the government moves the clock to GMT+3 during the summer.

Contact details

email: kahmd1444@gmail.com, github:AhmedKhaled590

Platform

Linux

Editor

My preferred code editor is Visual Studio Code (VSCode). I find it the best fit for my workflow for several reasons, chief among them its excellent out-of-the-box support for a wide range of programming languages and frameworks. Additionally, its vibrant ecosystem allows for easy integration of extensions tailored to specific needs.

Programming experience

Throughout my programming journey, I've had the opportunity to work on various projects that have allowed me to apply my skills and creativity. One notable project I've created is a web-tool called DesignOder.

DesignOder is a tool aimed at automating the conversion of wireframes into fully functioning web pages. The idea behind this project stemmed from the need to streamline the web development process and reduce manual effort in translating design concepts into code.

Here's an overview of the key features and components of DesignOder:

Computer Vision Modules: DesignOder incorporates computer vision techniques to analyze and interpret wireframe designs. This involves detecting various elements such as inputs, containers, images, and checkboxes within the wireframes.

Natural Language Processing (NLP) Module: To enhance user experience, DesignOder utilizes NLP algorithms to understand and interpret textual descriptions associated with different design elements. This helps in generating accurate HTML code based on the wireframe annotations.

Machine Learning Integration: Machine learning algorithms are employed to optimize the conversion process by learning from user interactions and feedback. This adaptive approach enables DesignOder to continuously improve its accuracy and efficiency over time.

Frontend and Backend Integration: DesignOder consists of both frontend and backend components. The frontend interface provides users with an intuitive platform for uploading wireframes, specifying design preferences, and reviewing the generated code. On the backend, sophisticated algorithms process the input data, perform the necessary transformations, and output the corresponding HTML markup.

JavaScript experience

In my experience with JavaScript, a significant portion of my work has been dedicated to developing solutions within the Oracle Netsuite environment using SuiteScript. SuiteScript is a JavaScript-based scripting language specifically designed for extending and customizing Netsuite, allowing developers to automate business processes, create customizations, and integrate with external systems.

One of the standout features of JavaScript that I particularly appreciate is its versatility. JavaScript is a multi-paradigm language, allowing me to write code using different programming styles, including procedural, functional, and object-oriented programming. This flexibility enables me to approach problem-solving in diverse ways, adapting to the specific requirements of each project.

Moreover, JavaScript's asynchronous programming model, facilitated by features like Promises and async/await syntax, is another aspect that I find immensely valuable. Asynchronous programming allows for non-blocking I/O operations, enabling the creation of responsive and efficient applications that can handle concurrent tasks without blocking the execution flow.

However, if I were to identify a least favorite feature of JavaScript, it would likely be its type coercion and loose equality comparison. JavaScript's automatic type conversion can sometimes lead to unexpected behaviors and errors, especially for developers coming from strongly typed languages. Additionally, the concept of loose equality comparison (==) can result in subtle bugs and inconsistencies, as it performs type coercion when comparing values, often leading to unintended outcomes.
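The coercion behavior described above can be seen in a few well-known cases. This is a small illustrative sketch; the variable names are chosen here for the example only.

```javascript
// Loose equality (==) coerces operands before comparing;
// strict equality (===) compares without coercion.
// Each entry: [label, result with ==, result with ===].
const comparisons = [
  ['0 == ""', 0 == '', 0 === ''],
  ['"1" == 1', '1' == 1, '1' === 1],
  ['null == undefined', null == undefined, null === undefined],
  ['[] == false', [] == false, [] === false],
];

for (const [label, loose, strict] of comparisons) {
  // Every loose comparison here is true; every strict one is false.
  console.log(`${label}: loose=${loose}, strict=${strict}`);
}

// Implicit conversion in arithmetic is a related source of surprises.
const concatenated = '5' + 3; // + with a string operand concatenates: '53'
const subtracted = '5' - 3;   // - always converts to numbers: 2
console.log(concatenated, subtracted);
```

Using `===` consistently, and converting types explicitly (for example with `Number(value)`), avoids this class of bug.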

Despite these challenges, JavaScript remains a powerful and widely-used language in the world of web development, offering an extensive ecosystem of libraries, frameworks, and tools that continue to evolve and enhance its capabilities. Overall, my experience with JavaScript has been positive, and I continue to appreciate its strengths while navigating its nuances and challenges.

Node.js experience

During my academic tenure, I worked on projects that provided hands-on experience in backend development using Node.js. One such project was the Blood Bank Website.

For the Blood Bank Website project, I was responsible for designing both the frontend and backend components. On the frontend, I utilized HTML, CSS, JavaScript, and Bootstrap 4 to create a visually appealing and user-friendly interface. This involved structuring the layout, styling elements, and incorporating interactive features to enhance user engagement.

On the backend, I employed Node.js as the server-side runtime environment to handle server logic and data management. Node.js enabled me to develop backend functionalities, such as user authentication, data validation, and API endpoints. Additionally, I utilized SQLite as the database management system to store and manage data related to blood donors, recipients, and inventory.

C/Fortran experience

While I do not have professional experience in C or Fortran, I gained exposure to C during my academic studies. Specifically, I completed an operating systems course whose practical material was written in C, and I undertook a project for that course, which gave me hands-on experience and a solid understanding of C programming concepts and practices. Though my expertise in C is not as extensive as in other languages, I am confident that this foundation will let me understand and work effectively on projects that involve C.

Interest in stdlib

Although I haven't had the chance to utilize stdlib in my work, upon reviewing its capabilities briefly, I am impressed by its comprehensive nature, which encompasses a broad spectrum of functionalities.

Version control

Yes

Contributions to stdlib

Pull Request #1369: Added support for forEachRight method (RFC #5678) - Closed
This PR proposed adding support for invoking a callback for each visual character of a string while iterating from right to left. It has been closed.

Open Issue: [RFC]: Time Series Generation Module #2092

Goals

Introduction

This proposal aims to enrich the statistical capabilities of the JavaScript Standard Library (stdlib) by implementing additional functionalities found in SciPy but currently missing in stdlib. By incorporating a broader range of statistical distributions, this project seeks to empower JavaScript developers with a comprehensive toolkit for conducting statistical computations directly within their JavaScript environments.

Statistical distributions are like the building blocks of data analysis. They help us understand how data is spread out and make decisions based on patterns. In the world of JavaScript programming, the stdlib library is like a treasure chest full of tools for working with these distributions. But, just like any tool, it can always be made better.

By leveraging modern JavaScript technologies and adhering to stdlib's standards for code structure, testing, and documentation, this endeavor aims to elevate stdlib's status as a go-to resource for statistical analysis in JavaScript. With this project, we're on a mission to make data analysis in JavaScript smoother, simpler, and more powerful than ever before.

Objectives

The primary goal is to assess the existing features available in SciPy and identify the critical functionalities that are currently absent in stdlib. The project will prioritize the implementation of APIs for distributions not yet covered in stdlib, addressing essential gaps in functionality. Each newly implemented function will adhere to stdlib's conventions for code structure, testing, and documentation. The specific objectives include:

  1. Conducting a comprehensive review of the features available in SciPy and identifying priority functions for implementation in stdlib.
  2. Developing APIs for additional statistical distributions, encompassing essential functionalities such as quantile computations and random variate generation.
  3. Crafting comprehensive documentation and usage examples in line with stdlib's established standards.
  4. Conducting rigorous testing to ensure the accuracy and reliability of the implemented functionalities.
  5. Incorporating feedback from the community to refine and enhance the implementations iteratively.
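To make objective 2 concrete, quantile computation and random variate generation can be sketched for the half-Cauchy distribution, which has closed-form expressions. This is a minimal plain-JavaScript sketch: the function names and signatures are hypothetical and do not follow stdlib's package layout or API.

```javascript
// Half-Cauchy distribution with scale sigma > 0 and support x >= 0.

// CDF: F(x) = (2 / pi) * atan(x / sigma)
function halfCauchyCdf(x, sigma) {
  if (!(sigma > 0)) return NaN;
  if (x <= 0) return 0;
  return (2 / Math.PI) * Math.atan(x / sigma);
}

// Quantile (inverse CDF): Q(p) = sigma * tan(pi * p / 2) for 0 <= p <= 1
function halfCauchyQuantile(p, sigma) {
  if (!(sigma > 0) || !(p >= 0 && p <= 1)) return NaN;
  if (p === 1) return Infinity; // tan(pi/2) is finite in floating point
  return sigma * Math.tan((Math.PI * p) / 2);
}

// Random variate via inverse transform sampling: apply the quantile
// function to a uniform draw on [0, 1). The `rand` parameter is an
// assumption made here so a seeded generator can be injected.
function halfCauchyRandom(sigma, rand = Math.random) {
  return halfCauchyQuantile(rand(), sigma);
}

console.log(halfCauchyRandom(2)); // a non-negative pseudorandom number
```

Passing the uniform generator as a parameter keeps the sampler reproducible in tests, since a fixed or seeded source can replace `Math.random`.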

Technical Approach

To effectively deliver the proposed idea, the project will follow a structured technical approach that encompasses the implementation, testing, and documentation of additional statistical functionalities in stdlib. The approach will adhere to stdlib's established standards for code structure, testing, and documentation. Below are the key components of the technical approach:

1. Feature Prioritization and Selection:

Continuous distributions

exponpow: Exponential Power distribution
loglaplace: Log-Laplace distribution
loggamma: Log-Gamma distribution
loguniform: Log-Uniform distribution
maxwell: Maxwell distribution
pearson3: Pearson Type III distribution
fatiguelife: Fatigue Life distribution
genextreme: Generalized Extreme Value distribution
rice: Rice distribution
foldnorm: Folded Normal distribution
foldcauchy: Folded Cauchy distribution
bradford: Bradford distribution
halfcauchy: Half-Cauchy distribution
halflogistic: Half-Logistic distribution
halfnorm: Half-Normal distribution
invweibull: Inverse Weibull distribution
gompertz: Gompertz distribution
gibrat: Gibrat distribution
dweibull: Double Weibull distribution
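As an illustration of one of the simpler continuous entries, the log-uniform distribution on [a, b] with 0 < a < b has closed-form PDF, CDF, and quantile functions. The following is a hedged sketch in plain JavaScript: the `logUniform` factory and its method names are hypothetical and are not stdlib's API.

```javascript
// Log-uniform distribution on [a, b], 0 < a < b:
//   pdf(x)      = 1 / (x * ln(b / a))     for a <= x <= b
//   cdf(x)      = ln(x / a) / ln(b / a)   for a <= x <= b
//   quantile(p) = a * (b / a)^p           for 0 <= p <= 1
function logUniform(a, b) {
  if (!(a > 0 && b > a)) {
    throw new RangeError('expected 0 < a < b');
  }
  const logRatio = Math.log(b / a);
  return {
    pdf(x) {
      return x >= a && x <= b ? 1 / (x * logRatio) : 0;
    },
    cdf(x) {
      if (x <= a) return 0;
      if (x >= b) return 1;
      return Math.log(x / a) / logRatio;
    },
    quantile(p) {
      return p >= 0 && p <= 1 ? a * Math.pow(b / a, p) : NaN;
    },
  };
}

const dist = logUniform(1, 100);
console.log(dist.cdf(10));       // approximately 0.5
console.log(dist.quantile(0.5)); // approximately 10
```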

Discrete distributions

boltzmann
dlaplace
planck
zipf
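Among the discrete entries, the Planck (discrete exponential) distribution is a compact example. Using the parameterization found in SciPy, its PMF is (1 - exp(-lambda)) * exp(-lambda * k) for integers k >= 0. The sketch below is illustrative only; the function names are hypothetical and not stdlib's API.

```javascript
// Planck (discrete exponential) distribution with lambda > 0, k = 0, 1, 2, ...
//   pmf(k) = (1 - exp(-lambda)) * exp(-lambda * k)
//   cdf(k) = 1 - exp(-lambda * (floor(k) + 1))
// Math.expm1 computes exp(x) - 1 accurately for small |x|.

function planckPmf(k, lambda) {
  if (!(lambda > 0)) return NaN;
  if (!Number.isInteger(k) || k < 0) return 0; // outside the support
  return -Math.expm1(-lambda) * Math.exp(-lambda * k);
}

function planckCdf(k, lambda) {
  if (!(lambda > 0) || Number.isNaN(k)) return NaN;
  if (k < 0) return 0;
  return -Math.expm1(-lambda * (Math.floor(k) + 1));
}

// Sanity check: the CDF at n should equal the sum of the PMF over 0..n.
let partialSum = 0;
for (let k = 0; k <= 5; k++) {
  partialSum += planckPmf(k, 0.7);
}
console.log(partialSum, planckCdf(5, 0.7)); // the two values should agree closely
```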

Implementation:

Why this project?

This project excites me for several reasons, primarily because of its potential to significantly enhance the statistical capabilities of the JavaScript Standard Library (stdlib). Here's why I'm enthusiastic about this proposed endeavor:

Empowering JavaScript Developers: As JavaScript continues to evolve as a dominant language for web development, having robust statistical functionalities directly within the stdlib opens up new possibilities for JavaScript developers. It enables them to perform complex statistical analyses, data modeling, and simulations entirely within their familiar JavaScript environment, without the need for external dependencies.

Filling Gaps in Functionality: The proposed project aims to bridge the gap between stdlib and SciPy, a widely used library in Python for scientific computing. By implementing additional statistical distributions and functionalities found in SciPy but missing in stdlib, we can offer JavaScript developers a more comprehensive toolkit for handling diverse statistical tasks. This not only improves the usability of stdlib but also fosters consistency across different programming languages, making it easier for developers to transition between them.

Enhancing Accessibility: Access to a broader range of statistical distributions and functionalities democratizes statistical analysis, making it more accessible to a wider audience. Whether it's students learning about probability distributions, researchers conducting data analysis, or developers building statistical applications, having a rich set of tools readily available in stdlib lowers the barriers to entry and fosters innovation in various domains.

Community Impact: This project is not just about implementing code; it's also about building a vibrant community around statistical computing in JavaScript. By soliciting feedback, collaborating with contributors, and integrating community insights into the development process, we can create a library that truly meets the needs of its users. This collaborative approach fosters knowledge sharing, skill development, and community engagement, creating a positive impact beyond the codebase itself.

In summary, I'm excited about the proposed project because it aligns with the evolving needs of the JavaScript ecosystem, addresses existing gaps in functionality, promotes accessibility to statistical tools, and fosters community engagement. By contributing to stdlib's growth and maturity, we can empower JavaScript developers worldwide and facilitate the advancement of statistical computing in JavaScript.

Qualifications

With a strong background in JavaScript development, I bring a wealth of experience and skills to the table. Here's a summary of my qualifications:

JavaScript Expertise: Throughout my professional experience, I have worked extensively with JavaScript, utilizing it as the primary language for developing various applications and tools. Whether it's building web applications, creating automation scripts, or developing custom solutions, JavaScript has been at the core of my work.

Full Stack Development: As a Junior Software Engineer at Atom BI, I worked on developing a business intelligence tool using JavaScript as part of the stack. This role involved tasks such as working on the authentication process, connecting the tool with different database drivers, and ensuring seamless integration between frontend and backend components.

Web Development Experience: My involvement with projects like DesignOder and CMPLR Website has provided me with hands-on experience in web development using JavaScript frameworks like ReactJS. I have contributed to building frontend components, handling requests to servers, and integrating frontend with backend systems.

Technical Lead Role: At eDigits Consulting, I served as a Technical Lead for Netsuite implementation projects. While my focus was on providing technical expertise in Netsuite customization, I also leveraged my JavaScript skills to develop fully customized reports, create integrations between systems, and build NPM modules for template projects.

Soft Skills: Alongside my technical roles, I have also gained experience in project management, where I have led teams and managed projects from initiation to completion. This experience has honed my ability to plan, execute, and deliver projects effectively within specified timelines and budgets.

Educational Background: I hold a degree in Computer Engineering from Cairo University, where I graduated with a GPA of 3.3 out of 4. During my studies, I took two courses focused on statistics and statistical methods. As part of these courses, I completed assignments that involved implementing statistical algorithms and methods. These experiences enhanced my understanding of statistical concepts and their practical applications.

Overall, my extensive experience in JavaScript development, coupled with my project management skills and educational background, makes me well-equipped to contribute effectively to projects requiring expertise in JavaScript and related technologies. I am excited about the opportunity to leverage my skills and make meaningful contributions to the proposed project.

Prior art

Several existing libraries and programming languages have implemented similar functionalities aimed at providing a comprehensive set of statistical distributions and computations. Here are some examples:

SciPy (Python): SciPy is a popular library for scientific computing in Python, offering extensive support for various statistical distributions, hypothesis tests, and related functionalities. It provides APIs for constructing distributions, computing probability density functions (PDFs), cumulative distribution functions (CDFs), and drawing random variates, among other operations. While SciPy serves as a robust solution for statistical analysis in Python, there is a need for similar capabilities within the JavaScript ecosystem.

R Language: R is a programming language specifically designed for statistical computing and graphics. It comes with built-in functions and packages for handling a wide range of statistical distributions, hypothesis tests, and data analysis tasks. R's extensive collection of statistical functionalities has made it a popular choice among statisticians and data scientists. However, the syntax and ecosystem of R may not be suitable for JavaScript developers looking to perform statistical computations within their JavaScript environments.

Math.js: Math.js is a comprehensive mathematics library for JavaScript that provides support for basic mathematical operations, linear algebra, and symbolic computation. While Math.js includes some basic statistical functionalities, such as computing mean, median, and standard deviation, it lacks support for a wide range of statistical distributions and hypothesis tests.

Java Apache Commons Math Library: The Apache Commons Math Library for Java offers a collection of mathematical and statistical algorithms. It provides APIs for various statistical distributions, including PDFs, CDFs, quantiles, and random variates generation. While this library serves the Java community well, there is a need for similar functionalities within the JavaScript ecosystem to enable statistical analysis directly within JavaScript applications.

While these existing solutions provide valuable insights and inspiration, the proposed project aims to fill the gap in the JavaScript ecosystem by implementing a comprehensive set of statistical functionalities in the JavaScript Standard Library (stdlib). By leveraging modern JavaScript technologies and adhering to stdlib's standards for code structure, testing, and documentation, this project seeks to provide JavaScript developers with a powerful toolkit for conducting statistical analysis directly within their JavaScript environments.

Commitment

I'm a full-time software engineer at eDigits Consulting, but I am fully committed to dedicating approximately 20-25 hours per week to the project during the Google Summer of Code program. While managing a full-time job alongside the GSoC program may seem challenging, I am confident in my ability to balance my responsibilities and allocate sufficient time to make meaningful contributions to the project. During the proposal creation period, I was able to manage my time and complete the proposal while fulfilling my obligations at work, which demonstrates my ability to prioritize tasks, meet deadlines, and allocate time for both work and personal projects. In addition, I do not plan to take any vacation during this period. Overall, I am dedicated to the success of the project and am committed to investing the necessary time and effort during the GSoC program.

On Fridays and Saturdays, I can commit 5 to 7 hours each day, either between 10 am and 5 pm or between 6 am and 1 pm (I can use either of these two windows depending on project communication needs). From Sunday to Thursday, I can allocate 2 to 3 hours daily, either from 7 pm to 10 pm or from 6 am to 9 am.

Schedule

Community Bonding Period and Prepwork

Review the existing statistical functionalities in stdlib for potential issues, if any. Work on some of the more straightforward distributions, such as loglaplace, loguniform, and loggamma, to become more familiar with the codebase.

Week 1-2: Random Variates and First Distributions

Implement the APIs for drawing random variates for the distributions implemented during the Community Bonding period. Start working on foldcauchy and halfcauchy.

Week 3-6: Implement APIs for remaining Statistical Distributions

Resolve any issues in the previously implemented distributions. Design and implement APIs for selected statistical distributions, focusing on functionalities not yet available in stdlib. Deliverables: Implemented APIs for selected distributions, covering construction, random variate generation, and other properties.

Week 7-9: Testing and Validation

Conduct rigorous testing of implemented functionalities to ensure accuracy and reliability. Validate the correctness of the implemented distributions against known reference values. Deliverables: Comprehensive unit tests covering all aspects of the implemented functionalities.
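Validation against reference values could look roughly like the following sketch, shown for the inverse Weibull CDF, exp(-x^(-c)). The function names and the tolerance are assumptions made for illustration. The expected values here were derived by hand from the closed form (they are e^-1, e^-0.5, and e^-2), not generated by SciPy, and the sketch deliberately avoids any test framework so that it stays self-contained.

```javascript
// Inverse Weibull CDF with shape c > 0: F(x) = exp(-x^(-c)) for x > 0.
function invWeibullCdf(x, c) {
  if (!(c > 0)) return NaN;
  if (x <= 0) return 0;
  return Math.exp(-Math.pow(x, -c));
}

// Relative-tolerance comparison for floating-point results.
function isClose(actual, expected, relTol) {
  if (actual === expected) return true;
  return Math.abs(actual - expected) <= relTol * Math.abs(expected);
}

// Hand-derived reference values (assumption: not taken from SciPy output).
const fixtures = [
  { x: 1, c: 3, expected: 0.36787944117144233 },  // exp(-1)
  { x: 2, c: 1, expected: 0.6065306597126334 },   // exp(-0.5)
  { x: 0.5, c: 1, expected: 0.1353352832366127 }, // exp(-2)
];

// Collect every fixture whose computed value falls outside the tolerance.
const failures = fixtures.filter(
  ({ x, c, expected }) => !isClose(invWeibullCdf(x, c), expected, 1e-14)
);

console.log(
  failures.length === 0 ? 'all reference values matched' : failures
);
```

In practice, fixtures would typically be generated in bulk from a reference implementation and stored alongside the tests, with the tolerance chosen per function.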

Week 10-11: Documentation and Usage Examples

Document implemented functionalities and provide usage examples following stdlib's conventions. Deliverables: Detailed documentation for each implemented function, including parameter descriptions and usage examples.

Week 12: Final Testing, Optimization, and Wrap-Up

Conduct final testing, optimize code for performance, and prepare for project submission. Deliverables: Final round of testing to ensure all functionalities are working as expected. Optimization of code for performance and efficiency where possible.

Related issues

No response

Planeshifter commented 3 months ago

Thanks for your proposal to work on the statistical distributions!

While the objectives of your proposal are clearly defined, it would be beneficial to have more specific details about which statistical distributions are planned to be implemented. Doing some of this research before the start of a project would de-risk things and ensure that you could hit the ground running and make sure the scope of the project is properly determined.

It's good that you include code samples, which are clear and well-documented. But you also want to demonstrate that you have familiarized yourself with stdlib, including its coding conventions and existing statistical distributions. For example, we use neither Chai nor Mocha for unit tests, and it is thus detrimental for a proposal to refer to these instead of our test runner, tape.

A few points that you could address in the proposal are how you plan to prioritize which statistical distributions to implement and what prior experience or familiarity you have with SciPy or any other statistical libraries.

Last but not least, it might be useful to include a brief discussion on how you plan to handle potential challenges or roadblocks that may arise during development, especially given your work commitments. In this context, it would be good to specify when exactly you plan to work on the project and during which times you would be available for syncing with mentors etc.

AhmedKhaled590 commented 3 months ago

@Planeshifter Thank you very much for your feedback🙏, I will work on these additions to the proposal and submit it ASAP.

kgryte commented 3 months ago

@AhmedKhaled590 A couple additional comments:

AhmedKhaled590 commented 3 months ago

@kgryte Thank you very much for your feedback 🙏, I will take these comments into consideration while finalizing the document to be submitted to GSoC.