stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: Add support for the multivariate normal distribution #70

Closed BrianP2002 closed 5 months ago

BrianP2002 commented 6 months ago

Full name

Lin ha

University status

Yes

University name

University of Wisconsin-Madison

University program

Computer Science, Mathematics, Data Science

Expected graduation

2025 Spring

Short biography

I am pursuing a bachelor's degree in mathematics, computer science, and data science at the University of Wisconsin-Madison. I am familiar with Java, C/C++, JavaScript, Python, and R, and I know a little HTML/CSS. My main interests in CS are arithmetic algorithms, cryptology, and optimization problems.

Timezone

US Central Time (UTC−06:00)

Contact details

email: halinbr2002@gmail.com, github: https://github.com/BrianP2002, phone: +1 6089773640

Platform

Linux

Editor

My first choice of code editor is VSCode, and my second choice is Vim. I like VSCode mainly because it has a rich and mature ecosystem for most languages and tools. It is easy to set up a suitable working environment quickly, with mainstream tools such as git and docker integrated. There are also many beautiful themes I love (especially monokai). I like Vim because it is very easy to launch and use. This lightweight editor has saved me a lot of time when checking logs and outputs on a Linux VM.

Programming experience

Python: I use Python for most of my machine learning work, such as training NLP models with spaCy and handling big data analysis with Hadoop-related software (Cassandra, Spark, Kafka). I also write code to assist with my math homework, such as checking whether a matrix is totally unimodular and simulating a Feistel cipher in CFB mode. I am also familiar with Django, and I am currently working on a project, with Django as the backend, that helps patients understand doctor's notes.
C/C++: I implemented a simple shell in C that supports pipes, running commands in detached mode, and output redirection. I also implemented several other school projects, including simple automatic garbage collection and a multithreaded merge sort. Sometimes I use C++ instead of Python to write code that assists with my math homework, such as implementing the simplex method in tabular form to solve LP problems.
Java: I used Java to implement a personalized version of iperf to test internet connectivity and performance on Mininet. I am also familiar with casting, data encapsulation, multithreading, and network communication in Java.
JavaScript: I describe this in detail in the next section.
R: I learned R mostly from classes such as data modeling. I am familiar with using R to plot various kinds of statistical diagrams, run hypothesis tests, and evaluate regression models.

JavaScript experience

During high school, my partner and I built a game bot AI for generals.io as a research project. I was in charge of data collection: I wrote a crawler in JavaScript to gather game replay data and did the data transformation, including slicing and replaying players' matches. Currently, I am working on a project that is developing a Chrome extension to help people read doctor's notes. I helped with the front end, which lets people directly select and highlight the content they want to learn about, using mainly JavaScript together with some Chrome-provided APIs. The feature of JavaScript I like most is its flexibility. As stated above, it can be used in many environments and for many kinds of work. The reason is that there is a very mature ecosystem around JavaScript, which makes it a very welcoming language. The thing I dislike most is JavaScript's loose typing system, which caused me a lot of trouble and confusion when I was learning the language. I prefer a stricter, more explicit typing system.

Node.js experience

I am not very experienced in using Node.js, but I am familiar with the basic concepts and usage of it.

C/Fortran experience

I am experienced in C. C and C++ were the first two programming languages I learned. I took the Computer Organization and Operating Systems courses, both of which use C heavily. Thanks to these courses, I am familiar with C's memory model, multithreading, multiprocessing, data encapsulation, etc.

Interest in stdlib

As a student studying math, I really appreciate the purpose and goal of projects like stdlib. The existence of these libraries makes our lives a lot easier. For instance, I don't need to write all the code from scratch to simulate random variables from various distributions. Therefore, I'd like to help develop this library and make it better.

Version control

Yes

Contributions to stdlib

I've not yet contributed to stdlib, but I believe this is going to be a great time to start working on contributing something.

Goals

Basic Expectation: implement the multivariate normal distribution in the same way as the other distributions already implemented, including but not limited to the following functions:

Bigger Picture: Beyond the basic expectation, I will consider implementing several other multivariate distributions, such as the multivariate hypergeometric, exponential, and Bernoulli distributions.

Why this project?

I am deeply interested in contributing to this project, driven by my strong desire to apply my mathematical background to a math-related open-source library. With a foundation in both computer science and mathematics, especially in the realm of probability, I find this project to be a perfect match for my skills and interests. My academic and practical experiences have equipped me with a robust understanding of mathematical concepts and their computational implementations, making me keenly aware of the challenges and opportunities in developing mathematically rigorous and efficient algorithms. I am eager to contribute by leveraging my knowledge in probability and mathematical analysis. Joining this project represents a unique opportunity for me to merge my passion for mathematics with my computer science expertise, contributing to a library that is pivotal in advancing open-source, math-centric computing solutions.

Qualifications

I have taken college-level probability theory and stochastic processes courses at the university. I am also doing research directed by a professor in statistics, mostly on stochastic processes and multivariate probability distributions. I am also familiar with tools such as Wolfram Alpha and the random distribution packages in R, and with writing my own code (mostly in Python) to solve problems in linear programming, cryptology, group theory, etc. Overall, I have a matching mathematical background and an understanding of the needs of the target users, which makes me a suitable candidate for this project.

Prior art

SciPy has a decent implementation of the multivariate normal distribution. R also has a package that implements the multivariate normal distribution. Julia also supports the multivariate normal distribution.

Commitment

I will finish all my final exams in mid-May, and I can work about 20-30 hrs/week for 12 weeks. I will be located in the US Central timezone and will respond quickly to all messages and video meeting requests.

Schedule

Assuming a 12-week schedule,

Notes:

Related issues

No response

Checklist

Planeshifter commented 6 months ago

Thanks for your proposal! To strengthen it, I would suggest highlighting your plans for integrating this new distribution with the existing stdlib codebase, especially since it relies on multi-dimensional arrays, which are not part of the native JavaScript language.

It's good that you reference prior art in other languages such as R and Julia, but it would also be beneficial to discuss in more depth how you will implement the multivariate normal distribution, or how it is implemented in these reference implementations. For example, how will the covariance matrix be handled? What numerical methods will be used? This way, aside from your highly relevant academic experience and achievements, you could further demonstrate your experience with numerical computing and assure the reviewers that you have the necessary skills to pull off this project.
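To illustrate the level of detail meant here, the following is only a minimal sketch of one common approach: factor the covariance matrix with a Cholesky decomposition, then evaluate the log-density by forward substitution rather than by explicitly inverting the matrix. The function names `cholesky` and `logpdf`, and the use of plain nested arrays in place of stdlib's ndarray, are assumptions made for the example. They are not existing stdlib APIs or a prescribed design.

```javascript
'use strict';

// Hypothetical helper (not a stdlib API): returns the lower-triangular
// Cholesky factor L such that sigma = L * L^T. `sigma` is assumed to be a
// symmetric positive-definite matrix stored as nested arrays.
function cholesky( sigma ) {
    const n = sigma.length;
    const L = [];
    for ( let i = 0; i < n; i++ ) {
        L.push( new Array( n ).fill( 0.0 ) );
        for ( let j = 0; j <= i; j++ ) {
            let s = sigma[ i ][ j ];
            for ( let k = 0; k < j; k++ ) {
                s -= L[ i ][ k ] * L[ j ][ k ];
            }
            if ( i === j ) {
                // A non-positive (or NaN) pivot means sigma is not positive definite.
                if ( !( s > 0.0 ) ) {
                    throw new RangeError( 'covariance matrix must be positive definite' );
                }
                L[ i ][ j ] = Math.sqrt( s );
            } else {
                L[ i ][ j ] = s / L[ j ][ j ];
            }
        }
    }
    return L;
}

// Hypothetical log-density of a multivariate normal with mean `mu` and
// covariance `sigma`, evaluated at `x`.
function logpdf( x, mu, sigma ) {
    const n = mu.length;
    const L = cholesky( sigma );
    const z = new Array( n );
    let logDet = 0.0; // log|sigma| = 2 * sum( log( L[i][i] ) )
    let quad = 0.0;   // (x-mu)^T sigma^{-1} (x-mu) = ||z||^2, where L z = x - mu
    for ( let i = 0; i < n; i++ ) {
        // Forward substitution: solve L z = x - mu one row at a time.
        let s = x[ i ] - mu[ i ];
        for ( let k = 0; k < i; k++ ) {
            s -= L[ i ][ k ] * z[ k ];
        }
        z[ i ] = s / L[ i ][ i ];
        quad += z[ i ] * z[ i ];
        logDet += 2.0 * Math.log( L[ i ][ i ] );
    }
    return -0.5 * ( ( n * Math.log( 2.0 * Math.PI ) ) + logDet + quad );
}
```

Working in log space and reusing the factor for both the determinant and the quadratic form avoids forming an explicit inverse. A proposal could also state how degenerate (singular) covariance matrices would be treated, which this sketch simply rejects.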

kgryte commented 6 months ago

@BrianP2002 Following up on Philipp's comments, I'd also like to add:

  1. As part of our application requirements, for any application to be considered, a contributor must land a patch to the main project repository. If this requirement is not fulfilled, we will not consider the respective application.
  2. In your timeline, you mentioned user feedback. How do you plan to acquire such feedback? Who is your target audience? My sense is that you are highly unlikely to get substantial feedback or identify a sufficient body of potential users, especially given JavaScript's current standing as a language for scientific computation. In which case, if you don't have a clear user feedback plan, I suggest expanding the technical activities of your proposal accordingly, potentially to include other multivariate distributions or higher-level functionality which relies on the multivariate normal distribution.