stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: Develop a Google Sheets extension which exposes stdlib functionality #57

Closed adityacodes30 closed 2 months ago

adityacodes30 commented 3 months ago

Full name

Aditya Sapra

University status

Yes

University name

Thapar Institute Of Engineering and Technology

University program

Computer Engineering

Expected graduation

2026

Short biography

Hi, I am Aditya Sapra, and I am currently in my 4th semester pursuing Computer Engineering at Thapar Institute of Engineering and Technology. Additionally, I hold the CS50 certificate from Harvard University. I am passionate about computer science and have been actively developing projects. While my primary language of choice is JavaScript, I also have experience working with C, C++, and Python. I am very interested in backend systems, computer networking, cloud, and operating systems.

I have hands-on experience with technologies such as MongoDB, PostgreSQL, React, Node, and Express, message queues such as RabbitMQ and Apache Kafka, and deployment technologies such as Docker, Docker Compose, and Kubernetes. Apart from that, I have a keen interest in statistics and machine learning.

Formally, my coursework so far includes Computer Programming, Object Oriented Programming, Computer Networks, Operating Systems, DBMS, Data Structures, and Design and Analysis of Algorithms, along with two undergraduate-level Mathematics subjects.

Timezone

India Standard Time GMT+5:30

Contact details

email: adityaework@gmail.com, linkedin: https://www.linkedin.com/in/aditya-sapra-a70475252/, github: adityacodes30

Platform

Mac

Editor

I use VSCode as my daily driver due to its rich extension support and wide adoption. I use nano when I'm operating on remote Linux VMs.

Programming experience

Creating things through code that materialize into real-world usage has always been a motivating factor in my programming journey. I have been programming for ~2 years, during which I've created some projects that I am truly proud of. I have working experience with JavaScript, C, C++, and Python. I have made projects encompassing a range of different technologies and domains. I've listed some notable ones below.

Catalog Scoring for Open Network for Digital Commerce

We built this project as a team as part of a national-level hackathon. It was developed for ONDC workflows, which use callback APIs with a stateless, scalable microservices architecture. The project scores a commercial catalog on demand with its proprietary ML algorithm. I was responsible for leading the team, implementing the backend APIs and the RabbitMQ message queue, and designing and implementing the architecture through the Dockerfile(s) and YAML files. I also implemented the Kubernetes cluster to ensure scalability and availability, and tweaked the ML algorithm for higher accuracy.

Deepword

This project detects deepfakes across the web for a target personality by applying verbatim matching through NLP. It was made as part of a hackathon in which we finished as runners-up. My primary responsibility was developing the APIs and deploying the project to the cloud by running Selenium and Chrome in headless mode on a Debian VM.

Food festival website

Implemented the home page and its scroll-based animations. Designed and developed the sponsors page as well. Made with React and Framer Motion.

Agency Website

A website for a marketing-content agency. Made with Next.js.

AppscriptDB

This is a project I developed with Google Apps Script and Google Sheets to help people skip the hassle of setting up a database and use ordinary sheets as one, so they can focus on implementing ideas in hackathons.

Other Projects [ Linktree ]

JavaScript experience

The first language I learned was JavaScript. The majority of my projects and programming experience has been in JavaScript, and it is my go-to language for building things. I have used JS to build a range of things, from backend APIs to complex frontend animations. More recently, my contributions to stdlib have helped me gain a deeper understanding as well. I like JavaScript because of its intuitive syntax, cross-platform flexibility, and huge ecosystem, but most of all for its unparalleled community support. It's easy to build and deploy things that have the potential to be at the forefront of the impact that technology makes. That said, I would like to see JavaScript natively adopt some TypeScript fundamentals, such as data types.

Node.js experience

I have used Node.js almost exclusively to create the backend APIs across my projects. I have working experience with Node coupled with libraries such as express, cors, jsonwebtoken, and more. I have handled many core backend functionalities with Node, such as callback APIs, server-side code, file handling, and database connections.

C/Fortran experience

I learned C during my coursework across two semesters, as well as during the CS50 assignments, where I implemented a variety of algorithms and programs, such as a greedy algorithm and a program that recovers images from a wiped memory card. I also have a broad overview of memory management in C and some experience working with makefiles.

Interest in stdlib

When I first came across stdlib, I was beyond delighted. Since JS originated as a browser-centric language, a standard library was never planned for. But as the language and the world evolved, the need arose. Stdlib seems like the answer.

What excites me about the project is the accessibility it provides to the general population. Oftentimes people might not have the technical know-how or the resources to run statistical functions via code. I think stdlib has the potential to solve that issue by running in the browser itself. The potential is endless. I also see it being standardized in educational pedagogy due to the ease it provides. I think stdlib, along with the tools built on top of it, has the potential to make a big impact on the educational field.

Version control

Yes

Contributions to stdlib

Merged PRs

feat: add string/base/last-grapheme-cluster

feat: add string/base/last-code-point

feat: add string/base/last

feat: add array/base/join

Implements is-positive finite

feat: add assert/is-same-date-object

At the time of submission, you can view all PRs here.

Code review - 1413

Issues

[RFC]: Add @stdlib/string/last

Goals

By the end of this project, I plan to fully implement the G-Sheets project, which enables people to use stdlib and all its related functionality. Although work on this has already started, a lot remains to be done before it reaches a POC phase. I have tried my best to go through stdlib and research which functions would be the most in demand and most suitable for exposing in Google Sheets.

I will be dividing the project into phases spread across 12 weeks.

Phase 1 - Getting the base packages ready! 235-260h

According to the TODO here, a total of 11 namespaces need to be implemented, spanning ~800 packages, of which 68 are implemented. 729+ packages remain to be implemented. I intend to commit an average of ~30-40 minutes per package, including the following steps

A summary of the packages is given below:

| Name | Total | Implemented | To be implemented |
| --- | --- | --- | --- |
| array | 10 | 4 | 6 |
| assert | 79 | 0 | 79 |
| datasets (as applicable) | 24 | 7 | 17 |
| math + math/base/ops + math/base/special | 2 + 24 + 271 | 1 | 1 + 24 + 271 |
| nlp | 5 | 0 | 5 |
| number | 20 | 0 | 20 |
| random | 53 | 36 | 17 |
| simulate | 14 | 1 | 13 |
| stats + stats/base/dists | 31 + 26 | 3 | 28 + 26 |
| string | 52 | 16 | 36 |
| utils | 20 | 0 | 20 |
| blas/base + blas/ext/base | 28 + 128 | 10 | 146 |
| complex | 20 | 0 | 20 |
| Total | 797 | 68 | 729 |

Note: This is flexible, depending on further discussion and the scope of the project. Additional tests with GAPTS.

Phase 2 - Implementing 2D array semantics 35h

When working with arrays, arrays of different shapes can be combined under certain rules (broadcasting). Existing and custom wrapping logic will have to be used to perform these operations, based on the discussions and scope set in the three-week community bonding period. stdlib/ndarray will be particularly helpful in implementing these.
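As a rough illustration of what "combining arrays of different shapes under certain rules" could mean for spreadsheet ranges, here is a minimal sketch of NumPy-style broadcasting for 2D arrays in plain JavaScript. The helper names (`to2D`, `broadcastDim`, `broadcast2D`) are hypothetical and are not taken from stdlib or the gsheets repository; the actual rules would be settled with the mentors, and stdlib/ndarray may well provide a better foundation.

```javascript
// Hypothetical helpers for illustration only (not stdlib or gsheets APIs).

/**
 * Normalizes an argument to a 2D array.
 * A single cell arrives as a scalar, whereas a range arrives as a 2D array.
 */
function to2D( value ) {
	return Array.isArray( value ) ? value : [ [ value ] ];
}

/**
 * Resolves the broadcast size of one dimension:
 * sizes must be equal, or one of them must be 1.
 */
function broadcastDim( a, b ) {
	if ( a === b || b === 1 ) {
		return a;
	}
	if ( a === 1 ) {
		return b;
	}
	throw new RangeError( 'Incompatible dimensions: ' + a + ' and ' + b );
}

/**
 * Applies a binary function element-wise to two broadcast-compatible inputs.
 */
function broadcast2D( fcn, x, y ) {
	var X = to2D( x );
	var Y = to2D( y );
	var nrows = broadcastDim( X.length, Y.length );
	var ncols = broadcastDim( X[ 0 ].length, Y[ 0 ].length );
	var out = [];
	var row;
	var i;
	var j;
	for ( i = 0; i < nrows; i++ ) {
		row = [];
		for ( j = 0; j < ncols; j++ ) {
			// A dimension of size 1 is "stretched" by always reading index 0:
			row.push( fcn(
				X[ ( X.length === 1 ) ? 0 : i ][ ( X[ 0 ].length === 1 ) ? 0 : j ],
				Y[ ( Y.length === 1 ) ? 0 : i ][ ( Y[ 0 ].length === 1 ) ? 0 : j ]
			) );
		}
		out.push( row );
	}
	return out;
}

function add( a, b ) {
	return a + b;
}
```

For example, `broadcast2D( add, [ [ 1 ], [ 2 ] ], [ [ 10, 20, 30 ] ] )` combines a 2x1 column with a 1x3 row into a 2x3 result, and passing a scalar as either argument applies it to every cell of the other.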

Phase 3 - Implementing performant fused operations 35h

As all functions are executed as RPCs (Remote Procedure Calls), multiple operations need to be fused into a single server call in order to reduce both the number of network requests and latency. A wrapper function will be explored to chain multiple function calls per request. A number of standard, frequently used fused operations can be included as well. There will also be work on performant element-wise iteration APIs. To handle volume, we need to create APIs that can iterate over the spreadsheet data efficiently, possibly by chunking the data and processing it in batches. Operation fusion will likely require the development of a Computer Algebra System (CAS), because Google Sheets will treat data as strings, and we will need to parse it and then call stdlib APIs.
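To make the idea of a chaining wrapper concrete, the following is a minimal sketch, not a proposed final design. It assumes a hypothetical custom function `STDLIB_FUSED` that takes a comma-separated pipeline of unary function names and applies the composed function to every cell of a range in one call. The `FUNCTIONS` table uses built-in `Math` functions purely as stand-ins; in the extension, its entries would be the corresponding stdlib APIs.

```javascript
// Hypothetical sketch: `FUNCTIONS`, `fuse`, and `STDLIB_FUSED` are illustrative names.

// Stand-in function table (the real extension would register stdlib APIs here):
var FUNCTIONS = {
	'abs': Math.abs,
	'sqrt': Math.sqrt,
	'ln': Math.log
};

/**
 * Composes a list of unary function names into a single function.
 * Functions are applied left to right.
 */
function fuse( names ) {
	var fcns = names.map( function lookup( name ) {
		if ( !Object.prototype.hasOwnProperty.call( FUNCTIONS, name ) ) {
			throw new Error( 'Unknown function: ' + name );
		}
		return FUNCTIONS[ name ];
	});
	return function fused( x ) {
		return fcns.reduce( function apply( acc, f ) {
			return f( acc );
		}, x );
	};
}

/**
 * Applies a pipeline of functions to a single value or to every cell of a 2D range.
 *
 * @param {string} pipeline - comma-separated function names (e.g., "abs,sqrt")
 * @param {(number|Array<Array<number>>)} range - a cell value or a 2D range
 * @returns {(number|Array<Array<number>>)} result with the same shape as the input
 */
function STDLIB_FUSED( pipeline, range ) {
	var names = pipeline.split( ',' ).map( function trim( s ) {
		return s.trim();
	});
	var f = fuse( names );
	if ( !Array.isArray( range ) ) {
		return f( range );
	}
	return range.map( function onRow( row ) {
		return row.map( function onCell( v ) {
			return f( v );
		});
	});
}
```

In a sheet, this would be used as `=STDLIB_FUSED("abs,sqrt", A1:B2)` in place of two nested custom-function calls per cell. How much latency this saves in practice is something to measure during the R&D phase rather than assume.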

Phase 4 - Documentation and tutorials 35h

This phase rounds off the aforementioned work by tying everything together via documentation. While the individual READMEs will be created when the packages are added, this phase will add higher-level documentation for namespaces and packages, among other things. Tutorials on the usage of API functions would be set up as well via a subpage on the main stdlib website, with video snippets if time permits. This completes the core part of the execution.

Phase 5 - Adding side panel in sheets for users 25h

This feature will allow users to interact with a side panel in the sheets app itself to search functions on the fly and view relevant description and related tutorials. This will add a level of interactivity that will make the usage extremely beginner friendly.

Phase 6 - Streamlining Build processes 15h

After the initial proof of concept has been made, we will need to push updates regularly, as is the nature of open source. Therefore, a CI/CD pipeline will be set up to update the deployments. This could also be done before week one to ease development; for now, it is reserved for last.

Note: The phases are not necessarily sequential; as with anything else in development, the steps cut across one another.

Why this project?

Stdlib has excited me due to its use of JavaScript to run in the browser environment, which opens the door to accessibility where sufficient computing resources might not be present. But to harness that power, we need to build solutions on top of it so that it can benefit end users. This project does exactly that. I think implementation of this project will

As this is a user-facing project, it fills me with enthusiasm that my work might impact thousands of people. It is also a way for me to accelerate my learning journey as a quality SWE.

Qualifications

I have a working proficiency in JavaScript and Node.js. I also have experience in backend technologies, due to which I am very familiar with networking, APIs, and code optimisation, among other things. I have worked with Google Workspace and Apps Script, which play a pivotal role in this project.

I have also been working on this project through issue #3, which has given me a deeper understanding of the existing underlying processes and repository structure. Working on generating git hooks and the related makefiles has given me an understanding of the underlying code and processes. I have added 6+ packages to the main repo, which demonstrates my understanding of the codebase and general practices, and I have constantly learnt from feedback to produce quality PRs. More recently, I have picked up the book Introduction to Algorithms to further my algorithmic skills and write efficient code.

Prior art

This project has already started at gsheets.

In addition, I found a repo that brings Lodash to Google Sheets: lodashgs. It uses the main lodash (a JS utility library) directory as a submodule.

For array broadcasting, I found NumPy's implementation interesting.

We intend to use a parser; Math.js has one.

Commitment

Keeping in mind the scope described above, I intend to work on this project as a full-time equivalent over the 12-week period, extending it to 16 weeks. In the community bonding period, I intend to discuss the scope of the project and finalise the checkpoints with the mentors. As soon as that is done, I will start on the implementation of the project, beginning with generating CI/CD git workflows to streamline further development.

My year-end/summer break falls across the coding period, so I will be fully available to concentrate my energy on this project (~30h/week).

After the program, I intend to explore monetisation measures for this project and implement tensor- and notebook-related functionality.

Schedule

Assuming a 12-week schedule:

During the community bonding period, I plan to work with the mentors to discuss the final scope and set clear, confirmed goals and milestones for the project. I also plan to implement the git CI/CD workflows and decide on a temporary build process for the upcoming weeks. I plan to start on the project in the bonding period itself if all deliberation agendas have been met.

Week 1 will see the implementation of the base packages, as mentioned above. To avoid maintainer noise and clutter, I will spread my pull requests for the packages throughout the week, grouping similar packages into one PR if need be. Ideally, week 1 should see the implementation of ~45+ packages.

Packages to be implemented from the assert namespace: is-absolute-http-uri is-absolute-path is-absolute-uri is-alphagram is-alphanumeric is-anagram is-ascii is-between is-binary-string is-blank-string is-boolean is-capitalized is-composite is-cube-number is-current-year is-digit-string is-email-address is-empty-string is-even is-falsy is-finite is-hex-string is-infinite is-integer is-leap-year is-localhost is-lowercase is-nan is-negative-integer is-negative-number is-nonnegative-integer is-nonnegative-number is-nonpositive-integer is-nonpositive-number is-number is-odd is-positive-integer is-positive-number is-prime is-probability is-regexp-string is-relative-path is-relative-uri

Week 2 will include implementing further packages and finalising the PRs of the previous week.

Packages to be implemented from assert: is-semver is-square-number is-string is-triangular-number is-truthy is-unc-path is-uppercase is-uri is-whitespace deepequal is-camelcase is-complex is-constantcase is-even is-kebabcase

Note - The is-complex will likely need more utility to parse it from a string to a complex number which is mentioned further in the proposal

Packages to be implemented from string:

acronym code-point-at ends-with format from-code-point left-pad left-trim-n left-trim num-grapheme-clusters pad percent-encode remove-first remove-last remove-punctuation remove-words repeat replace reverse right-pad right-trim-n right-trim split-grapheme-clusters starts-with substring-after-last substring-after substring-before-last substring-before trim truncate-middle truncate

Packages to be implemented from random: base/randi base/randn base/randu sample (refactoring) shuffle (refactoring)

Week 3 will include implementing further packages and finalising the PRs of the previous week. In week 3, I will complete the implementation of the array and number namespaces.

Packages to be implemented from array: datespace incrspace logspace unitspace cartesian-square cartesian-product cartesian-power n-cartesian-product one-to zero-to take

I have noticed in my research that the Cartesian product functions are in considerable demand. We will need to figure out how to render the resulting products in Google Sheets.
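One possible rendering, sketched here as an assumption rather than a decided design, is to emit each pair as one row, so that a custom function returning the 2D result spills the product into two adjacent columns. The names `flatten` and `CARTESIAN_PRODUCT` are hypothetical and do not wrap the actual stdlib array packages.

```javascript
// Hypothetical sketch of one way to lay out a Cartesian product as rows.

/**
 * Flattens a range argument (a 2D array) or a single cell value into a list.
 */
function flatten( x ) {
	if ( !Array.isArray( x ) ) {
		return [ x ];
	}
	return x.reduce( function concat( acc, row ) {
		return acc.concat( row );
	}, [] );
}

/**
 * Returns the Cartesian product of two ranges as a 2D array with one pair per row.
 */
function CARTESIAN_PRODUCT( a, b ) {
	var x = flatten( a );
	var y = flatten( b );
	var out = [];
	var i;
	var j;
	for ( i = 0; i < x.length; i++ ) {
		for ( j = 0; j < y.length; j++ ) {
			out.push( [ x[ i ], y[ j ] ] );
		}
	}
	return out;
}
```

With this layout, the product of a range holding `1, 2` and a range holding `a, b` occupies four rows and two columns. Higher-order products (e.g., n-cartesian-product) would simply add columns.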

Packages to be implemented from number: float64/base/exponent float64/base/from-binary-string float64/base/from-words float64/base/get-high-word float64/base/get-low-word float64/base/normalize float64/base/set-high-word float64/base/set-low-word float64/base/signbit float64/base/to-binary-string float64/base/to-words uint16/base/from-binary-string uint16/base/to-binary-string uint32/base/from-binary-string uint32/base/rotl uint32/base/rotr uint32/base/to-binary-string uint8/base/from-binary-string uint8/base/to-binary-string

Week 4 will see the implementation of the math/base/special packages, which include the following:

Link

and the implementation of nlp packages: expand-acronyms expand-contractions ordinalize porter-stemmer tokenize

Week 5 would see the implementation of stats: anova1 binomial-test chi2gof chi2test fligner-test kruskal-test kstest levene-test lowess padjust pcorrtest ranks ttest ttest2 vartest wilcoxon ztest ztest2 base/* base/dists/*

Implementation of stats will require implementing stats/base/*, which exposes statistical tests. This step will likely take time due to the volume of packages present. I will use scaffolding and automation processes to set up the APIs.

By week 6, a lot of code will have been written and be ready for the midterm evaluation! Beyond the midterm evaluation, I will focus on getting the backlog of older PRs over the finish line and implementing the simulate packages. I will also do further research on stats here.

awgn awln awun bartlett-hann-pulse bartlett-pulse cosine-wave flat-top-pulse hann-pulse lanczos-pulse periodic-sinc pulse sine-wave square-wave triangle-wave

In this week, I will implement the complex namespace. This will make it possible to perform operations on complex numbers. By default, complex numbers are represented as strings in Google Sheets. My approach here will be to use the parser we have in stdlib to convert the string inputs to complex numbers. The complex namespace will require some R&D to implement. However, its implementation will be quite useful, as it will have wide-ranging implications.

I plan to implement complex/base/assert and complex as well as use additional utilities such as wrapping functions and parser from complex/base to facilitate development
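To show the kind of string-to-complex conversion involved, here is a minimal stand-alone sketch. It is explicitly not the stdlib parser referred to above: `parseComplex` is a hypothetical helper that only handles the forms `a`, `bi`, and `a+bi` (with an explicit imaginary coefficient and either `i` or `j` as the imaginary unit), and it returns a plain object instead of a stdlib complex number instance.

```javascript
// Hypothetical stand-in for a real complex-number parser (limited input forms).

var UNSIGNED = '(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][+-]?\\d+)?';
var RE_FULL = new RegExp( '^([+-]?' + UNSIGNED + ')([+-]' + UNSIGNED + ')[ij]$' );
var RE_IMAG = new RegExp( '^([+-]?' + UNSIGNED + ')[ij]$' );
var RE_REAL = new RegExp( '^([+-]?' + UNSIGNED + ')$' );

/**
 * Parses a string such as "3+4i" into real and imaginary components.
 *
 * @param {string} str - cell contents
 * @returns {(Object|null)} object with `re` and `im` fields, or null if unparseable
 */
function parseComplex( str ) {
	var s = String( str ).replace( /\s+/g, '' );
	var m = RE_FULL.exec( s );
	if ( m ) {
		return { 're': parseFloat( m[ 1 ] ), 'im': parseFloat( m[ 2 ] ) };
	}
	m = RE_IMAG.exec( s );
	if ( m ) {
		return { 're': 0, 'im': parseFloat( m[ 1 ] ) };
	}
	m = RE_REAL.exec( s );
	if ( m ) {
		return { 're': parseFloat( m[ 1 ] ), 'im': 0 };
	}
	return null;
}
```

A wrapper around each exposed complex API could call a parser like this on its string arguments, invoke the underlying function, and format the result back into a string for display in the cell.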

In this week, I plan to implement the BLAS (basic linear algebra subprograms) namespace, in which the blas/base and blas/ext/base functions would be particularly useful in computations. These have wide-ranging applications and, according to my research, would be among the most used packages.

In this week, I will implement the 2D array semantics according to the rules and operations decided. This should take around ~40 hours, as it will include a good amount of R&D and a number of iterations to settle on the final code. A pull request will be generated for each RFC.

Beyond completing the 2D array semantics and broadcasting APIs, week 10 will focus on implementing performant fused operations, optimising multiple function calls into single server calls. I will research and experiment with element-wise iteration APIs to ensure efficiency and begin exploring the creation of APIs for efficient iteration over spreadsheet data, considering chunking and batch processing.
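As a small illustration of the chunking idea, the sketch below splits the rows of a range into fixed-size batches and applies a unary function to every cell, batch by batch. The names `chunkRows` and `mapInBatches` are hypothetical, and an appropriate batch size is an open question for the R&D described above.

```javascript
// Hypothetical sketch of batch-wise iteration over spreadsheet rows.

/**
 * Splits a 2D array into consecutive chunks of at most `size` rows.
 */
function chunkRows( rows, size ) {
	var out = [];
	var i;
	if ( !Number.isInteger( size ) || size <= 0 ) {
		throw new RangeError( 'Chunk size must be a positive integer' );
	}
	for ( i = 0; i < rows.length; i += size ) {
		out.push( rows.slice( i, i + size ) );
	}
	return out;
}

/**
 * Applies a unary function to every cell, processing `size` rows per batch.
 */
function mapInBatches( rows, size, fcn ) {
	var out = [];
	chunkRows( rows, size ).forEach( function onChunk( chunk ) {
		// Any per-batch work (e.g., yielding or logging progress) would go here.
		chunk.forEach( function onRow( row ) {
			out.push( row.map( function onCell( v ) {
				return fcn( v );
			}));
		});
	});
	return out;
}
```

The output has the same shape as the input, so the result can be returned directly from a custom function or written back to a range.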

In week 11, I will complete the implementation of performant fused operations, ensuring that multiple operations are efficiently combined into single server calls. I will also finalize the element-wise iteration APIs, ensuring they meet performance requirements and handle large volumes of data effectively. A final PR for phase 3 will be generated in week 12. The additional tasks involved are

By the end of this week, I aim to have production-ready performant fused operation APIs with tests. I will also continue with the documentation process, focusing on creating higher-level documentation for namespaces and packages.

I will shift focus to documentation and tutorials, starting with the creation of higher-level documentation for namespaces and packages. I will begin drafting tutorials on API usage, considering both written and video formats. Furthermore, I will set up a subpage on the main stdlib website to host the tutorials and related documentation. There will be a page for each namespace, and therefore ~13-14 pages. The website documentation will describe the functionality and definitions in detail. I plan to write 2 tutorials, which will cover the stats and math namespaces.

By the end of this week, I aim to have:

- Final documentation for GSheets integration.
- Initial set of text-based tutorials demonstrating API functionalities.

I will continue working on adding the side panel in sheets for users, focusing on the design and user interface aspects. The interface will also link to tutorials. Additionally, I will start streamlining build processes to set up a CI/CD pipeline for future updates.

During the final week, I will focus on completing any pending tasks, conducting final testing and debugging, and preparing for the final evaluation. I will ensure that all deliverables are well-documented and ready for deployment, making any last-minute updates or improvements as necessary.

Notes:

Related issues

13

Checklist

kgryte commented 3 months ago

@adityacodes30 Thank you for sharing a draft of your proposal. A few comments:

  1. In general, your outline looks reasonable. However, you should explore the repository a bit more to understand some of the scaffolding aspects (e.g., https://github.com/stdlib-js/gsheets/tree/main/scripts), which would allow for automating the exposure of stdlib APIs within gsheets.
  2. You mention stats/base/dists/*. You should understand a bit more the scale of that undertaking. Without scaffolding and automation, that effort will take significant time given the number of packages/APIs.
  3. Operation fusion is likely to require the development of a CAS. We could potentially use a third party library for this (e.g., Math.js), but we'd need to do some R&D to determine how feasible. Notably, we'd need to be able to hook in our APIs such that, e.g., the string sin(cos(x)) is parsed and evaluation calls our sin and cos APIs. It's possible that third party libraries allow this (e.g., by parsing a mathematical expression string and returning an intermediate representation), but I don't know.
  4. You largely based your list of APIs from the TODO document; however, that document is somewhat outdated and was originally based on APIs that I personally thought could be easy to expose as gsheets APIs. In short, it is primarily low hanging fruit. There may be fruit higher up the tree that could be implemented, as well. In which case, the ndarray APIs could reasonably be deprioritized, as these are not likely to be commonly used by most users.
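To make point 3 concrete, the following is a minimal sketch of parsing a string such as `sin(cos(x))` into an intermediate representation and then evaluating it against a table of registered functions. It is a hand-rolled illustration, not Math.js and not a CAS: it supports only numbers, symbols, and nested function calls (no infix operators), and the `API` table uses built-in `Math` functions as stand-ins for stdlib APIs. All names here (`API`, `parse`, `evaluate`) are hypothetical.

```javascript
// Hypothetical sketch: parse nested calls, then dispatch to registered functions.

// Stand-in function table (the real extension would register stdlib APIs here):
var API = {
	'sin': Math.sin,
	'cos': Math.cos,
	'pow': Math.pow
};

/**
 * Parses an expression into a tree.
 * Grammar: expr := number | identifier | identifier '(' expr { ',' expr } ')'
 */
function parse( str ) {
	var tokens = str.match( /[A-Za-z_]\w*|\d+\.?\d*|[(),]/g ) || [];
	var pos = 0;
	var ast;

	// Reject anything the tokenizer does not understand (e.g., infix operators):
	if ( tokens.join( '' ) !== str.replace( /\s+/g, '' ) ) {
		throw new SyntaxError( 'Unsupported characters in: ' + str );
	}
	function expr() {
		var tok = tokens[ pos++ ];
		var node;
		if ( tok === void 0 ) {
			throw new SyntaxError( 'Unexpected end of input' );
		}
		if ( /^\d/.test( tok ) ) {
			return { 'type': 'number', 'value': parseFloat( tok ) };
		}
		if ( !/^[A-Za-z_]/.test( tok ) ) {
			throw new SyntaxError( 'Unexpected token: ' + tok );
		}
		if ( tokens[ pos ] !== '(' ) {
			return { 'type': 'symbol', 'name': tok };
		}
		pos += 1; // consume "("
		node = { 'type': 'call', 'name': tok, 'args': [] };
		do {
			node.args.push( expr() );
		} while ( tokens[ pos++ ] === ',' );
		if ( tokens[ pos-1 ] !== ')' ) {
			throw new SyntaxError( 'Expected ")"' );
		}
		return node;
	}
	ast = expr();
	if ( pos !== tokens.length ) {
		throw new SyntaxError( 'Unexpected trailing input' );
	}
	return ast;
}

/**
 * Evaluates a parsed tree, resolving symbols from `scope` and calls from `API`.
 */
function evaluate( node, scope ) {
	var has = Object.prototype.hasOwnProperty;
	if ( node.type === 'number' ) {
		return node.value;
	}
	if ( node.type === 'symbol' ) {
		if ( !has.call( scope, node.name ) ) {
			throw new ReferenceError( 'Unknown symbol: ' + node.name );
		}
		return scope[ node.name ];
	}
	if ( !has.call( API, node.name ) ) {
		throw new ReferenceError( 'Unknown function: ' + node.name );
	}
	return API[ node.name ].apply( null, node.args.map( function onArg( arg ) {
		return evaluate( arg, scope );
	}));
}
```

For instance, `evaluate( parse( 'sin(cos(x))' ), { 'x': 0 } )` calls the registered `cos` and then the registered `sin`. Because the tree is built once, the same parsed expression could be evaluated for every cell of a range within a single call.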

adityacodes30 commented 3 months ago

Got it. Scaffolding does make sense, and I will integrate it into the steps going forward. That should significantly reduce the time needed to implement the packages. I think scaffolding and automation will save me enough time to work on the CAS and perhaps the ndarray APIs as well.

About parsing: I found this. Math.js does provide a parser, and it also supports a scope in which functions can be defined, which could be used to get it to call stdlib APIs. It should prove useful, though there is still a need for some R&D. If need be, we can also build a parser in-house.

steff456 commented 3 months ago

Hi @adityacodes30, thanks for sending this draft proposal!

I see you have a very ambitious project with a lot of deliverables, which we may need to refine to scope better the whole project and set you up for success. Please use these questions as guidance,

  1. How are you proposing to define the APIs that will be exposed?
  2. What are the plans for the documentation? (How many new pages or sections are you planning to work on?)
  3. How many tutorials are you planning to write? (Also, do you have in mind the topics for them?)

I would recommend you to think first about the tutorial(s) and documentation to understand and plan what APIs would make more sense to work on early on. Also, please take into account that I estimate that a tutorial needs around 3 weeks to develop.

adityacodes30 commented 3 months ago

Thank you @steff456. That does put things more into perspective. I will incorporate the suggestions.

Yes, documentation is key. With this additional information, I think work on the tutorials would likely continue into the extended coding period and beyond the program.