stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: Building a better REPL #56

Closed tudor-pagu closed 5 months ago

tudor-pagu commented 6 months ago

Full name

Tudor Stefan Pagu

University status

Yes

University name

Delft University of Technology

University program

Computer Science and Engineering

Expected graduation

2026

Short biography

My name is Tudor Stefan Pagu, and I am currently in the first year of my Computer Science and Engineering bachelor's at Delft University of Technology, where I have completed several courses relevant to this project, such as Linear Algebra, Calculus, Algorithms and Data Structures, Computer Organisation, Web and Database Technologies, and Object Oriented Programming. My biggest passion has always been problem solving. Using C++, I've participated in several International Computer Science Olympiads (winning a silver medal at IZHO 2022 and a bronze medal at IATI 2022), which have sharpened my knowledge of algorithms, data structures, and mathematics. I enjoy building applications, and have finished several personal projects using JavaScript. I'm also interested in machine learning, and have spent time using libraries such as NumPy and TensorFlow.

Timezone

GMT+1

Contact details

pagu.tudor@gmail.com

Platform

Linux

Editor

My preferred code editor for JavaScript is VS Code, for its speed and convenient features, such as the Node.js debug terminal. For other projects in Java or C++, I like to use JetBrains products.

Programming experience

I learned programming by competing in the Romanian Olympiad of Informatics, where I used C++ to solve algorithmic problems. Later, I learned HTML, CSS, JavaScript, TypeScript, and React, with which I've built many projects. The ones I'm proudest of are this message board/mini social network (live website | repository) and this blog with articles about civics education ([live website](https://hai-sa-fim-cetateni-model.netlify.app/) | repository).

JavaScript experience

I have experience using JavaScript and TypeScript from building personal projects, such as this message board built with Next.js, React, TypeScript, and Firebase (live website | repository) and this blog about civics education built with React, TypeScript, and Netlify CMS ([live website](https://hai-sa-fim-cetateni-model.netlify.app/) | repository).

I am familiar with most JavaScript features, such as asynchronous code, prototypes, classes, promises, functional programming features, closures, etc.

Node.js experience

I've used Node.js to build back-ends, particularly in my Web and Database Technology class. I am comfortable using Node.js to interact with the file system or call APIs.

C/Fortran experience

I have experience with C++ from competing in Computer Science Olympiads, so I understand the basic syntax as well as pointers and other more advanced concepts. I don't have as much experience with C specifically, but I could learn it quickly if it's needed for my project.

Interest in stdlib

What I like the most about stdlib is how high the standard of quality is for the code that is contributed and how many resources are in place to help you reach that level of quality, such as the linter or code style guide. Furthermore, the community is very friendly, responsive, and knowledgeable, and I truly feel that I am learning a lot while contributing. In terms of features, I like how extensive the library is. While working with stdlib, I could find almost every function I needed after only a quick search.

Version control

Yes

Contributions to stdlib

https://github.com/stdlib-js/stdlib/pull/1832

Goals

The REPL (Read-Eval-Print-Loop) is an interactive terminal environment that typically serves as an entry point for individuals seeking to learn the behavior of an API, and is a useful tool for debugging. stdlib's REPL is an alternative to the Node.js REPL, and has the potential to provide an interactive environment for rapid experimentation and learning, analogous to IPython. However, stdlib's REPL is currently underdeveloped and lacks essential functionality. My proposal is to implement a suite of enhancements to the stdlib REPL, turning it into a compelling and user-friendly computing platform and further establishing Node.js as a platform suitable for data science, machine learning, and other fields involving numerical computation.

For my project, I plan on implementing the following features:

On-the-fly object inspection: In the Node.js REPL, possible tab completions are shown as a grey preview as the user types. I implemented this feature for the stdlib REPL when I first started looking at stdlib as a potential organization for Google Summer of Code 2024 (see this PR). I would also like to implement object inspection, which would show the properties of the objects the user is creating as the user types (see this issue). This can be implemented using util.inspect. In addition, I will implement eager execution, which will show a preview of the result of the line the user is typing. This would use the inspector module, which can be instructed to abort an evaluation when it encounters side effects (such as HTTP calls, mutations, file operations, etc.). It will also be important to limit costly operations (such as sorting large arrays), since these might lead to slowdowns. Finally, I will ensure consistent behavior when completing file paths, function names, and variables. Currently, only initialized variables are completed, and variables defined with let/const are not completed at all. I plan on fixing these issues.
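A minimal sketch of how a one-line property preview could be produced with util.inspect. The helper name `previewValue` and the chosen limits are hypothetical; the `maxArrayLength` and `breakLength` options are also not available in very old Node.js versions, so a real implementation would need a fallback there.

```javascript
'use strict';

var inspect = require( 'util' ).inspect;

// Hypothetical helper: returns a short, single-line preview of a value.
function previewValue( value, maxLength ) {
	var str = inspect( value, {
		'depth': 0,              // do not descend into nested objects
		'maxArrayLength': 5,     // show at most five array elements
		'breakLength': Infinity  // keep the preview on a single line
	});
	// Truncate long previews so they never overflow the terminal line:
	if ( str.length > maxLength ) {
		return str.slice( 0, maxLength - 3 ) + '...';
	}
	return str;
}

console.log( previewValue( { 'a': 1, 'b': { 'c': 2 } }, 60 ) );
console.log( previewValue( [ 1, 2, 3, 4, 5, 6, 7 ], 60 ) );
```

Capping both the inspection depth and the output length keeps the preview cheap to compute on every keystroke, which matters for the slowdown concern raised above.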

Fuzzy auto-completion extension: Fuzzy auto-completion would extend normal tab completion to include strings that are similar, but not identical, to the string being matched. For example:

var pizza = 1;
var jazz = 2;
zz<TAB>
// displays `pizza` and `jazz`

(example adapted from https://github.com/mgalgs/fuzzy_bash_completion) These fuzzy matches would not be considered when displaying the grey preview from the previous section, but would be shown when the user presses <TAB>, and would help in situations where the user is not sure of the exact name of a function or variable.
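One simple way to define "similar" is subsequence matching: a candidate matches if the typed characters appear in it in order, not necessarily adjacent. The sketch below assumes that definition; the function names are hypothetical, and a real implementation might instead rank candidates by edit distance or another score.

```javascript
'use strict';

// Returns true if every character of `query` appears in `candidate`, in order.
function isSubsequence( query, candidate ) {
	var i = 0;
	var j;
	for ( j = 0; j < candidate.length && i < query.length; j++ ) {
		if ( candidate[ j ] === query[ i ] ) {
			i += 1;
		}
	}
	return i === query.length;
}

// Hypothetical completer: filters known names by case-insensitive subsequence match.
function fuzzyComplete( query, names ) {
	var q = query.toLowerCase();
	return names.filter( function onName( name ) {
		return isSubsequence( q, name.toLowerCase() );
	});
}

console.log( fuzzyComplete( 'zz', [ 'pizza', 'jazz', 'zebra' ] ) );
// => [ 'pizza', 'jazz' ]
```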

Pretty printing of tabular output: Many applications, such as machine learning or data science, revolve around tables of data. The REPL should automatically detect such data and display it in a more readable format. For example, arrays of arrays should be displayed as a table. The format of the table can be similar to that displayed by console.table(). Another idea would be to support automatically adding aggregate functions. There could be a built-in function which, in addition to the data, takes some aggregate functions such as SUM or AVERAGE, and then computes them over the relevant columns.
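A rough sketch of the detection, formatting, and aggregation steps for arrays of arrays. All function names here are hypothetical, the layout is deliberately simpler than console.table(), and only a column sum is shown as an aggregate.

```javascript
'use strict';

// Detects a non-empty array whose rows are all arrays of equal length.
function isTabular( value ) {
	if ( !Array.isArray( value ) || value.length === 0 ) {
		return false;
	}
	return value.every( function onRow( row ) {
		return Array.isArray( row ) && row.length === value[ 0 ].length;
	});
}

// Left-pads a string with spaces up to the given width.
function padLeft( str, width ) {
	while ( str.length < width ) {
		str = ' ' + str;
	}
	return str;
}

// Formats rows as right-aligned columns separated by ' | '.
function formatTable( rows ) {
	var widths = [];
	rows.forEach( function onRow( row ) {
		row.forEach( function onCell( cell, j ) {
			widths[ j ] = Math.max( widths[ j ] || 0, String( cell ).length );
		});
	});
	return rows.map( function onRow( row ) {
		return row.map( function onCell( cell, j ) {
			return padLeft( String( cell ), widths[ j ] );
		}).join( ' | ' );
	}).join( '\n' );
}

// Example aggregate: the sum of each column.
function sumColumns( rows ) {
	return rows.reduce( function onRow( acc, row ) {
		return row.map( function onCell( cell, j ) {
			return ( acc[ j ] || 0 ) + cell;
		});
	}, [] );
}

var data = [ [ 1, 20 ], [ 300, 4 ] ];
if ( isTabular( data ) ) {
	console.log( formatTable( data.concat( [ sumColumns( data ) ] ) ) );
}
```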

Bracketed paste: The user should be able to paste multiple lines of code into the terminal without them being treated as separate statements. This should be relatively straightforward to implement using the escape sequences that turn bracketed paste mode on and off.
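A minimal sketch of the escape sequences involved. The sequences themselves are the standard xterm ones; the helper `extractPaste` is hypothetical and assumes, for simplicity, that the whole paste arrives in a single input chunk, which a real implementation could not rely on.

```javascript
'use strict';

// Written to the terminal to turn bracketed paste mode on and off:
var ENABLE_BRACKETED_PASTE = '\u001b[?2004h';
var DISABLE_BRACKETED_PASTE = '\u001b[?2004l';

// Sent by the terminal around pasted text while the mode is enabled:
var PASTE_START = '\u001b[200~';
var PASTE_END = '\u001b[201~';

// Hypothetical helper: returns the pasted text in an input chunk, or null if
// the chunk does not contain a complete paste.
function extractPaste( chunk ) {
	var start = chunk.indexOf( PASTE_START );
	var end;
	if ( start === -1 ) {
		return null;
	}
	start += PASTE_START.length;
	end = chunk.indexOf( PASTE_END, start );
	if ( end === -1 ) {
		return null;
	}
	return chunk.slice( start, end );
}

// A REPL would call `process.stdout.write( ENABLE_BRACKETED_PASTE )` on start
// and `process.stdout.write( DISABLE_BRACKETED_PASTE )` on exit.
var input = PASTE_START + 'var x = 1;\nvar y = 2;' + PASTE_END;
console.log( JSON.stringify( extractPaste( input ) ) );
```

Because the pasted block is delimited explicitly, the REPL can buffer it and evaluate it as one unit instead of executing each line as it arrives.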

Less/more documentation pages: When a function that returns a large amount of text is run, the REPL could open a "less" pager, which allows the user to scroll back and forth through the lines, without having all of them be printed to the terminal. This poses several challenges, mainly:

  1. Interrupting the normal control flow of the REPL
  2. Implementing special commands related to the pager, such as jumping to the end or beginning of files, and searching for specific strings.

However, I am confident I can implement these features by carefully using escape sequences and modifying readline's default behavior.
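The pager's state can be kept separate from terminal handling, which makes the scrolling, jumping, and searching logic easy to test. The following is a sketch under that assumption; `createPager` and its methods are hypothetical names, and drawing the returned lines with escape sequences is left out.

```javascript
'use strict';

// Hypothetical pager model: tracks which slice of the text is visible.
function createPager( text, rows ) {
	var lines = text.split( '\n' );
	var maxTop = Math.max( 0, lines.length - rows );
	var top = 0;

	// Keeps the first visible line within the valid range.
	function clamp( n ) {
		return Math.min( maxTop, Math.max( 0, n ) );
	}
	return {
		// Returns the lines currently visible.
		'view': function view() {
			return lines.slice( top, top + rows );
		},
		// Scrolls by `delta` lines (negative scrolls up).
		'scroll': function scroll( delta ) {
			top = clamp( top + delta );
		},
		'toStart': function toStart() {
			top = 0;
		},
		'toEnd': function toEnd() {
			top = maxTop;
		},
		// Jumps to the next line containing `str`; returns whether one was found.
		'search': function search( str ) {
			var i;
			for ( i = top; i < lines.length; i++ ) {
				if ( lines[ i ].indexOf( str ) !== -1 ) {
					top = clamp( i );
					return true;
				}
			}
			return false;
		}
	};
}

var pager = createPager( 'a\nb\nc\nd\ne', 2 );
pager.scroll( 1 );
console.log( pager.view() );
// => [ 'b', 'c' ]
```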

Terminal syntax highlighting and bracket matching: To make using the REPL more like using an IDE, I will add syntax highlighting using escape sequences. I will also match brackets and color different pairs of brackets with different colors, to make the code more readable. A challenge will be dealing with different terminal themes, which might make certain highlighting themes unreadable. To solve this, I would use a highlighting library that contains multiple themes (such as this), and give the user the option to pick their preferred theme. For xterm-compatible terminals, the background color can be determined at runtime (see this article), so it would be possible to pick a theme automatically (for light mode vs dark mode, for example). Unfortunately, there seems to be no general solution for detecting the terminal theme, so the user will have to pick in the remaining cases.
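To illustrate how themes and escape sequences fit together, here is a deliberately small sketch that colors only strings, a few keywords, and numbers using ANSI SGR color codes. The theme names, color choices, and regular expression are all assumptions made for the example; a real implementation would rely on a proper tokenizer or highlighting library.

```javascript
'use strict';

var RESET = '\u001b[0m';

// Hypothetical themes: brighter colors for dark backgrounds, darker for light.
var THEMES = {
	'dark': {
		'keyword': '\u001b[95m',
		'number': '\u001b[93m',
		'string': '\u001b[92m'
	},
	'light': {
		'keyword': '\u001b[35m',
		'number': '\u001b[34m',
		'string': '\u001b[32m'
	}
};

// Matches, in order of priority: quoted strings, selected keywords, numbers.
var TOKEN = /("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|\b(var|let|const|function|return|if|else|for|while)\b|\b(\d+(?:\.\d+)?)\b/g;

// Wraps recognized tokens in the color codes of the selected theme.
function highlight( code, themeName ) {
	var theme = THEMES[ themeName ] || THEMES.dark;
	return code.replace( TOKEN, function onMatch( match, str, keyword ) {
		if ( str !== void 0 ) {
			return theme.string + match + RESET;
		}
		if ( keyword !== void 0 ) {
			return theme.keyword + match + RESET;
		}
		return theme.number + match + RESET;
	});
}

console.log( highlight( 'var x = 42;', 'light' ) );
console.log( highlight( 'var s = "let it be";', 'dark' ) );
```

Keeping colors in a theme table, separate from the tokenizing logic, is what allows a light or dark theme to be selected by the user or chosen automatically where the terminal background can be queried.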

Custom key binding support and other user settings: I will add the possibility of configuring the REPL in an easy and accessible way, so that most features can be turned on/off by the user. Using these settings, the user will also be able to define additional key bindings.

Improved documentation: In order for the stdlib REPL to be a truly valuable learning resource, there has to be more documentation about its features, as well as tutorials showcasing how to use the REPL for applications such as machine learning or data science. I will use the already existing REPL presentation framework to create additional tutorials covering these topics.

Better support for loading JavaScript files: One use case for a REPL is debugging. The user should be able to import a JavaScript file into the REPL, and then call functions from within that file interactively. Furthermore, it would be useful if the REPL could listen for changes in that file and automatically reload it (similar to IPython's autoreload). Some challenges in implementing this feature could be:

Testing and bug fixes: Currently, the REPL has almost no tests (besides a few trivial tests, the only other tests are the ones I implemented in this PR). I will implement a comprehensive suite of tests and benchmarks to ensure the REPL is functional and fast. I will also fix any existing bugs I come across while implementing the features mentioned above.

Why this project?

The main reason I am interested in this project is that I often use a REPL in my own workflow, and I would love for there to be a truly developed and feature-rich REPL for numerical computation. I also think that the best way to learn something is by experimenting with it, and the interactive medium of a REPL is an amazing tool for that. I'm also excited to work on something so user-facing. I am motivated by the thought that anybody using stdlib will have access to a tool I built, and that all my work contributes directly to the user's experience.

Qualifications

I have taken the class Computer Organisation, where I learned about I/O, as well as ANSI escape codes. I believe my experience with JavaScript and Node.js will allow me to execute on my proposal.

Prior art

There is a plethora of REPL environments from which inspiration can be drawn. Some notable ones are:

Commitment

Until my classes end on July 1st, I aim to devote 15-20 hours/week to my project. After that, I have no other commitments for the summer and plan to work a standard 40 hour week up until the end of the program.

Schedule

Assuming a 12 week schedule,

Related issues

https://github.com/stdlib-js/google-summer-of-code/issues/1 https://github.com/stdlib-js/stdlib/issues/1775 https://github.com/stdlib-js/stdlib/issues/1794

Checklist

kgryte commented 6 months ago

@tudor-pagu Thanks for filing this draft proposal. A few comments:

  1. For syntax highlighting, how would we deal with terminal theming? E.g., if a user has their terminal background as black, that should result in different highlight colors than for a user with a white terminal background.
  2. Would it make sense to design an API to allow a user to specify their own syntax highlighting theme?
  3. My sense is that a less/more pager could be a bit involved. I tried implementing in the past, but got stuck. Maybe you'll have better luck. :)
  4. One of the items missing from your goals is "bracketed-paste". Is there a reason for its exclusion? To be clear, this means support for pasting multiple lines into the REPL without them being treated as separate statements and getting executed individually (which is what happens now). As this is a common use case, I would suggest adding it to your list of goals and schedule.
  5. On-the-fly object inspection seems potentially problematic. I commented on the associated issue: https://github.com/stdlib-js/stdlib/issues/1794#issuecomment-2019937792

tudor-pagu commented 6 months ago

@kgryte Thank you so much for your feedback. I've made changes to incorporate all the points you've made. About making an API for customizing the syntax highlighting, I think it's more important to have a good set of predefined themes, since there's no standard format for syntax highlighting themes, so it's not like users could import one from their IDE. I could definitely add it to my proposal if it's an important feature, but it seems to me that most users won't really want to customize the highlighting to that extent.

kgryte commented 6 months ago

@tudor-pagu Re: syntax highlighting. Agreed. Having some good (accessible) defaults would likely get us 80% of the way there. My main concern is ensuring that we have themes which can work on either a dark or a light background, as what works for one may not work for the other. You've already touched on this in your proposal, so should be fine.

Re: inspector. Note that this is a relatively new feature in Node.js. We support Node versions all the way back to v0.10, so we'd need to consider a backward-compatibility story.

In general, I'd be interested in knowing if you have other ideas for improving the REPL that we haven't already thought of.