Snehil-Shah closed this issue 6 months ago.
@kgryte I would really appreciate some feedback, as only a few days remain until the deadline.
I also had some doubts about the less/more documentation pager and custom key bindings, as mentioned in this draft proposal, and would like some clarity.
@Snehil-Shah Thanks for sharing a draft of your proposal and for your thorough discussion of the various tasks. A few comments:
`less`/`more`. Yes, in short, we'd want to implement something like Linux's `less` command. When I looked into this a while back, I opted for Linux's `more` command, as the API is much simpler. The main things we'd want to initially support are (1) scrolling/pagination up/down and (2) search. If we got syntax highlighting over the finish line, it would also be nice to syntax highlight `repl.txt` examples.
Hi @Snehil-Shah, thanks for opening this draft proposal!
I see that your proposal is really ambitious, and I'm not fully clear whether you expect to add all of these features to the REPL. I think the scope is too big, especially because just creating a syntax highlighter can take several weeks, not counting testing, and the same goes for the autocompletion idea. I recommend focusing on one idea and explaining in more detail how and why you find it interesting as a project; I think this aligns with the last comment from @kgryte.
Building on Stephannie's comment, I think the scope is feasible, but it certainly requires many things to go right. Auto-closing brackets (see https://github.com/stdlib-js/stdlib/pull/1680), for example, ended up taking about a month to get over the finish line due to scoping, refactoring, addressing corner cases, and review. In order for us to execute on all the various tasks, we'd need to have strong alignment and a strong sense of how the implementations will work.
And agreed on syntax highlighting. There may be a number of small changes that we'll need to make to the REPL codebase to make this happen, and some of these changes will likely be prerequisite. So having an idea of what changes may need to be made may help with scoping and timeline.
Hence, it may be good to have a list of smaller, very concrete tasks which lend themselves to small, regular PRs, and then some larger, more open-ended tasks which can be done in parallel.
Full name
Snehil Shah
University status
Yes
University name
Indian Institute of Information Technology, Nagpur
University program
Computer Science and Engineering
Expected graduation
2026
Short biography
I am a 2nd-year engineering student at the Indian Institute of Information Technology, Nagpur, India, pursuing a Bachelor's degree in Computer Science and Engineering. My first introduction to computer science was in high school through Python, about 4 years ago. In college, I was introduced to C/C++ and a lot of maths, and practiced data structures and algorithms in my first semester. Soon I delved into development and was introduced to JavaScript and TypeScript. I started with full-stack development using React, Express, and MongoDB, and later explored various backend technologies and DevOps, building various projects along the way. I also started contributing to open source in projects and domains I love.
Timezone
Indian Standard Time (GMT+5:30)
Contact details
Email: snehilshah.989@gmail.com Github: Snehil-Shah LinkedIn: snehil-shah-dev
Platform
Linux
Editor
My preferred code editor is VSCode because of how configurable and feature-rich it is, especially for JavaScript development. Extensions add helpful features like in-editor ESLint warnings, making it easier to maintain consistent code quality.
Programming experience
I learned Python in high school and C/C++ in college, and have experience solving algorithmic problems with them. I then ventured into JavaScript and web development, and later learned Golang and various frameworks and backend technologies (like the MERN stack, Cobra, etc.) while building many projects, from web apps to CLI tools! One of the big ones is Seismic Alerts Streamer, which aims to connect seismology providers directly to the public through a scalable architecture. I built it around a pub-sub model using Apache Kafka; it can show live logs of seismic activity from around the world and display them in an interactive map view built with React.
JavaScript experience
I learned JavaScript as part of a course at my college. I later learned JavaScript frameworks like Express and React through a full-stack project: a simple GitHub project planner that uses the GitHub API.
My open-source journey till now further strengthened my grasp on JavaScript.
Things I like about JavaScript:
Node.js experience
Most of my Node.js experience comes from Express.js, which I have used in various backend-dependent projects (like the one above). Working on the REPL for the past month has taught me a lot about Node.js's `readline` module and about building CLI interfaces with it.
C/Fortran experience
C was the first language I was introduced to in college, and I have a good grasp of it from solving various competitive programming problems around data structures and algorithms. I also built a small movie ticketing system as a mini-project for a college course: a simple terminal-based ticket booking system backed by MySQL. I don't have any experience with Fortran, though.
Interest in stdlib
Whenever we think of data analysis and engineering, Python comes to mind, even though JavaScript is used to build literally everything from websites to mobile/desktop apps to CLI tools. stdlib's mission of bridging this gap is a great one, and I would love to be a part of it. I have used various string-related methods from stdlib in many of my contributions, and I like how well-documented and easy to use they are. I have only scratched the surface, though, as the library is HUGE.
I also like how welcoming stdlib is to new contributors, with so many onboarding resources in place and responsive, helpful maintainers. I had never written code for a professional library before, so getting my first PR through review here taught me many good practices, such as writing JSDoc comments and following a consistent code style. Contributing to this project has definitely made me a better programmer, and I wish to learn more!
Version control
Yes
Contributions to stdlib
REPL:
`settings` REPL command to control REPL-related features from inside the REPL. @kgryte did most of the heavy lifting, though, and also extended it to a public API.
Add new packages:
C implementations:
Refactor existing BLAS packages to follow current conventions:
Update benchmarks to measure affirmative/negative cases separately:
Goals
The REPL is a staple for individuals who are learning, prototyping, debugging, and exploring the language, as well as its APIs and libraries, all without needing to write and execute entire scripts. For a library emphasizing numerical and scientific computing, a well-featured REPL becomes an essential tool, allowing users to easily visualize and work with data in an interactive environment. The stdlib REPL aims to be a better alternative to the Node.js REPL, with a specialized focus on scientific computing and data analysis through tailored features and tutorials that help individuals get started.
The goal of this project is to implement various enhancements to the stdlib REPL. The improvements proposed are listed below:
Fuzzy auto-completion extension
Improve tab completion by providing suggestions even when the input is not an exact prefix match, yielding more relevant results.
Outcomes
Implement a fuzzy matching algorithm instead of strict prefix matching while providing tab completions.
Completions should be ordered by relevance, and the matching letters should be highlighted, as discussed here.
In [7]: ys<TAB>
yes
Approach
Implementing a fuzzy matching algorithm:
Prompt-toolkit's core logic (simplified) builds a regex in which the letters of the input must appear in the same order in the completion string (allowing other characters in between); this regex is used to filter the completions and then score them. Its only limitation is that it doesn't account for spelling mistakes, as it expects every letter of the input string to be present in the completion.
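To make the regex-based filtering concrete, here is a rough JavaScript sketch of the same idea. It is not prompt-toolkit's actual (Python) code; `escapeRegExp`, `fuzzyMatch`, and `fuzzyFilter` are hypothetical names, and matching is assumed to be case-insensitive. Each candidate is scored by the start position and length of its tightest match, and candidates are sorted by start, then length.

```javascript
'use strict';

// Escape regex metacharacters so every input character is matched literally.
function escapeRegExp( str ) {
	return str.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
}

// Find the tightest match of `input` in `candidate`, where the input's
// characters must appear in order with anything allowed in between.
// Returns { start, length } or null when there is no match.
function fuzzyMatch( input, candidate ) {
	var pattern;
	var best;
	var re;
	var m;
	if ( input.length === 0 ) {
		return { 'start': 0, 'length': 0 };
	}
	// e.g. "ys" becomes /(?=(y.*?s))/gi; the lookahead lets us try every start position
	pattern = input.split( '' ).map( escapeRegExp ).join( '.*?' );
	re = new RegExp( '(?=(' + pattern + '))', 'gi' );
	best = null;
	while ( ( m = re.exec( candidate ) ) !== null ) {
		re.lastIndex += 1; // zero-width matches do not advance lastIndex by themselves
		if ( best === null || m[ 1 ].length < best.length ) {
			best = { 'start': m.index, 'length': m[ 1 ].length };
		}
	}
	return best;
}

// Keep matching candidates, ordered by match start, then by match length.
function fuzzyFilter( input, candidates ) {
	var out = [];
	candidates.forEach( function onCandidate( c ) {
		var m = fuzzyMatch( input, c );
		if ( m !== null ) {
			out.push( { 'candidate': c, 'start': m.start, 'length': m.length } );
		}
	});
	out.sort( function compare( a, b ) {
		return ( a.start - b.start ) || ( a.length - b.length );
	});
	return out.map( function toCandidate( o ) {
		return o.candidate;
	});
}
```

With this sketch, `fuzzyFilter( 'ys', [ 'keys', 'abs', 'yes', 'yaml' ] )` keeps only `yes` and `keys`, in that order, since `yes` matches starting at index 0.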
I also wrote an algorithm of my own that is forgiving of spelling mistakes.
We can write an algorithm that draws on both approaches: use the scoring mechanism from my algorithm to score the input characters that do not exist in the completion string, while making sure the characters that do exist follow the regex pattern. Alternatively, a completion could be scored by combining the number of input characters missing from the completion string with the number of characters between the matched input characters (the distance).
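A minimal sketch of the second combined option follows, assuming greedy left-to-right matching and an arbitrary weight of 10 per missing character. `scoreCompletion`, `rankCompletions`, `MISSING_WEIGHT`, and the `maxMissing` cutoff are hypothetical names chosen for illustration, and the weights would need tuning against real REPL completions.

```javascript
'use strict';

// Hypothetical weighting: one missing character costs as much as ten gap characters.
var MISSING_WEIGHT = 10;

// Count input characters missing from `candidate` (typo tolerance) and the
// total gap between consecutive matched characters (distance).
function scoreCompletion( input, candidate ) {
	var missing = 0;
	var distance = 0;
	var last = -1; // index in `candidate` of the previously matched character
	var i;
	var j;
	for ( i = 0; i < input.length; i++ ) {
		j = candidate.indexOf( input[ i ], last + 1 );
		if ( j === -1 ) {
			missing += 1; // skip this input character instead of rejecting the candidate
			continue;
		}
		if ( last !== -1 ) {
			distance += j - last - 1; // characters between consecutive matches
		}
		last = j;
	}
	return { 'missing': missing, 'distance': distance };
}

// Drop candidates with too many missing characters; lower scores rank first.
function rankCompletions( input, candidates, maxMissing ) {
	return candidates
		.map( function score( c ) {
			var s = scoreCompletion( input, c );
			return {
				'candidate': c,
				'missing': s.missing,
				'score': ( MISSING_WEIGHT * s.missing ) + s.distance
			};
		})
		.filter( function keep( o ) {
			return o.missing <= maxMissing;
		})
		.sort( function compare( a, b ) {
			return a.score - b.score;
		})
		.map( function toCandidate( o ) {
			return o.candidate;
		});
}
```

Greedy matching is a simplification: it takes the first usable occurrence of each character, which does not always minimize the distance. Unlike the regex approach, a typo such as `fxoor` can still surface `floor`.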
Displaying the completions:
We currently depend on Node.js's built-in readline module for tab completions, so we don't control how the suggestions are displayed. Although we could wrap our completions in ANSI codes beforehand, that interferes with and further complicates the auto-completion feature, as discussed here.
One solution that came up is writing our own completer inspired by the built-in readline module that can support highlighted suggestions.
We can represent each completion as an object with two properties, `completion` and `display`.
The `completion` property works with all of readline's existing autocompletion behavior, such as auto-inserting the longest common prefix, while the `display` property controls what is displayed for that completion in the output.
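A sketch of such completion objects, assuming matched letters are highlighted in bold (ANSI `ESC[1m` to start, `ESC[22m` to return to normal intensity) and that a custom completer would render `display` while prefix logic operates on `completion` only. `createCompletion` and `longestCommonPrefix` are hypothetical helpers, not existing stdlib APIs.

```javascript
'use strict';

var BOLD = '\u001b[1m';
var RESET = '\u001b[22m'; // normal intensity

// `completion` is the raw text to insert; `display` highlights the input's
// characters, matched in order and case-insensitively.
function createCompletion( input, candidate ) {
	var display = '';
	var k = 0; // index of the next input character to highlight
	var i;
	for ( i = 0; i < candidate.length; i++ ) {
		if ( k < input.length && candidate[ i ].toLowerCase() === input[ k ].toLowerCase() ) {
			display += BOLD + candidate[ i ] + RESET;
			k += 1;
		} else {
			display += candidate[ i ];
		}
	}
	return { 'completion': candidate, 'display': display };
}

// Prefix logic only ever looks at the raw `completion` text.
function longestCommonPrefix( completions ) {
	var prefix;
	if ( completions.length === 0 ) {
		return '';
	}
	prefix = completions[ 0 ].completion;
	completions.forEach( function shrink( c ) {
		while ( c.completion.indexOf( prefix ) !== 0 ) {
			prefix = prefix.slice( 0, -1 );
		}
	});
	return prefix;
}
```

Keeping the ANSI codes out of `completion` means existing behavior such as inserting the longest common prefix never sees escape sequences.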
Prior art
Prompt-toolkit's completer and logic
CodeMirror implementation - worth exploring
Using Levenshtein distance - although I personally wouldn't recommend this algorithm, as discussed in the OP
Related features that can be added (if time permits)
Reverse-searching previous commands in the REPL: this can use the same matching algorithm to search previous commands using the `_history` buffer.
Related Issues
Support for displaying suggested corrections
This can be a really helpful addition to the REPL: we can suggest corrections when an unidentified identifier is entered, such as an undefined variable, object property, module, or path.
Outcomes
Implement a fuzzy suggestion algorithm that suggests similar-looking identifiers when an unknown identifier is entered, instead of just throwing the error.
Extend this to the `help()` method, similar to Julia.
Approach
Classify the type of identifier
We can use a regex validator to determine what type of identifier is entered and what type of error is being raised. For example, if it's an unknown variable or an unknown object property.
This implementation is similar to how we currently handle auto-completions. The completer uses regex to classify entered statements into incomplete filesystem, workspace, expressions, require, and tutorial expressions.
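As one possible illustration, an error raised during evaluation could be classified by matching its message. This sketch assumes V8's current message formats (`x is not defined`, `obj.x is not a function`, `Cannot find module 'x'`), which are not a stable API, so a real implementation would more likely combine this with parsing the input; `classifyError` is a hypothetical helper.

```javascript
'use strict';

// Classify a runtime error into the kind of correction we might suggest.
// Relies on V8's current message formats, which are not a stable API.
function classifyError( err ) {
	var m;
	if ( err instanceof ReferenceError ) {
		// e.g. "pritn is not defined"
		m = /^(\S+) is not defined$/.exec( err.message );
		if ( m ) {
			return { 'kind': 'variable', 'identifier': m[ 1 ] };
		}
	}
	if ( err instanceof TypeError ) {
		// e.g. "Math.sqr is not a function" (a mistyped property being called)
		m = /^(\S+)\.([^.\s]+) is not a function$/.exec( err.message );
		if ( m ) {
			return { 'kind': 'property', 'object': m[ 1 ], 'identifier': m[ 2 ] };
		}
	}
	if ( err && err.code === 'MODULE_NOT_FOUND' ) {
		// e.g. "Cannot find module 'fs-extr'"
		m = /Cannot find module '([^']+)'/.exec( err.message );
		if ( m ) {
			return { 'kind': 'module', 'identifier': m[ 1 ] };
		}
	}
	return null; // not something we know how to correct
}
```

The returned `kind` would select where candidate names come from (workspace variables, the object's own properties, or installed modules).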
Suggest possible corrections
Once we have classified the type of identifier, we can generate a set of possible corrections from the same sources the completer draws on (an AST, the filesystem, reserved keywords, and other places, depending on the classification), similar to how the completer will use a fuzzy completion algorithm (yet to be implemented) to match completions.
That said, I would use a different algorithm here, as it's a different use case. An algorithm that measures how different two strings look might be a better fit, since corrections mostly stem from spelling mistakes, unlike code completion, where we are generally looking at how much of a prefix the input is of the completion. The Levenshtein distance is a popular algorithm that does exactly that.
Extend this to the `help()` method using the same logic.
Overall algorithm:
Identify and classify the kind of suggestions needed (filesystem, expressions, require, etc.).
Use the Levenshtein distance to find similar identifiers from the AST, filesystem, etc.
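The overall algorithm could look roughly like this, using the standard dynamic-programming Levenshtein distance (keeping two rows at a time) and a hypothetical `maxDistance` cutoff. `levenshtein` and `suggest` are illustrative names, and the candidate list would come from whichever source the classification step selects.

```javascript
'use strict';

// Levenshtein distance: minimum number of single-character insertions,
// deletions, or substitutions needed to turn `a` into `b`.
function levenshtein( a, b ) {
	var prev = [];
	var curr;
	var cost;
	var i;
	var j;
	for ( j = 0; j <= b.length; j++ ) {
		prev.push( j ); // distance from '' to the first j characters of b
	}
	for ( i = 1; i <= a.length; i++ ) {
		curr = [ i ]; // distance from the first i characters of a to ''
		for ( j = 1; j <= b.length; j++ ) {
			cost = ( a[ i - 1 ] === b[ j - 1 ] ) ? 0 : 1;
			curr.push( Math.min(
				prev[ j ] + 1,       // deletion
				curr[ j - 1 ] + 1,   // insertion
				prev[ j - 1 ] + cost // substitution (or match)
			));
		}
		prev = curr;
	}
	return prev[ b.length ];
}

// Suggest candidates within `maxDistance` edits, closest first.
function suggest( identifier, candidates, maxDistance ) {
	return candidates
		.map( function score( c ) {
			return { 'candidate': c, 'distance': levenshtein( identifier, c ) };
		})
		.filter( function keep( o ) {
			return o.distance <= maxDistance;
		})
		.sort( function compare( x, y ) {
			return x.distance - y.distance;
		})
		.map( function toCandidate( o ) {
			return o.candidate;
		});
}
```

For example, `suggest( 'flaot', [ 'float64', 'floor', 'flat' ], 2 )` returns `[ 'flat', 'floor' ]`: `flat` is one deletion away, `floor` two substitutions away, and `float64` is too far.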
Prior art
stdlib completer
Julia's REPL help mode
Related Issues
https://github.com/stdlib-js/stdlib/issues/2058
Multi-line editing
Currently, the REPL goes into multi-line mode if it detects an incomplete expression. Once we hit ENTER, there is no way to edit the previous line as hitting the up arrow triggers readline's default behaviour of bringing up previous commands. Additionally, we don't have a manual way to enter multi-line mode.
Outcomes
Discuss and implement ways to manually enter multi-line mode.
Implement editing previous lines using the up arrow
Approach
Entering manual multi-line mode:
There are multiple ways to enter multi-line mode, as discussed here:
Implementing this is straightforward by listening for keypress events as I did here.
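A minimal sketch of entering multi-line mode via keypress events, assuming CTRL+O as the trigger (borrowed from IPython's key combination) and a simple on/off flag; `isMultilineTrigger`, `createMultilineState`, and `attach` are hypothetical. Node's `readline.emitKeypressEvents()` reports CTRL+O as a key object with `name: 'o'` and `ctrl: true`.

```javascript
'use strict';

var readline = require( 'readline' );

// Hypothetical trigger: CTRL+O.
function isMultilineTrigger( key ) {
	return Boolean( key && key.ctrl && key.name === 'o' );
}

// Track whether manual multi-line mode is active.
function createMultilineState() {
	var state = { 'active': false };
	state.onKeypress = function onKeypress( str, key ) {
		if ( isMultilineTrigger( key ) ) {
			state.active = !state.active; // assumed toggle behavior
		}
	};
	return state;
}

// Wire the handler to an interactive input stream such as process.stdin.
function attach( input, state ) {
	readline.emitKeypressEvents( input );
	input.on( 'keypress', state.onKeypress );
}
```

Inside the REPL, the readline interface already emits keypress events on its input, so only the `on( 'keypress', ... )` part would be needed there.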
`.editor` command: If modifier keys still seem problematic, we can also take a Node.js-style approach with a dedicated `.editor` command that spins up a multi-line editing mode. This is straightforward to implement, as it only involves writing an internal command that does everything.
Implementing multi-line editing:
By default, the readline interface shows previous commands when the up arrow is pressed. The sanctioned way would be to handle these keypress events using the `keypress` event listener. One problem is that the readline interface would still perform its default operation (clearing the current line and printing the previous command). We could, of course, manually undo this first and then move the cursor upward.
To simplify this, we can instead use the private `_ttyWrite` method, which lets us process keypresses before readline's default handling runs. Although we shouldn't rely on a private method, we already use it here, and it can be reused for this purpose.
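In an oversimplified form, the idea could be sketched as follows. It assumes the readline interface exposes the private `_ttyWrite( s, key )` method (undocumented, so it may change between Node.js versions) and uses the public `readline.moveCursor()` to move the cursor up one row; `interceptKeypress` and `makeUpArrowHandler` are hypothetical names.

```javascript
'use strict';

var readline = require( 'readline' );

// Wrap an interface's private `_ttyWrite( s, key )` so `beforeKeypress` sees each
// keypress first; returning true skips readline's default handling for that key.
function interceptKeypress( rli, beforeKeypress ) {
	var ttyWrite = rli._ttyWrite;
	rli._ttyWrite = function wrappedTtyWrite( s, key ) {
		if ( beforeKeypress( s, key ) ) {
			return;
		}
		ttyWrite.call( rli, s, key );
	};
	return rli;
}

// While a multi-line command is being edited, UP moves the cursor one row up
// instead of recalling history. Buffer bookkeeping is not handled here.
function makeUpArrowHandler( output, isMultiline ) {
	return function beforeKeypress( s, key ) {
		if ( key && key.name === 'up' && isMultiline() ) {
			readline.moveCursor( output, 0, -1 );
			return true;
		}
		return false;
	};
}
```

For a real interface created with `readline.createInterface()`, `output` would be the same stream passed to the interface.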
Moving the cursor this way only changes its position visually; we still need to update and maintain the line buffers and the command history accordingly.
We can then keep track of the current line number in order to update the entire command entered so far.
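One way to track this is sketched below, under the assumption that each line is stored separately and joined with `\n` when the command is evaluated or saved to history. `MultilineBuffer` and its methods are hypothetical, and wiring them to readline's current `line` is left out.

```javascript
'use strict';

// Hypothetical bookkeeping for a multi-line command: every line is stored
// separately, along with the index of the line currently being edited.
function MultilineBuffer() {
	this.lines = [ '' ];
	this.current = 0;
}

// Save the text readline currently holds for the active line.
MultilineBuffer.prototype.update = function update( text ) {
	this.lines[ this.current ] = text;
};

// ENTER on an incomplete expression: open a new line below the active one.
MultilineBuffer.prototype.newline = function newline() {
	this.current += 1;
	this.lines.splice( this.current, 0, '' );
};

// UP: move to the previous line and return its text (null on the first line).
MultilineBuffer.prototype.up = function up() {
	if ( this.current === 0 ) {
		return null;
	}
	this.current -= 1;
	return this.lines[ this.current ];
};

// DOWN: move to the next line and return its text (null on the last line).
MultilineBuffer.prototype.down = function down() {
	if ( this.current >= this.lines.length - 1 ) {
		return null;
	}
	this.current += 1;
	return this.lines[ this.current ];
};

// The full command to evaluate and store in the command history.
MultilineBuffer.prototype.toString = function toString() {
	return this.lines.join( '\n' );
};
```

When UP returns a line, the REPL would replace readline's current line with that text; before moving, it calls `update()` so edits to the line being left are not lost.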
This is a rough roadmap for achieving multi-line editing in the REPL.
Prior art
Node.js REPL's `.editor` command: although it doesn't support going back to the previous line
IPython's CTRL+O key combination
nano
Related features that can be added (if time permits):
Inserting the entire command when pressing the up/down arrow in the REPL: this too requires listening for up/down keystrokes in the `beforeKeypress()` listener, overriding the default behaviour, and using our internal `command_history` store.
A `nano`-style editing mode, as discussed here: I presume this can take some time and is better left as a future concern.
Related Issues
https://github.com/stdlib-js/stdlib/issues/2060
https://github.com/stdlib-js/stdlib/issues/2070
Bracketed-paste
When multiple lines of code are pasted into the REPL, each statement is executed as soon as a newline character is encountered, which is not ideal. Bracketed paste refers to distinguishing pasted input from typed input and handling it differently, ideally allowing the user to edit it before execution.
Outcomes
Implement bracketed paste allowing users to paste multiple lines of text without execution, if the terminal supports it.
Approach
Utilizing the terminal's bracketed paste mode:
Bracketed paste mode is generally disabled by default, but terminals that support it can enable it when we write an escape sequence, i.e., `_rli.ostream.write('\x1b[?2004h');`. The terminal then wraps pasted text in specific escape sequences that we can use to identify pasted content. If the content is pasted, we can avoid executing the code when newlines are encountered.
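The parsing side could look roughly like this. `ESC[?2004h` enables and `ESC[?2004l` disables bracketed paste, and a supporting terminal wraps each paste in `ESC[200~` and `ESC[201~`. The sketch assumes a whole paste arrives in one chunk of input (real input may split it across chunks); `splitPastes` is a hypothetical helper.

```javascript
'use strict';

var ENABLE = '\u001b[?2004h';  // ask the terminal to bracket pastes
var DISABLE = '\u001b[?2004l'; // restore normal behavior on exit
var PASTE_START = '\u001b[200~';
var PASTE_END = '\u001b[201~';

// Split raw terminal input into typed and pasted segments.
function splitPastes( data ) {
	var segments = [];
	var start;
	var end;
	var i = 0;
	while ( i < data.length ) {
		start = data.indexOf( PASTE_START, i );
		if ( start === -1 ) {
			segments.push( { 'pasted': false, 'text': data.slice( i ) } );
			break;
		}
		if ( start > i ) {
			segments.push( { 'pasted': false, 'text': data.slice( i, start ) } );
		}
		end = data.indexOf( PASTE_END, start + PASTE_START.length );
		if ( end === -1 ) {
			// Unterminated paste: treat the remainder as pasted text.
			segments.push( { 'pasted': true, 'text': data.slice( start + PASTE_START.length ) } );
			break;
		}
		segments.push( { 'pasted': true, 'text': data.slice( start + PASTE_START.length, end ) } );
		i = end + PASTE_END.length;
	}
	return segments;
}
```

On startup the REPL would write `ENABLE` to its output stream and write `DISABLE` before exiting, so the terminal is not left in bracketed paste mode. Newlines inside `pasted` segments would be inserted into the buffer rather than triggering evaluation.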
A hacky implementation (not recommended):
Another hacky way to implement this is to override the `line` event of the readline interface so that it is only triggered when a keypress event with a key value of ENTER is received. When pasting newline characters, I assume the ENTER key value would not be received, so the pasted content would not be executed. If this works, it will work on most terminals, even those that don't support bracketed paste mode.
References
https://cirw.in/blog/bracketed-paste
https://github.com/nodejs/node/pull/47150/files
Related features that can be added (if time permits):
Formatting pasted code for readability.
Related Issues
https://github.com/stdlib-js/stdlib/issues/2068
Pretty-printing of tabular data
There should be a way to visualize data, like an array of objects, in tabular form. For a REPL that aims to emphasize data analytics, this is a crucial feature.
Outcomes
Implement the package `@stdlib/plot/ascii/table`.
Implement a `table( data[, n] )` REPL command that uses the above package.
Approach
Implementing an API for plotting an ASCII table: we should be able to parse the given data into rows and columns, and can follow any of the prior art to design the table. We should also handle `ndarray` inputs as matrices.
Implementing a `table` command: this is straightforward once the above API is implemented.
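A rough sketch of the core rendering step, assuming the input is an array of plain objects and that `n` limits how many rows are shown (the proposal does not define `n`, so that meaning is an assumption). `asciiTable` is a hypothetical name, not the eventual API of the proposed `@stdlib/plot/ascii/table` package, and `ndarray` handling is omitted.

```javascript
'use strict';

// Render an array of objects as an ASCII table, showing at most `n` rows.
function asciiTable( data, n ) {
	var widths;
	var cells;
	var rows;
	var cols;
	var sep;

	rows = ( n === void 0 ) ? data : data.slice( 0, n );

	// Column names: union of keys, in first-seen order.
	cols = [];
	rows.forEach( function addKeys( row ) {
		Object.keys( row ).forEach( function addKey( k ) {
			if ( cols.indexOf( k ) === -1 ) {
				cols.push( k );
			}
		});
	});

	// Cell text; missing fields render as empty cells.
	cells = rows.map( function toCells( row ) {
		return cols.map( function toText( k ) {
			return ( row[ k ] === void 0 ) ? '' : String( row[ k ] );
		});
	});

	// Each column is as wide as its widest cell or header.
	widths = cols.map( function width( k, j ) {
		return cells.reduce( function max( w, r ) {
			return Math.max( w, r[ j ].length );
		}, k.length );
	});

	sep = '+' + widths.map( function dashes( w ) {
		return '-'.repeat( w + 2 );
	}).join( '+' ) + '+';

	function line( values ) {
		return '| ' + values.map( function pad( v, j ) {
			return v.padEnd( widths[ j ] );
		}).join( ' | ' ) + ' |';
	}

	return [ sep, line( cols ), sep ].concat( cells.map( line ), [ sep ] ).join( '\n' );
}
```

For example, `asciiTable( [ { x: 1, y: 'a' }, { x: 22, y: 'bb' } ] )` produces a header row `| x  | y  |` framed by `+----+----+` separators, followed by one row per object.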
Prior art
https://github.com/sorensen/ascii-table
https://github.com/tecfu/tty-table
Related Issues
https://github.com/stdlib-js/stdlib/issues/2067
Syntax-highlighting and bracket matching
Outcomes
Syntax highlighting
Bracket pair matching
Approach
Syntax highlighting
There are some packages like emphasize that can help us easily implement syntax highlighting. Node's REPL rewrite uses this too for syntax highlighting.
However, if we want to avoid dependency overhead, we can implement it ourselves, though I am not sure how tedious that can get.
One way to implement this is to use the acorn parser to loosely parse the current line after every keypress, producing an AST with nodes of different types. We can then traverse the AST and wrap each node in ANSI color codes depending on its type. From what I've tried, I am not sure how good `acorn-loose` is at parsing incomplete expressions.
We can have two themes to begin with: light and dark.
When it comes to executing commands or exporting to a file, we could strip the ANSI sequences manually, but that can get tedious, as we mostly work with raw text for all operations, including storing history, variables, commands, etc. We only need coloring for display, so instead we can do the coloring in the final layer before printing and use raw text everywhere else.
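To illustrate coloring only in the final display layer, here is a deliberately simple sketch. It uses a regular expression instead of the acorn-based approach described above, so it only recognizes a few keywords, numbers, and quoted strings; the theme colors and the `highlight`/`stripAnsi` names are assumptions.

```javascript
'use strict';

// Assumed theme; a light theme would map the same token types to other codes.
var THEME = {
	'keyword': '\u001b[35m', // magenta
	'number': '\u001b[33m',  // yellow
	'string': '\u001b[32m'   // green
};
var RESET = '\u001b[39m'; // reset the foreground color only

// Stand-in tokenizer: group 1 = keyword, group 2 = number, group 3 = string.
var TOKENS = /(\b(?:var|let|const|function|return|if|else|for|while|new)\b)|(\b\d+(?:\.\d+)?\b)|('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")/g;

// Color a raw line for display; the raw line itself is never modified.
function highlight( line ) {
	return line.replace( TOKENS, function colorize( match, kw, num ) {
		var color;
		if ( kw ) {
			color = THEME.keyword;
		} else if ( num ) {
			color = THEME.number;
		} else {
			color = THEME.string;
		}
		return color + match + RESET;
	});
}

// Remove SGR color sequences, e.g. if colored text ever leaks into raw data.
function stripAnsi( str ) {
	return str.replace( /\u001b\[[0-9;]*m/g, '' );
}
```

History, evaluation, and file export would keep working on the raw line; only the text written to the terminal would pass through `highlight()`.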
Bracket pair matching
Bracket pair matching highlights the brackets enclosing the logical block of code that the cursor is inside.
Although bracket pair matching can also be achieved to some extent with the acorn parser, it wouldn't be able to parse all brackets. For example, if I just type `()` in the terminal, it won't highlight anything.
We can write a simple algorithm inspired by prompt-toolkit (used by IPython).
IPython only highlights the brackets when the cursor is adjacent to them, but we could extend that to keep the brackets enclosing the cursor highlighted at all times.
The algorithm works by traversing left and right from the cursor to find the enclosing brackets, using a stack to skip over all inner bracket pairs; the found indices can then be highlighted.
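A rough JavaScript sketch of that idea, assuming the cursor is an index between characters and ignoring brackets inside strings and comments. `enclosingBrackets` is a hypothetical name, and this is not a port of prompt-toolkit's code. It returns the indices of the innermost enclosing pair (or `null` if there is none or the closer has the wrong type), so typing `()` with the cursor between the brackets yields `[0, 1]`.

```javascript
'use strict';

var OPEN = '([{';
var CLOSE = ')]}';

// Return [ openIndex, closeIndex ] of the innermost bracket pair enclosing
// `cursor` (an index between characters), or null if there is none.
function enclosingBrackets( text, cursor ) {
	var stack = [];
	var left = -1;
	var want;
	var c;
	var i;

	// Scan left; closers seen on the way belong to complete inner pairs.
	for ( i = cursor - 1; i >= 0; i-- ) {
		c = text[ i ];
		if ( CLOSE.indexOf( c ) !== -1 ) {
			stack.push( c );
		} else if ( OPEN.indexOf( c ) !== -1 ) {
			if ( stack.length === 0 ) {
				left = i;
				break;
			}
			stack.pop();
		}
	}
	if ( left === -1 ) {
		return null;
	}

	// Scan right for the matching closer, skipping inner pairs the same way.
	want = CLOSE[ OPEN.indexOf( text[ left ] ) ];
	stack = [];
	for ( i = cursor; i < text.length; i++ ) {
		c = text[ i ];
		if ( OPEN.indexOf( c ) !== -1 ) {
			stack.push( c );
		} else if ( CLOSE.indexOf( c ) !== -1 ) {
			if ( stack.length === 0 ) {
				return ( c === want ) ? [ left, i ] : null;
			}
			stack.pop();
		}
	}
	return null;
}
```

Running this on every cursor movement (rather than only when the cursor touches a bracket) gives the always-on highlighting described above.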
Prior art
Prompt-toolkit's bracket matching algorithm
emphasize - Syntax highlighting
Node's REPL rewrite uses emphasize for syntax highlighting
Related Issues
https://github.com/stdlib-js/stdlib/issues/2072
Less/more documentation pager
I am still a bit confused about what type of behavior we have in mind.
Based on this comment, I assume we are looking for `less`-type behavior. And based on this comment, I think it can be a bit complex.
I have a simpler design in mind: based on the height of the terminal window, we would print only as much of the help text as fits, ending with a prompt such as `press CTRL+M to expand, CTRL+X to exit` (for example). `help()` would then keep running until it is interrupted by CTRL+X. I would appreciate some pointers here.
Custom key-bindings support
I am a bit confused about this too. Are we talking about allowing users to configure certain actions, or just general keybinding support for common tasks, like CTRL+V for pasting content? I need some direction on this.
Tests
The REPL currently lacks test coverage. I would write tests to keep the REPL bug-free in the long run.
Documentation and tutorials
After implementing all these features, there will certainly be a need for tutorials that make the REPL easy to learn. We can also create tutorials for common use cases, like handling data for data analysis.
We already have our REPL presentation framework in place so implementing this would be easy.
Small additions (optional)
If time permits, we can add these small improvements as well:
Why this project?
As a JavaScript developer, I find the Node.js REPL lacking in many ways, and there aren't many alternatives out there. This project can positively impact the Node.js ecosystem by providing a powerful yet easy-to-use REPL to the community. This is what excites me about the project, and I would love to be a part of this journey!
Qualifications
I have studied JavaScript along with core computer science subjects like object-oriented programming, algorithms, operating systems, computer architecture, Linux, and git in college. I also understand the REPL codebase well enough to execute on this proposal.
Prior art
Prior art for specific features is mentioned in the abstract.
Commitment
My summer break from college starts May 15. So, during the coding period (starting May 27), I would be available full-time for around 2 months with no other commitments, and could commit 40+ hrs/week. For the remaining month, alongside college, I will be able to devote around 20 hrs/week.
I would be able to commit around 400 hours to the program.
Schedule
Assuming a 12-week schedule,
Community Bonding Period:
Week 1:
Week 2 & 3:
Week 4 & 5:
Week 6: (midterm):
Week 7 & 8:
Week 9:
Week 10:
Week 11:
Week 12:
Final Week: Project submission!
Related issues
Feature-specific issues are mentioned in the abstract.
Checklist
[RFC]:
and succinctly describes your proposal.