Snehil-Shah commented 7 months ago

Full name

Snehil Shah

University status

Yes

University name

Indian Institute of Information Technology, Nagpur

University program

Computer Science and Engineering

Expected graduation

2026

Short biography

I am a 2nd year engineering student at the Indian Institute of Information Technology, Nagpur, India, pursuing my Bachelor's degree in Computer Science and Engineering. My first introduction to computer science was in high school through Python about 4 years ago. As I entered college, I was introduced to C/C++ and a lot of maths, and practiced data structures and algorithms in my first semester. Soon I delved into development and was introduced to JavaScript and TypeScript. I started with full-stack development using React, Express, and MongoDB and later went on to explore various backend technologies and DevOps as well, building various projects along the way. I have also started contributing to open-source projects in domains I love.

Timezone

Indian Standard Time (GMT+5:30)

Contact details

Email: snehilshah.989@gmail.com Github: Snehil-Shah LinkedIn: snehil-shah-dev

Platform

Linux

Editor

My preferred code editor is VSCode because of how configurable and feature-rich it is, especially for development in JavaScript. Extensions add more helpful features, like ESLint warnings in the editor itself, making it easier to maintain consistent code quality.

Programming experience

I learned Python in high school and C/C++ in college and have experience solving algorithmic problems with them. I then ventured into learning JavaScript and web development and later learned Golang and various frameworks & backend technologies (like the MERN stack, Cobra, etc.) while building many projects, from web apps to CLI tools, along the way! One of the big ones I made is Seismic Alerts Streamer, which aims to connect seismology providers directly to the public in a scalable architecture. I built it around a pub-sub model utilizing Apache Kafka. It can show live logs of seismic activity from around the world and display them in an interactive map view built using React.

JavaScript experience

I learned JavaScript as part of a course at my college. I later learned various JavaScript frameworks like Express and React through a full-stack project which is a simple GitHub project planner that I made utilizing the GitHub API.

My open-source journey till now further strengthened my grasp on JavaScript.

Things I like about JavaScript:

Asynchronous Programming with promises and async/await.
Callbacks and cool syntax like arrow functions.

Node.js experience

Most of my experience in Node.js is through Express.js, which I have used in various backend-dependent projects (like the one above). Working on the REPL for the past month taught me a lot about Node's readline module and building CLIs with it.

C/Fortran experience

C was the first language I was introduced to in college and I have a good grasp of it as I have experience in solving various competitive problems around data structures and algorithms. I also built a small movie ticketing system mini-project as part of my college course which is a simple terminal-based ticket booking system backed by a MySQL backend. I don't have any experience with Fortran though.

Interest in stdlib

Whenever we think of data analysis and engineering, Python comes to mind even though JavaScript is used to build literally everything from websites to mobile/desktop apps to CLI tools. Stdlib's mission in bridging this gap is a great one and I would love to be a part of it. I have used various string-related methods from stdlib in many of my contributions and I like how well-documented and easy to use they are. I have just cracked the surface though as the library is HUGE.

I also like how welcoming stdlib is to new contributors with so many resources in place to get on board and responsive & helpful maintainers. I have never written code for a professional library before so getting through my first PR here taught me so many good practices that need to be taken care of like writing JSDocs and consistent code styling to name a few. Contributing to this project definitely made me a better programmer and I wish to learn more!

Version control

Yes

Contributions to stdlib

REPL:
- https://github.com/stdlib-js/stdlib/pull/1680 (open)
- Implemented auto-matching brackets and quotations in the REPL. Although the implementation is not ideal for now, I am learning and working towards sharpening it with guidance from @kgryte.
- Implemented the first steps towards multi-line editing using a modifier key. It was not working on all platforms, so it has been postponed to be implemented with this RFC and is included in this proposal.
- I also implemented a settings REPL command to control REPL-related features from inside the REPL. @kgryte did most of the heavy lifting though also extending it to a public API.
- https://github.com/stdlib-js/stdlib/pull/1855 (draft)
- Wrote a simple fuzzy matching algorithm from scratch that easily matches simple spelling mistakes (will improve it by borrowing from various prior works). The proposed implementation includes more features and needs more work as documented in this proposal.
- https://github.com/stdlib-js/stdlib/pull/1818 (closed)
- The plan was to add a prompt when in multi-line mode similar to the node.js REPL and IPython but it was discarded as it would lead to wastage of terminal columns and other valid reasons as discussed.
Add new packages:
- https://github.com/stdlib-js/stdlib/pull/1362 (merged)
C implementations:
- https://github.com/stdlib-js/stdlib/pull/1833 (merged)
- https://github.com/stdlib-js/stdlib/pull/1731 (merged)
- https://github.com/stdlib-js/stdlib/pull/1719 (merged)
- https://github.com/stdlib-js/stdlib/pull/1780 (merged)
Refactor existing BLAS packages to follow current conventions:
- https://github.com/stdlib-js/stdlib/pull/1700 (merged)
- https://github.com/stdlib-js/stdlib/pull/1741 (merged)
- https://github.com/stdlib-js/stdlib/pull/1455 (merged)
Update benchmarks to measure affirmative/negative cases separately:
- https://github.com/stdlib-js/stdlib/pull/1458 (merged)

Goals

The REPL is a staple for individuals who are learning, prototyping, debugging, and exploring the language, as well as its APIs and libraries, all without the need to write and execute entire scripts. For a library emphasizing numerical and scientific computing, a well-featured REPL becomes an essential tool allowing users to easily visualize and work with data in an interactive environment. The stdlib REPL aims to be a better alternative to the node.js REPL with a specialized focus on scientific computing and data analysis using tailored features and tutorials to help individuals get started.

The goal of this project is to implement various enhancements to the stdlib REPL. The improvements proposed are listed below:

Fuzzy auto-completion extension

Improve tab completion suggestions by also providing completions that are not an exact match, ranked so that the most relevant results come first.
- Outcomes
- Implement a fuzzy matching algorithm instead of strict prefix matching while providing tab completions.
- Completions should be displayed based on relevancy and the matching letters should be highlighted as discussed here.
  
  In [7]: ys<TAB>
  yes
- Approach
- Implementing a fuzzy matching algorithm:
  
  Prompt-toolkit’s core logic (simplified) builds a regex such that the letters of the input must appear in the same order in the completion string (allowing other characters in between), and uses it to filter the candidates and then score them. The only limitation is that it doesn’t account for possible spelling mistakes, as it expects every letter of the input string to be present in the completion.
  
  In contrast, the algorithm I wrote myself for the draft PR above is forgiving of spelling mistakes.
  
  We can write an algorithm that takes from both of these. We can use the scoring mechanism from my algorithm to penalize the input characters that do not exist in the completion string, while making sure the characters that do exist follow the regex pattern. Alternatively, we can score the completion string using two quantities: the number of input characters missing from the completion string, and the number of characters that sit between the matched input characters in the completion string (distance).
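  The second variant (missing characters plus distance) could be sketched roughly as follows. This is a minimal illustration, not prompt-toolkit's code and not the algorithm from the draft PR: the names `fuzzyScore` and `fuzzyFilter`, the `maxMissing` parameter, and the weights are all assumptions made for the example.

```javascript
/**
* Scores a candidate completion against the user's input (lower is better).
*
* Input characters must appear in order in the candidate; characters that
* cannot be found are counted as possible typos instead of being rejected.
*/
function fuzzyScore( input, candidate, maxMissing ) {
	var indices = [];
	var distance = 0;
	var missing = 0;
	var last = -1;
	var idx;
	var i;

	for ( i = 0; i < input.length; i++ ) {
		// Look for the next input character after the previous match:
		idx = candidate.indexOf( input[ i ], last + 1 );
		if ( idx === -1 ) {
			missing += 1;
			continue;
		}
		if ( last !== -1 ) {
			// Count the characters skipped between consecutive matches:
			distance += idx - last - 1;
		}
		indices.push( idx );
		last = idx;
	}
	if ( indices.length === 0 || missing > maxMissing ) {
		return null;
	}
	return {
		'completion': candidate,
		'indices': indices,

		// Hypothetical weighting: typos cost most, then gaps, then a late first match:
		'score': ( missing * 10 ) + distance + indices[ 0 ]
	};
}

/**
* Filters and ranks candidates for a given input.
*/
function fuzzyFilter( input, candidates, maxMissing ) {
	return candidates
		.map( function score( candidate ) {
			return fuzzyScore( input, candidate, maxMissing );
		})
		.filter( function isMatch( result ) {
			return result !== null;
		})
		.sort( function compare( a, b ) {
			return a.score - b.score;
		});
}
```

  With this weighting, the input `ys` ranks `yes` ahead of `years` because fewer characters separate the matched letters. The returned `indices` can later be used to highlight the matched letters.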
- Displaying the completions:
  
  We currently depend on the built-in readline module in Node.js for tab completions, so we don’t have control over how the suggestions are displayed. Although we can wrap our completions in ANSI codes beforehand, that interferes with and further complicates the auto-completion feature, as discussed here.
  
  One solution that came up is writing our own completer inspired by the built-in readline module that can support highlighted suggestions.
  
  We can have an object like below to denote a completion.
```
{
'completion': 'yes',
'display': '\x1b[1my\x1b[0me\x1b[1ms\x1b[0m'
}
```
  The completion property of the object works with all the existing auto-completion behavior of readline’s completer (such as auto-inserting the longest common prefix), while the display property controls what is displayed for that completion in the output.
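  As an illustration, such an object could be built from a completion and the positions of its matched characters. The helper name `toCompletionObject` and the choice of bold as the highlight style are assumptions for this sketch, not an existing stdlib API.

```javascript
var BOLD = '\x1b[1m';
var RESET = '\x1b[0m';

/**
* Builds a completion object whose `display` field highlights matched characters.
*
* `indices` lists the positions in `completion` that matched the user's input.
*/
function toCompletionObject( completion, indices ) {
	var display = '';
	var i;
	for ( i = 0; i < completion.length; i++ ) {
		if ( indices.indexOf( i ) === -1 ) {
			display += completion[ i ];
		} else {
			// Wrap each matched character in its own ANSI bold/reset pair:
			display += BOLD + completion[ i ] + RESET;
		}
	}
	return {
		'completion': completion,
		'display': display
	};
}
```

  Calling `toCompletionObject( 'yes', [ 0, 2 ] )` yields the object shown above.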
- Prior art
- Prompt-toolkit's completer and logic
- Codemirror Implementation - Worth exploring
- Using Levenshtein distance - Although I personally wouldn't recommend this algorithm as discussed in OP
- Related features that can be added (if time permits)
- reverse-searching previous commands in the REPL: This can use the same matching algorithm to search for previous commands utilizing the _history buffer.
- Related Issues
  - https://github.com/stdlib-js/stdlib/issues/1845
  - https://github.com/stdlib-js/stdlib/issues/2069
Support for displaying suggested corrections

This can be a really helpful addition to the REPL. We can provide suggested corrections when an unknown identifier is entered, such as an undefined variable, object property, module, or path.
- Outcomes
- Implement a fuzzy suggestion algorithm that suggests similar-looking identifiers when an unknown identifier is entered, instead of just throwing the error.
```
In [1]: base.abbs( -1.0 )
Error: base.abbs is not a function

Perhaps you meant base.abs, base.abs2, ...
```
- Extend this to the help() method similar to julia.
- Approach
- Classify the type of identifier
  
  We can use a regex validator to determine what type of identifier is entered and what type of error is being raised. For example, if it's an unknown variable or an unknown object property.
  
  This implementation is similar to how we currently handle auto-completions. The completer uses regex to classify entered statements into incomplete filesystem, workspace, expressions, require, and tutorial expressions.
- Suggest possible corrections
  
  Once we have classified the type of identifier to suggest, we can generate a set of possible corrections from the same sources the completer uses: depending on the classification, the completer matches against the AST, the filesystem, reserved keywords, and other places (using the fuzzy completion algorithm that is yet to be implemented).
  
  I would arguably use a different matching algorithm here, though, as it's a different use case. Corrections are mostly needed because of spelling mistakes, whereas in code completion we are generally looking at how much of a prefix the input is to the completion. An algorithm that measures how different two strings look is therefore a better fit. The Levenshtein algorithm is a popular algorithm that does exactly that.
- Extend this to the help() method using the same logic
Overall algorithm:
- Identify and classify the kind of suggestions needed (filesystem, expressions, require, etc)
- Use the fuzzy Levenshtein algorithm to find similar identifiers from the AST, filesystem, etc.
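A minimal sketch of the second step, using the standard dynamic-programming formulation of Levenshtein distance. The `suggest` helper, its `maxDistance` threshold, and the list of known names are hypothetical and only serve the example; collecting the known identifiers from the AST or filesystem is not shown.

```javascript
/**
* Computes the Levenshtein (edit) distance between two strings.
*/
function levenshtein( a, b ) {
	var prev = [];
	var curr;
	var cost;
	var i;
	var j;

	// Distance from the empty prefix of `a` to each prefix of `b`:
	for ( j = 0; j <= b.length; j++ ) {
		prev.push( j );
	}
	for ( i = 1; i <= a.length; i++ ) {
		curr = [ i ];
		for ( j = 1; j <= b.length; j++ ) {
			cost = ( a[ i-1 ] === b[ j-1 ] ) ? 0 : 1;
			curr.push( Math.min(
				prev[ j ] + 1,      // deletion
				curr[ j-1 ] + 1,    // insertion
				prev[ j-1 ] + cost  // substitution
			));
		}
		prev = curr;
	}
	return prev[ b.length ];
}

/**
* Returns known identifiers within `maxDistance` edits of an unknown one, closest first.
*/
function suggest( unknown, known, maxDistance ) {
	return known
		.map( function measure( name ) {
			return {
				'name': name,
				'distance': levenshtein( unknown, name )
			};
		})
		.filter( function isClose( item ) {
			return item.distance <= maxDistance;
		})
		.sort( function compare( x, y ) {
			return x.distance - y.distance;
		})
		.map( function toName( item ) {
			return item.name;
		});
}
```

For the earlier example, `suggest( 'abbs', [ 'abs', 'abs2', 'floor', 'sqrt' ], 2 )` returns `abs` followed by `abs2`.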
- Prior art
- stdlib completer
- Julia's REPL help mode
- Related Issues
- https://github.com/stdlib-js/stdlib/issues/2058
Multi-line editing

Currently, the REPL goes into multi-line mode if it detects an incomplete expression. Once we hit ENTER, there is no way to edit the previous line as hitting the up arrow triggers readline's default behaviour of bringing up previous commands. Additionally, we don't have a manual way to enter multi-line mode.
- Outcomes
- Discuss and implement ways to manually enter multi-line mode.
- Implement editing previous lines using the up arrow
- Approach
- Entering manual multi-line mode:
  
  There can be multiple ways we can enter multi-line mode as discussed here:
  - Modifier key combination: On Windows, SHIFT+ENTER and CTRL+ENTER are recognized the same as ENTER (I assume due to limitations of the terminal). As mentioned here, modifier keys combined with ENTER are not recognized on Mac either. Solution: a key combination like CTRL+O can be used, which should work on most terminal applications and is also somewhat standardized, given that IPython uses the same key combination to enable multi-line editing.
  Implementing this is straightforward by listening for keypress events as I did here.
  - .editor command: If modifier keys still seem problematic, we can also take a nodejs-type approach which has a dedicated .editor command that spins up a multiline editing mode. This is pretty straightforward to implement as it involves just writing an internal command that will do everything.
- Implementing multi-line editing:
  
  By default, the readline interface brings up the previous commands when the up key is pressed. The proper way to change this would be to handle these keypress events using the keypress event listener. One problem is that the readline interface would still trigger the default operation (clearing the current line and printing the previous command). We can, of course, manually undo this first and then move the cursor position upwards.
  
  But to simplify this, we can use the private _ttyWrite method instead, which allows us to process keypress events before they are emitted. Although we shouldn't be using a private method, we have already used it here, and it can be reused for this purpose.
  
  In an oversimplified way, the idea is to intercept the up and down keypresses before readline handles them and move the cursor to the adjacent line ourselves.
  
  This only changes the cursor position visually. We still need to update and maintain the line buffers and the command history accordingly.
  
  We can then keep track of the line number to update the entire command so far.
This is a rough roadmap to achieve multiline editing in the REPL.
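The line tracking described above could be modeled with a small piece of state that is independent of readline. This is only a sketch under assumptions: the state shape (`lines`, `row`, `col`) and the function names are hypothetical, and wiring it into the keypress handling and redrawing the terminal are left out.

```javascript
/**
* Moves the cursor one line up or down within a multi-line command.
*
* Returns the new state, or `null` when the key is not handled (e.g., `up` on
* the first line), so that the caller can fall back to history navigation.
*/
function moveCursor( state, keyName ) {
	var row = state.row;
	if ( keyName === 'up' && row > 0 ) {
		row -= 1;
	} else if ( keyName === 'down' && row < state.lines.length - 1 ) {
		row += 1;
	} else {
		return null;
	}
	return {
		'lines': state.lines,
		'row': row,

		// Clamp the column when the target line is shorter:
		'col': Math.min( state.col, state.lines[ row ].length )
	};
}

/**
* Inserts text at the cursor position of the current line.
*/
function insertText( state, text ) {
	var lines = state.lines.slice();
	var line = lines[ state.row ];
	lines[ state.row ] = line.slice( 0, state.col ) + text + line.slice( state.col );
	return {
		'lines': lines,
		'row': state.row,
		'col': state.col + text.length
	};
}

/**
* Joins the tracked lines into the full command to be evaluated.
*/
function toCommand( state ) {
	return state.lines.join( '\n' );
}
```

Keeping the state separate from the terminal makes the editing logic easy to test without a TTY. The keypress handler would only translate keys into calls to these functions and then redraw.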
- Prior art
- Node.js REPL's .editor command: Although it doesn't support going to the previous line
- IPython's CTRL+O key combination
- nano
- Related features that can be added (if time permits):
- Inserting entire command when pressing up/down arrow in REPL: This too requires listening for up/down strokes in the beforeKeypress() listener, overriding the default behaviour and utilizing our internal command _history store.
- A nano type editing mode as discussed here: I presume this can take some time and can be better left as a future concern.
- Related Issues
- https://github.com/stdlib-js/stdlib/issues/2060
- https://github.com/stdlib-js/stdlib/issues/2070
Bracketed-paste

When pasting multiple lines of code into the REPL, each statement is executed as soon as a newline character is encountered, which is not ideal behavior. Bracketed paste refers to being able to detect that the input is pasted code and handle it differently, ideally allowing the user to edit it before execution.
- Outcomes
- Implement bracketed paste allowing users to paste multiple lines of text without execution, if the terminal supports it.
- Approach
- Utilizing the terminal's bracketed paste mode:
  
  Bracketed paste mode is generally disabled by default, but on terminals that support it, it can be turned on by writing an escape sequence, i.e. _rli.ostream.write('\x1b[?2004h');.
  
  The terminal then wraps the pasted text with a specific escape sequence, that we can use to identify if the content is pasted. If the content is pasted, we can prevent executing the code if newlines are encountered.
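  A minimal sketch of detecting pasted content in an input chunk. In bracketed paste mode, terminals wrap pasted text in `ESC[200~` and `ESC[201~`. The function name `splitPasted` is hypothetical, and the sketch assumes an entire paste arrives in a single chunk, which may not hold for large pastes.

```javascript
var PASTE_START = '\x1b[200~';
var PASTE_END = '\x1b[201~';

/**
* Splits an input chunk into typed and pasted segments.
*/
function splitPasted( chunk ) {
	var segments = [];
	var start;
	var end;
	var pos = 0;

	while ( pos < chunk.length ) {
		start = chunk.indexOf( PASTE_START, pos );
		if ( start === -1 ) {
			// No (more) pasted content; the rest was typed:
			segments.push({
				'pasted': false,
				'text': chunk.slice( pos )
			});
			break;
		}
		if ( start > pos ) {
			segments.push({
				'pasted': false,
				'text': chunk.slice( pos, start )
			});
		}
		end = chunk.indexOf( PASTE_END, start + PASTE_START.length );
		if ( end === -1 ) {
			// Closing marker not received in this chunk; treat the rest as pasted:
			end = chunk.length;
		}
		segments.push({
			'pasted': true,
			'text': chunk.slice( start + PASTE_START.length, end )
		});
		pos = end + PASTE_END.length;
	}
	return segments;
}
```

  Newlines inside a segment marked as pasted can then be inserted into the line buffer instead of triggering evaluation. Before exiting, the mode should be switched off again by writing `'\x1b[?2004l'`.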
- A hacky implementation: Not recommended.
  
  Another hacky way we can implement this is by overriding the line event of the readline interface to only be triggered when a keypress event with a key value of ENTER is received. When pasting newline characters, I assume the ENTER key value would not be received and we end up not executing the pasted content. If this works, it will work on most terminals even if they don't support bracketed paste mode.
- References
- https://cirw.in/blog/bracketed-paste
- https://github.com/nodejs/node/pull/47150/files
- Related features that can be added (if time permits):
- Formatting pasted code for readability.
- Related Issues
- https://github.com/stdlib-js/stdlib/issues/2068
Pretty-printing of tabular data

There should be a way to visualize data, like an array of objects, in tabular form. For a REPL that aims to emphasize data analytics, this is a crucial feature.
- Outcomes
- Implement package @stdlib/plot/ascii/table.
- Implement table( data[, n] ) REPL command that utilizes the above package
- Approach
- Implementing an API for plotting an ASCII table: We should be able to parse the given data into rows and columns, and can follow any of the prior art to design the table. We should also parse ndarrays as matrices.
- Implementing a table command: This is pretty straightforward once the above API is implemented.
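A rough sketch of what rendering an array of objects could look like. The function below only mirrors the proposed `table( data[, n] )` signature for illustration; the border style and the helper names are assumptions, it is not the planned `@stdlib/plot/ascii/table` API, and ndarray input is not handled.

```javascript
// Converts a cell value to text (missing values render as empty cells):
function cell( value ) {
	return ( value === void 0 ) ? '' : String( value );
}

// Formats one row of cells, padding each to its column width:
function formatRow( cells, widths ) {
	return '| ' + cells.map( function pad( str, i ) {
		return str + ' '.repeat( widths[ i ] - str.length );
	}).join( ' | ' ) + ' |';
}

// Builds a horizontal rule such as `+----+----+`:
function rule( widths ) {
	return '+' + widths.map( function dashes( w ) {
		return '-'.repeat( w + 2 );
	}).join( '+' ) + '+';
}

/**
* Renders an array of objects as an ASCII table, optionally limited to `n` rows.
*/
function table( data, n ) {
	var rows = ( n === void 0 ) ? data : data.slice( 0, n );
	var cols = [];
	var widths;
	var out;

	// Collect column names in order of first appearance:
	rows.forEach( function collect( row ) {
		Object.keys( row ).forEach( function add( key ) {
			if ( cols.indexOf( key ) === -1 ) {
				cols.push( key );
			}
		});
	});
	// Each column is as wide as its longest cell (or its header):
	widths = cols.map( function width( key ) {
		return rows.reduce( function max( w, row ) {
			return Math.max( w, cell( row[ key ] ).length );
		}, key.length );
	});
	out = [ rule( widths ), formatRow( cols, widths ), rule( widths ) ];
	rows.forEach( function render( row ) {
		out.push( formatRow( cols.map( function get( key ) {
			return cell( row[ key ] );
		}), widths ) );
	});
	out.push( rule( widths ) );
	return out.join( '\n' );
}
```

A REPL command would then only need to print the returned string.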
- Prior art
- https://github.com/sorensen/ascii-table
- https://github.com/tecfu/tty-table
- Related Issues
- https://github.com/stdlib-js/stdlib/issues/2067
Syntax-highlighting and bracket matching
- Outcomes
- Syntax highlighting
- Bracket pair matching
- Approach
- Syntax highlighting
  
  There are some packages like emphasize that can help us easily implement syntax highlighting. Node's REPL rewrite uses this too for syntax highlighting.
  
  However, if we are trying to avoid dependency overhead we can implement it ourselves, though I am not sure how tedious it can get.
  
  One way to implement this is to use the acorn parser to loosely parse the current line (after every keypress) and build an AST of tokens of different types. We can then traverse the AST and wrap each node in ANSI color codes depending on the type of node. From what I have tried so far, I am not sure how good acorn-loose is at parsing incomplete expressions.
  
  We can have 2 themes to begin with: Light and Dark.
  
  When it comes to executing commands or exporting to a file, we can strip the ANSI sequences manually. But it can get tedious, as we will mostly be working with raw text for all operations, including storing history, variables, commands, etc. We just need coloring for display. So, instead, we can do the coloring in the final layer before printing and use raw text everywhere else.
- Bracket pair matching
  
  Bracket pair matching involves highlighting the current code block by highlighting the brackets enclosing the logical block of code the cursor is inside.
  
  Although bracket pair matching can also be achieved using the acorn parser to some extent, it wouldn’t be able to parse all brackets. For example, if I just type () in the terminal, it won't highlight anything.
  
  We can write a simple algorithm inspired from prompt-toolkit (used by IPython).
  
  IPython only highlights the brackets when the cursor is adjacent to the brackets, but we can maybe extend that to keep the brackets enclosing the cursor, highlighted at all times.
  
  A rough sketch of the algorithm: it works by traversing left and right of the cursor to find the enclosing brackets, using a stack to ignore all internal bracket pairs. The found indices can then be highlighted.
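  One possible way to write this traversal is shown below. The function name `findEnclosingBrackets` is hypothetical, and the sketch is not prompt-toolkit's implementation: it treats `cursor` as an index between characters and, as a simplifying assumption, does not skip brackets that appear inside strings or comments.

```javascript
var OPEN = '([{';
var CLOSE = ')]}';

/**
* Finds the indices of the bracket pair enclosing the cursor.
*
* Returns `[ openIndex, closeIndex ]`, or `null` if the cursor is not enclosed
* by a matching pair.
*/
function findEnclosingBrackets( line, cursor ) {
	var stack = [];
	var open = -1;
	var close = -1;
	var ch;
	var i;

	// Scan left; closing brackets seen on the way belong to inner pairs:
	for ( i = cursor - 1; i >= 0; i-- ) {
		ch = line[ i ];
		if ( CLOSE.indexOf( ch ) !== -1 ) {
			stack.push( ch );
		} else if ( OPEN.indexOf( ch ) !== -1 ) {
			if ( stack.length === 0 ) {
				open = i;
				break;
			}
			stack.pop();
		}
	}
	// Scan right; opening brackets seen on the way belong to inner pairs:
	stack = [];
	for ( i = cursor; i < line.length; i++ ) {
		ch = line[ i ];
		if ( OPEN.indexOf( ch ) !== -1 ) {
			stack.push( ch );
		} else if ( CLOSE.indexOf( ch ) !== -1 ) {
			if ( stack.length === 0 ) {
				close = i;
				break;
			}
			stack.pop();
		}
	}
	// Both ends must exist and be of the same bracket type:
	if ( open === -1 || close === -1 ) {
		return null;
	}
	if ( OPEN.indexOf( line[ open ] ) !== CLOSE.indexOf( line[ close ] ) ) {
		return null;
	}
	return [ open, close ];
}
```

  Because the search starts from the cursor rather than from a parsed AST, it also handles input such as a bare `()`.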
- Prior art
- Prompt-toolkit's bracket matching algorithm
- emphasize - Syntax highlighting
- Node's REPL rewrite uses emphasize for syntax highlighting
- Related Issues
- https://github.com/stdlib-js/stdlib/issues/2072
Less/more documentation pager

I am still a bit confused about what type of behavior we have in mind.

Based off this comment I assume we are looking for a less type behavior. And based on this comment, I think it can be a bit complex.

I have a simpler design in mind. Based on the height of the terminal window, we would print only as much of the help text as fits, ending with a prompt such as "press CTRL+M to expand, CTRL+X to exit" (for example). help() would then be a process that keeps running until it is interrupted by CTRL+X in this case. I would appreciate some pointers here.
Custom key-bindings support

I am a bit confused about this too. Are we talking about allowing the user to configure certain actions? Or just general keybindings support for a lot of common tasks, like CTRL+V for pasting content, etc.? I need some direction on this.
Tests

The REPL currently lacks test coverage. I would write tests to keep the REPL bug free in the long run.
Documentation and tutorials

After implementing all these features, there certainly comes a need for tutorials to allow easy learning of the REPL. We can also create tutorials for common use cases like handling data for data analysis etc.

We already have our REPL presentation framework in place so implementing this would be easy.
Small additions (optional)

If time permits, we can add these small improvements as well:
- https://github.com/stdlib-js/stdlib/issues/2071
- https://github.com/stdlib-js/stdlib/issues/2066
- https://github.com/stdlib-js/stdlib/issues/2062
- https://github.com/stdlib-js/stdlib/issues/1794 - This RFC is still under discussion, but is still easy to implement as mentioned here

Why this project?

As a JavaScript developer, I find the Node.js REPL lacking in many ways, and there aren't many alternatives out there. This project can positively impact the Node.js ecosystem by providing a powerful yet easy-to-use REPL to the community. This is what excites me about this project, and I would love to be a part of this journey!

Qualifications

I have studied JavaScript along with core computer science subjects like object-oriented programming, algorithms, operating systems, computer architecture, Linux & git in college. I also understand the REPL codebase well enough to execute on the proposal.

Prior art

Prior art for specific features is mentioned in the abstract.

Commitment

My summer break from college starts May 15. So, during the coding period (starting May 27), I would be available full-time, with no other commitments, and able to commit 40+ hrs/week for around 2 months. For the remaining month, alongside college, I will be able to devote around 20 hrs/week.

I would be able to commit around 400 hours to the program.

Schedule

Assuming a 12 week schedule,

Community Bonding Period:
- Discuss and plan the proposed features in detail to gain more clarity on the goals and approach.
- Once a clear plan is finalized, I can even start early, as my summer break begins on May 15.
Week 1:
- Implement fuzzy auto-completion.
- Write tests for the implementation.
Week 2 & 3:
- Implement multi-line editing. This feature can get a bit complex to implement, hence allocating 2 weeks.
- Write tests for the implementation.
Week 4 & 5:
- Implement syntax highlighting & bracket matching. The time needed depends on how we approach this: if we use an external library, it can be done sooner, but to be safe I am allocating 2 weeks.
- Write tests
Week 6: (midterm):
- Implement suggested corrections and custom keybindings.
- By the midterm, the bulky features (multi-line editing, syntax highlighting, and fuzzy completions) would be done.
Week 7 & 8:
- Implement less/more documentation pager. As mentioned in the abstract, there is not much clarity yet about the intended behavior. Assuming it can be complex, it's safe to dedicate 2 weeks to this.
Week 9:
- Implement bracketed paste and pretty printing of tabular data. Both are seemingly straightforward to implement.
- Write tests for pretty printing of tabular data
Week 10:
- Complete incomplete work (if any)
- Write tests for new features and the existing REPL features too.
Week 11:
- Finalize tests.
- Write tutorials and documentation.
Week 12:
- Relaxation week to handle pending work, bugs, tests etc.
Final Week: Project submission!

Related issues

1

Feature-specific issues are mentioned in the abstract.

Checklist

[X] I have read and understood the Code of Conduct.
[X] I have read and understood the application materials found in this repository.
[X] I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
[X] I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
[X] The issue name begins with [RFC]: and succinctly describes your proposal.
[X] I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.

Snehil-Shah commented 7 months ago

@kgryte Would really appreciate some feedback as only a few days are remaining till the deadline.

I also had some doubts regarding the less/more documentation pager and custom key bindings, as mentioned in this draft proposal, and would like some clarity.

kgryte commented 7 months ago

@Snehil-Shah Thanks for sharing a draft of your proposal and for your thorough discussion of the various tasks. A few comments:

Re: less/more. Yes, in short, we'd want to implement something like Linux's less command. When I looked into this a while back, I opted for Linux's more command as the API is much simpler. The main things we'd want to initially support are (1) scrolling/pagination up/down and (2) search. If we got syntax highlighting over the finish line, it would also be nice to syntax highlight repl.txt examples.
Re: custom keybindings. Yes, the idea would be to allow users to customize keyboard shortcuts by mapping common actions to specified keystrokes, similar to how one might configure an IDE to recognize particular keybindings.
fuzzy completions. I think your suggestion of having a separate "display" field makes sense. In which case, I'm in agreement that implementing a custom completer seems reasonable/needed.
Given that some of the trickier things are front-loaded, there is likely to be a need for slightly longer review cycles (e.g., for multi-line editing support). I'm wondering if it would be possible to better interleave smaller quick win tasks with the larger more complex tasks in order to ensure that you are never blocked from working on something at any given point.
Your proposal focuses primarily on the tasks/feature ideas mentioned in the idea issue, but I'm also curious to hear your own ideas for what you think would be interesting to implement in order to make the REPL better.

steff456 commented 7 months ago

Hi @Snehil-Shah thanks for opening this draft proposal!

I see that your proposal is really ambitious and I'm not fully clear if you are expecting to add all of these features to the REPL. I think the scope is too big, especially because just creating a syntax highlighter can take several weeks without testing, and the same goes for the autocompletion idea. I would recommend that you focus on one idea and detail more how and why you think that is interesting as a project for you. I think this aligns with the last comment from @kgryte.

kgryte commented 7 months ago

Building on Stephannie's comment, I think the scope is possible, but it certainly requires many things going right. Auto-closing brackets (see https://github.com/stdlib-js/stdlib/pull/1680), for example, was something that ended up taking about a month to get over the finish line due to scoping, refactoring, addressing corner cases, and review. In order for us to execute on all the various tasks, we'd need to have strong alignment and a strong sense as to how the implementations will work.

And agreed on syntax highlighting. There may be a number of small changes that we'll need to make to the REPL codebase to make this happen, and some of these changes will likely be prerequisite. So having an idea of what changes may need to be made may help with scoping and timeline.

Hence, it may be good to have a list of smaller, very concrete tasks which lend themselves to small, regular PRs, and then some larger, more open-ended tasks which can be done in parallel.

stdlib-js / google-summer-of-code

[RFC]: building a better Node.js REPL #69
