stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: achieve feature parity with builtin Node.js `fs` module #72

Closed Daniel777y closed 5 months ago

Daniel777y commented 6 months ago

Full name

Dexu (Daniel) Yu

University status

Yes

University name

Northeastern University

University program

Computer Science

Expected graduation

2025

Short biography

I am currently pursuing a Master's degree in Computer Science at the Oakland campus of Northeastern University, having previously earned a Bachelor's degree in Software Engineering. My technical skill set encompasses programming languages such as C/C++, JavaScript, and Python. Additionally, I possess a strong foundation in Docker, Linux, MySQL, and Firebase.

During my undergraduate and graduate studies, I took various computer science courses, such as Data Structures, Operating Systems, Programming Design Paradigm, Web Development, and Software Testing.

My passion for problem-solving has drawn me to competitive programming, where I have honed my abilities in algorithms and optimization. Moreover, I feel a great sense of achievement in developing personal applications and working in group projects, which allow me to bring my innovative ideas to life.

Timezone

US Pacific Time

Contact details

GitHub: Daniel777y, e-mail: yu.dex@northeastern.edu, DanielYu3790@gmail.com

Platform

Mac

Editor

Vim and tmux are my favorite choices for coding. The best thing about Vim is that I can start coding on any computer with just a few settings. Once I got the hang of its shortcuts and how it works, I found I could code much more efficiently. When I'm working on bigger projects, I also use VSCode because it has a lot of plugins and helps me manage files better, making everything smoother.

Programming experience

My programming experience includes competitive programming as well as personal and group projects, covering a wide range of technologies, such as React.js, Vite.js, Vue.js, Node.js, Django, MySQL, C/C++, Bootstrap, and Tailwind CSS. Here are some of my recent projects:

JavaScript experience

I use JavaScript intensively to develop various full-stack applications, for course work as well as personal projects. Recently, I also started contributing to stdlib by implementing the math/base/tools/normhermitepolyf package for evaluating a normalized Hermite polynomial using single-precision floating-point arithmetic.

I really appreciate JavaScript's ease of learning, flexibility and widespread adoption. It allows me to organize code in traditional OOP style or functional programming. Moreover, it streamlines and simplifies the development process significantly. For instance, I can craft user interfaces with React.js or Vue.js, and develop server-side applications with Express.js.

However, one limitation of JavaScript is its performance in computation-intensive tasks. While it's a popular choice for web development, for tasks requiring heavy computation, such as data analysis or machine learning, developers might prefer Python or R. Imagine if we could do such tasks in a browser; that would be exciting!

Node.js experience

In my full-stack projects, I usually utilize Node.js and Express.js for backend development. This includes tasks like database connectivity, API implementation, and file management among others.

C/Fortran experience

I learned C in my freshman year of undergraduate, and I have applied it in multiple course projects, including developing a library management tool and file system in Linux. Beyond these applications, I have been using C/C++ in competitive programming contests for over five years, which has helped me build a strong foundation in this language.

Interest in stdlib

When I first delved into competitive programming, the majority of participants favored C/C++ and Java. However, in recent years, more and more people have started using Python, particularly with libraries like NumPy for computational tasks, while the use of Java has significantly dwindled and even the dominance of C/C++ has declined.

As I've mentioned, JavaScript's popularity in web development is undeniable, offering ease of implementation for ideas and product demonstrations. However, for data retrieval and analysis, developers still turn to alternative languages. Can JavaScript go further?

stdlib attracts me because it extends JavaScript's flexibility and capabilities, for example with numerical and scientific computation. This expansion not only broadens JavaScript's applicability but also shows the potential for intensive computing tasks to be executed directly in the browser. Especially with the rising importance of machine learning and data science, I believe there will be more and more innovative applications built with JavaScript, necessitating robust libraries like stdlib to support these advancements.

Version control

Yes

Contributions to stdlib

Pull Request

Issue

My first contribution is implementing the single precision equivalent for math/base/tools/normhermitepoly.

Though not aligned with the project I am proposing, this experience has given me a good understanding of the community's standards and the development process.

Also, the task of reimplementing single-precision functions shares similarities with the work involved in implementing the fs module, as both tasks are guided by a related overarching approach.

Goals

The primary goal of this project is to achieve complete feature parity with the Node.js fs module, thereby providing users with a full set of file system operations within stdlib.

Additionally, this project will enhance compatibility with older versions of Node.js. Developers who use older Node.js versions will therefore be able to access new file management features through stdlib.

Moreover, I will implement Promise APIs for some of these functions, which will benefit developers who prefer Promises over callbacks.

The successful implementation of this project is expected to significantly enhance stdlib's flexibility and utility.

Functionality

Here are some of the functionalities I am planning to implement (asynchronous versions):

Besides asynchronous functions, I will also implement synchronous versions of them.

Other than these functionalities, I will also implement utility functions to polyfill the older versions of Node.js.

Compatibility

Currently, stdlib is compatible with Node.js v0.10 and above. Therefore, to maintain compatibility with older versions of Node.js, I need to provide polyfills for some functionalities. To do this, I can borrow ideas from readable-stream. Here's a general example:

var fs = require( 'fs' );

function cp( src, dest, options, callback ) {
    // Allow the `options` argument to be omitted
    if ( arguments.length < 4 ) {
        callback = options;
        options = {};
    }

    // Check if the current Node.js version supports fs.cp
    if ( typeof fs.cp === 'function' ) {
        fs.cp( src, dest, options, callback );
    } else {
        // Polyfill
        // ...
    }
}

module.exports = cp;

In this case, I might need to manually implement some helper functions and handle the various options.
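
As a rough illustration of what such a manual fallback could look like, here is a minimal sketch for the simpler synchronous, single-file case. The function names `copyFileSyncFallback` and `copyFileSync` are hypothetical and not part of stdlib; the sketch only assumes `fs.readFileSync` and `fs.writeFileSync`, which exist in very old Node.js releases, and it ignores options such as modes and recursion that a real `cp` polyfill would need to handle.

```javascript
'use strict';

var fs = require( 'fs' );

// Hypothetical fallback: copy one file using only long-standing fs APIs.
function copyFileSyncFallback( src, dest ) {
    // Read as a Buffer (no encoding) so binary content survives unchanged:
    var data = fs.readFileSync( src );
    fs.writeFileSync( dest, data );
}

// Prefer the native implementation when the runtime provides one
// (fs.copyFileSync was added in Node.js v8.5.0); otherwise use the fallback.
var copyFileSync = ( typeof fs.copyFileSync === 'function' ) ?
    fs.copyFileSync :
    copyFileSyncFallback;
```

Reading the whole file into memory keeps the sketch short, but a production fallback would likely stream large files instead.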

Promise

stdlib also plans to provide Promise APIs in the long run, so I will try to implement Promise versions of these functionalities as well. I will first focus on the callback and synchronous versions, then move on to @stdlib/fs/promise/* later. I also need a polyfill for older environments that do not support native Promises. To do this, I can borrow ideas from promise-polyfill and implement @stdlib/promise.
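
To make the idea of a dedicated Promise entry point concrete, here is a minimal sketch of wrapping a Node-style callback function so that it always returns a Promise. The helper `promisifyCallback` and the stand-in `fakeRename` are hypothetical names used only for illustration; they are not existing stdlib packages, and the sketch assumes a native `Promise` is available.

```javascript
'use strict';

// Hypothetical helper: wrap a Node-style callback function in a function
// that always returns a Promise (it never accepts a callback itself).
function promisifyCallback( fn ) {
    return function promisified() {
        var args = Array.prototype.slice.call( arguments );
        var self = this;
        return new Promise( function executor( resolve, reject ) {
            // Append a callback which settles the Promise:
            args.push( function done( err, result ) {
                if ( err ) {
                    return reject( err );
                }
                resolve( result );
            });
            fn.apply( self, args );
        });
    };
}

// Stand-in for a callback-style API such as a rename function:
function fakeRename( src, dest, clbk ) {
    if ( typeof src !== 'string' ) {
        return clbk( new TypeError( 'invalid argument' ) );
    }
    clbk( null, dest );
}

var renamePromise = promisifyCallback( fakeRename );
```

Keeping the Promise variant as a separate function means the callback API's behavior is untouched, at the cost of maintaining a second entry point per function.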

Performance

To ensure correctness and performance, every implementation will be tested with the tape framework and benchmarked with TAP-formatted output.

All functions will provide concise error messages and handle potential edge cases, such as invalid paths and denied permissions.
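
As one possible shape for this kind of error handling, the sketch below wraps `fs.renameSync` so that failures are returned as error objects rather than thrown, letting callers branch on the standard `code` property (for example `ENOENT` for a missing path). The name `renameSyncSafe` is hypothetical, and whether a given synchronous API should return or throw errors is an assumption here, not a settled design.

```javascript
'use strict';

var fs = require( 'fs' );

// Hypothetical wrapper: return the error instead of throwing it.
function renameSyncSafe( oldPath, newPath ) {
    try {
        fs.renameSync( oldPath, newPath );
    } catch ( err ) {
        // Node.js system errors carry a `code` such as 'ENOENT' or 'EACCES':
        return err;
    }
    return null;
}
```

A caller could then check `if ( err instanceof Error )` and inspect `err.code` to report an invalid path or a permissions problem.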

Documentation and Examples

I will adhere to the development guidelines, offering comprehensive examples and documentation for each function to help users understand their usage and support developers in code maintenance.

Why this project?

File management is a core operation for developers, and because stdlib focuses on numerical and scientific computing, file system access is crucial for handling data files. By contributing to this project, I will be enhancing the capabilities for high-performance applications that run in browsers with stdlib, which I find immensely exciting.

Additionally, this experience will deepen my understanding of JavaScript and Node.js. While I have previously worked with file systems within various frameworks, engaging with this project will give me a much deeper understanding of the mechanics of the file system and related functionality.

As a newcomer to open source, I see this project as a valuable opportunity to contribute meaningfully to the real world and to learn best practices in software development. Working on math/base/tools/normhermitepolyf was an eye-opening experience for me. I was particularly struck by the detailed coding standards, the structured development cycles, and the thorough testing norms. It's exciting to anticipate a long-term involvement with this community.

Finally, being part of stdlib's expanding community of both contributors and users is motivational and an honor. I am eager to make a significant impact on the development of excellent applications and to collaborate with dedicated mentors and fellow contributors.

Qualifications

Prior art

In this project, I will implement the features of the Node.js fs module, so I'll mainly use the Node.js source code as a guide.

Here are some extra packages I might refer to:

Additionally, I might borrow approaches from file systems in other languages, like Python's PyFilesystem.

Commitment

My semester ends on April 29th, so I’ll be completely free to work on this project from May to August. Since I don't have any other occupations, I can put in more than 30 hours each week during the summer. After the GSoC program ends, I plan to keep contributing in the community for about 10 hours a week.

Schedule

Assuming a 12-week schedule with 4 extra weeks.

Notes:

Potential risks

The potential risks and obstacles concern the scope of work and the timeline. As mentioned previously, I will need to implement some utility functions to ensure backward compatibility, handle functionalities with options, and try to provide Promise APIs, so I need to balance priorities against implementation difficulty. On the bright side, there are many references and examples in other packages, and I am free to extend the timeline to 16 weeks to ensure I have enough time to learn and implement the features.

Related issues

#10 [Idea]: achieve feature parity with builtin Node.js fs module

Checklist

Daniel777y commented 6 months ago

Hello, @kgryte @Planeshifter @Pranavchiku.

This draft proposal comes a bit late but I'm eager for the chance to contribute to this community and learn from its exceptional members. Any feedback or suggestions you might have would be greatly appreciated. Thank you.

Planeshifter commented 6 months ago

Thanks @Daniel777y for your proposal and desire to contribute to stdlib!

This is definitely an area where it would be good to make progress. A few comments and suggestions for strengthening the proposal:

kgryte commented 6 months ago

Thanks for working on this proposal. One follow-up question I have is

Ideally, any fs API we provide should work across all versions of Node.js that we support. And thus, for Node.js versions with missing functionality, we'd need to provide polyfills. And this could potentially be quite involved, and, if so, could affect your project timeline.

Daniel777y commented 6 months ago

@kgryte Yes, thanks for the response. As you mentioned, I do need to consider backward compatibility.

I walked through the implementation of readable-stream and tried to understand how it works. For example, the isReadable function. What it does is like:

var isReadable = require('stream').isReadable || require('readable-stream').isReadable;

That is, if the native stream module has the isReadable function or readable-stream is disabled for some reason, we use the one from stream; otherwise, we use the one from readable-stream. So I might have to implement some functionalities manually, such as cp, instead of taking them from the native fs module.

In the isReadable case, the implementation is:

function isReadable(stream) {
  if (stream && stream[kIsReadable] != null) return stream[kIsReadable]
  if (typeof (stream === null || stream === undefined ? undefined : stream.readable) !== 'boolean') return null
  if (isDestroyed(stream)) return false
  return isReadableNodeStream(stream) && stream.readable && !isReadableFinished(stream)
}

isDestroyed, isReadableNodeStream, and isReadableFinished are manually implemented utility functions as well.

Do you think this idea is enough to resolve the backward compatibility issue in stdlib?

As for the complexity, I think I can extend my timeline to 16 weeks to ensure I have enough time to learn and implement the features.

I had guessed that graceful-fs also provides polyfills across different versions, but unfortunately it seems that it does not.

kgryte commented 6 months ago

Yes, potentially. readable-stream is a similar idea, but arguably more complex than is necessary. I would anticipate needing to manually implement in a number of cases. For those APIs having many options, could be a bit of a slog to polyfill and ensure adequate testing.

Daniel777y commented 6 months ago

@Planeshifter Thanks for your feedback!

Yes, I also walked through graceful-fs and fs-extra to see how they implement fs functionalities, while some other modules, like fs-minipass and chokidar, also provide references for specific features in fs. I will discuss in detail with mentors about priorities and decide what extra features to implement and the order of implementation.

As for Promise APIs, it would indeed be good to provide a modern Promise style in stdlib, although I noticed that stdlib currently provides sync/async APIs. If Promise APIs are needed, then for those that support Promises, I think they can be implemented like the sync/async APIs. For those that do not support Promises, I can implement the async version first, then "universalify" it to Promise style, as fs-extra and universalify do.

As an example, for the rename function, I can do something like:

var rename = require( '@stdlib/fs/rename' );

function universalify( fn ) {
    return Object.defineProperty( function ( ...args ) {
        // If the last argument is a callback, use the callback style
        if ( typeof args[ args.length - 1 ] === 'function' ) {
            fn.apply( this, args );
        } else {
            // Otherwise, return a Promise settled by an appended callback
            return new Promise( ( resolve, reject ) => {
                args.push( ( err, res ) => ( err != null ) ? reject( err ) : resolve( res ) );
                fn.apply( this, args );
            } );
        }
    }, 'name', { value: fn.name } );
}

var universalRename = universalify( rename );

That is, when users rename a file asynchronously, if they pass a callback function, it will use the callback; otherwise, it will return a Promise:

universalRename( './beep/boop.txt', './beep/foo.txt', done );

// or

universalRename( './beep/boop.txt', './beep/foo.txt' ).then( done );

This is the general idea. Do you think it is good enough for the Promise APIs? I can universalify existing APIs in stdlib to Promise style in the coming days to give it a try. If the workload of the Promise APIs is potentially time-consuming, maybe we can divide them into sub-projects, and I would love to continue working on them after GSoC.

The potential risks and obstacles concern the scope of work and the timeline. As mentioned previously, I will need to implement some utility functions to ensure backward compatibility, handle functionalities with options, and try to provide Promise APIs, so I need to balance priorities against implementation difficulty. On the bright side, there are many references and examples in other packages, and I am free to extend the timeline to 16 weeks to ensure I have enough time to learn and implement the features.

As for correctness and performance, every implementation will be tested with the tape framework and benchmarked with TAP-formatted output. All functions will provide concise error messages and be polyfilled for older Node.js versions. Another concern is that, since one emphasis of stdlib is scientific computing, I suppose it will need to process large or multiple files. Do you think it is necessary to handle such edge cases to avoid potential crashes?

kgryte commented 6 months ago

I'd advocate for providing dedicated promise APIs and not the pattern of "if no callback, return a promise". That would fundamentally change error handling for legitimate use cases where a user intentionally does not provide a callback.

kgryte commented 6 months ago

In general, I'd focus first on callback APIs. Then move to promise APIs (e.g. @stdlib/fs/promise/*). For the promise APIs, the main prerequisite is that we need to create @stdlib/promise/ctor with a polyfill fallback for older environments not having native Promise support.

Daniel777y commented 6 months ago

@kgryte Thank you very much for your suggestions! This way the stdlib code will be easier to reuse and maintain. I suppose @stdlib/promise/* can potentially occupy some time slots, but I'd love to give it a try. I found promise-polyfill for reference.
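
As a starting point for that prerequisite, here is a minimal sketch of detecting native Promise support and falling back to a supplied polyfill constructor. The names `hasNativePromise` and `resolvePromiseCtor` are hypothetical and do not correspond to existing stdlib packages; the polyfill itself (for example, one modeled on promise-polyfill) is assumed to be provided by the caller.

```javascript
'use strict';

// Hypothetical check: does the runtime expose a usable native Promise?
// Using `typeof` on an undeclared global is safe and does not throw.
function hasNativePromise() {
    return (
        typeof Promise === 'function' &&
        typeof Promise.resolve === 'function' &&
        typeof Promise.prototype.then === 'function'
    );
}

// Hypothetical resolver: use the native constructor when available,
// otherwise fall back to the polyfill constructor passed in.
function resolvePromiseCtor( polyfill ) {
    return ( hasNativePromise() ) ? Promise : polyfill;
}
```

Passing the polyfill in as an argument keeps the detection logic independent of any particular polyfill implementation.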