stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: stdlib API dependency explorer #45

Closed prajwalkulkarni closed 2 months ago

prajwalkulkarni commented 3 months ago

Full name

Prajwal Kulkarni

University status

Graduated

University name

BMS College Of Engineering

University program

No response

Expected graduation

Year of Graduation: 2022

Short biography

I'm Prajwal Kulkarni, a recent grad working as a software engineer at a startup and I'm based out of India. I've completed my undergraduate program majoring in Information Sciences from BMS College Of Engineering, Bengaluru, India. As a software engineer, I mostly work on all things web from implementing seamless UI to ensuring the application/feature is scalable and bug-free. My interest mainly revolves around web development and tooling, with React being the primary library used on a daily basis.

Timezone

UTC +5:30 (India)

Contact details

github: prajwalkulkarni

Platform

Windows

Editor

I use VSCode for almost everything, both at work and on personal projects. The primary reason is the editor's maturity: it lets me do everything in one place, from installing extensions that improve the developer experience to navigating between files, folders, and even projects (workspaces) through a sleek UI. In addition, VSCode comes with Git support out of the box, which makes it easy to review diffs visually before committing files or pushing them to a remote.

Programming experience

My journey on the internet began as a blogger. Back in 9th grade, I ran a tech blog where I wrote and shared articles on everything I learned. I started out developing apps for Android, tried my hand at iOS development, and am currently pursuing frontend/full-stack web development, with React/ExpressJS being my primary library/framework. I've always wanted to (and still do) build cool stuff that could be used by, and have an impact on, lots of people. Whenever I work on a side project, I look for the value it would create for the people who use it.

My latest projects are Take a break (a Chrome extension that monitors your screen time and reminds you to take breaks periodically) and blackjack21 (a Node-based blackjack game for the CLI). You can check out my other projects on my portfolio website.

JavaScript experience

I have close to 2 years of experience in JavaScript, with more than a year of industry experience. It wouldn't be wrong to say that JavaScript is both the most hated and the most loved language. JS can be used to build almost anything: UIs, simulations, mobile apps, servers, games, and whatnot. Having said that, my favorite feature in JS is the concept of microtasks and Promises, which lets the runtime handle tasks asynchronously and concurrently despite being single-threaded. My least favorite feature is how the language handles floating-point arithmetic. I do understand that representing decimal numbers in double-precision 64-bit binary format is bound to produce such results, but it requires extra attention when working with numbers. In fact, this quirk led to a production bug in one of the features I recently worked on (2.01 * 100 didn't result in 201, lol).
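
The floating-point behavior mentioned above can be reproduced in a few lines. This is only a minimal sketch: the helper name `toCents` is hypothetical, and rounding to the nearest integer is one common workaround, not necessarily the fix that was applied to the production bug.

```javascript
'use strict';

// 2.01 has no exact binary64 representation, so the product is not exactly 201.
const product = 2.01 * 100;
console.log( product === 201 ); // false

// Hypothetical helper: scale a decimal amount by 100 and round to the nearest integer.
function toCents( amount ) {
    return Math.round( amount * 100 );
}

console.log( toCents( 2.01 ) === 201 ); // true
```

Rounding is only safe when the expected result is known to be an integer; for general comparisons, a tolerance-based check is usually more appropriate.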

Node.js experience

I have used Node.js in some of my side projects, and I have also worked on a small feature at work, and I’m actively learning more about it. I’d say I have somewhere between beginner to intermediate experience with Node.js but I’m currently exploring some of the core concepts like buffers, streams, and threads in depth.

C/Fortran experience

I do not have any experience with C or Fortran.

Interest in stdlib

I’d say stdlib is a one-stop-shop for all the math utilities and functions which would help developers perform complex mathematical & scientific operations without having to reinvent the wheel or write things from scratch. One thing that I find particularly interesting about stdlib is how each utility is packaged as a separate dependency making it easier to add only a particular utility package instead of installing the complete library. All in all the library is an invaluable tool for anyone dealing with complex numerical computations in JavaScript due to its adaptability and smooth integration into applications. In addition, the community is also very engaging and helpful proving to be a great source of learning new things from fellow contributors/community-members.

Version control

Yes

Contributions to stdlib

I have made a few contributions to the library primarily focusing on improving the type declaration of some of the utilities.

Merged PRs

Pending

Goals

After browsing through all the ideas, I found several of them interesting, but the one that caught my eye was "stdlib API dependency explorer". As soon as I read through the problem statement, I was able to come up with an implementation plan. In this project, I'd like to create a dependency explorer graph as an npm package that could be used in the documentation site to render any package's dependencies in the form of a graph, with each node representing a dependency.

There are 4 steps involved in implementing this solution:

  1. Defining the base dependency mapping.
  2. Generating the complete dependency mapping using the base dependency mapping.
  3. Generating nodes and edges.
  4. Drawing the dependency tree using nodes and edges.

Defining the base dependency mapping

The base dependency mapping is a large object that maps each package or module to an array of its direct dependencies or sub-modules. Although a dependency of a package will almost certainly have dependencies of its own, this mapping stores only direct dependencies. If each entry instead expanded all the way down to its leaf dependencies, creating, mutating, and reading this object would become very cumbersome. The format of the base dependency mapping would look something like this:

const BASE_DEPENDENCY_MAPPING = {
    '@stdlib/array/base/accessor': [
        '@stdlib/utils/define-nonenumerable-read-write-accessor',
        '@stdlib/utils/define-nonenumerable-read-only-property',
        '@stdlib/array/base/accessors',
        '@stdlib/assert/is-collection',
        '@stdlib/string/format'
    ],
    // …
};

The key is the name of a package or module (e.g., math), and the value is an array of its direct dependencies or sub-modules (such as base for the math module). However, writing this mapping manually is tedious and time-consuming. Hence, I propose writing a small Node.js program that runs on the lib/node_modules directory and creates a package-dependency mapping by traversing all the package directories. There's already a dependency tree hosted at https://stdlib.io/docs/api/latest/package/tree-array.json. I can transform this already available data into the required format, i.e., the key being the name of the package or module and the value being an array of sub-modules or direct dependencies.

This utility will be a part of the stdlib project’s tools and can be used to generate the mapping during the build time.
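
One way the extraction step of such a tool might work is sketched below. It assumes that dependencies appear in source files as CommonJS calls such as `require( '@stdlib/string/format' )`; the function names `extractDependencies` and `buildBaseMapping` are hypothetical, and source text is passed in as strings rather than read from lib/node_modules to keep the example self-contained.

```javascript
'use strict';

// Assumption: dependencies are declared via `require( '@stdlib/...' )`.
// This is a plain text scan, so calls inside comments or strings match too.
const RE_REQUIRE = /require\(\s*['"](@stdlib\/[^'"]+)['"]\s*\)/g;

// Returns the sorted, de-duplicated list of `@stdlib` packages required by `source`.
function extractDependencies( source ) {
    const deps = new Set();
    for ( const match of source.matchAll( RE_REQUIRE ) ) {
        deps.add( match[ 1 ] );
    }
    return Array.from( deps ).sort();
}

// Builds a mapping of package name -> direct dependencies from an object of
// package name -> source text (a stand-in for files read from disk).
function buildBaseMapping( sources ) {
    const mapping = {};
    for ( const [ pkg, source ] of Object.entries( sources ) ) {
        mapping[ pkg ] = extractDependencies( source );
    }
    return mapping;
}

// Usage with made-up source text:
const sample = {
    '@stdlib/example/pkg': "var format = require( '@stdlib/string/format' );\nvar fs = require( 'fs' );"
};
console.log( buildBaseMapping( sample ) );
```

A regular expression keeps the sketch short; a real tool would more likely parse the source so that commented-out or dynamically built `require` calls are handled correctly.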

Generating the complete dependency mapping using the base dependency mapping.

The next step is to generate a complete dependency mapping for all the packages/modules. We will essentially be extending the result generated in the previous step to include dependencies of dependencies. This will be a function that iterates through the list of direct dependencies and recursively looks up the dependencies of each direct dependency. The pseudocode would look something like this:

function EXTEND_BASE_DEPENDENCY(paths):

    FOR each packagePath in paths:
        mainFilePath = RESOLVE_MAIN_FILE_PATH(packagePath)  // Get the path to the main file

        IF mainFilePath exists:
            dependencies = EXTRACT_DEPENDENCIES(mainFilePath)  // Extract dependencies from the main file
            baseDependency[packagePath] = dependencies  // Add/Update the packagePath and its dependencies to the mapping

The output of this step has the same shape as the output of the previous step, but every package, including each dependency of a dependency, is now mapped to its own list of direct dependencies. If a package is atomic in nature and does not use any dependencies, its value will be an empty array.
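
As a rough illustration of this step, the sketch below extends a base mapping until every reachable package has an entry. The `getDirectDeps` callback is a hypothetical stand-in for the RESOLVE_MAIN_FILE_PATH and EXTRACT_DEPENDENCIES steps in the pseudocode, and the work queue with a "seen" check is an assumption added here so that circular dependencies do not cause an infinite loop.

```javascript
'use strict';

// Extends `baseMapping` so that every package reachable from its keys has an
// entry. `getDirectDeps( pkg )` is a hypothetical lookup that returns the
// direct dependencies of a package missing from the base mapping.
function extendBaseDependency( baseMapping, getDirectDeps ) {
    const complete = new Map();
    const queue = Object.keys( baseMapping );
    while ( queue.length > 0 ) {
        const pkg = queue.shift();
        if ( complete.has( pkg ) ) {
            continue; // already processed (also guards against cycles)
        }
        // Atomic packages end up with an empty array.
        const deps = baseMapping[ pkg ] || getDirectDeps( pkg ) || [];
        complete.set( pkg, deps );
        for ( const dep of deps ) {
            queue.push( dep );
        }
    }
    return Object.fromEntries( complete );
}

// Usage with made-up package names:
const known = { 'b': [ 'd' ], 'c': [], 'd': [ 'a' ] };
const result = extendBaseDependency( { 'a': [ 'b', 'c' ] }, ( pkg ) => known[ pkg ] );
console.log( result );
```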

Generating Nodes and edges

Drawing a dependency tree requires us to define all the nodes and the edges between them. Now that our dependency mapping is ready, we can use it to generate an array of nodes and an array of edges. Below is the pseudocode for generating the node list and the edge list.

function addNodes(depList):
    FOR each key of depList:
        IF key is not in nodes:
            add key to nodes
            IF DEPENDENCY_MAPPING[key] is an array with length > 0:
                recursively call addNodes(DEPENDENCY_MAPPING[key])
            ENDIF
        ENDIF
    ENDFOR

function traverse(node: Record<string, Array<string>>, parent: string):
    FOR each key in node:
        IF node[key] has children:
            FOR each child in node[key]:
                IF edge from key to child or from child to key does not exist in edges:
                    add edge from key to child to edges
                ENDIF
            ENDFOR
        ENDIF
    ENDFOR
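
The two routines above could be combined into a single traversal, sketched below in JavaScript. The function name `buildGraph` and the `{ id }` / `{ source, target }` object shapes are assumptions made for illustration; the shapes a real rendering library expects may differ.

```javascript
'use strict';

// Walks `mapping` (package name -> direct dependencies) starting at `root` and
// returns the node and edge lists. As in the pseudocode, an edge is skipped if
// the same pair of packages is already connected in either direction.
function buildGraph( mapping, root ) {
    const nodes = [];
    const edges = [];
    const visited = new Set();
    const edgeKeys = new Set();

    function visit( pkg ) {
        if ( visited.has( pkg ) ) {
            return;
        }
        visited.add( pkg );
        nodes.push( { id: pkg } );
        for ( const dep of mapping[ pkg ] || [] ) {
            const key = pkg + ' -> ' + dep;
            const reverse = dep + ' -> ' + pkg;
            if ( !edgeKeys.has( key ) && !edgeKeys.has( reverse ) ) {
                edgeKeys.add( key );
                edges.push( { source: pkg, target: dep } );
            }
            visit( dep );
        }
    }

    visit( root );
    return { nodes, edges };
}

// Usage with made-up package names:
const graph = buildGraph( { 'a': [ 'b', 'c' ], 'b': [ 'c' ], 'c': [] }, 'a' );
console.log( graph.nodes.length, graph.edges.length );
```

Tracking visited packages in a set keeps the traversal finite even when two packages depend on each other.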

Drawing the dependency tree using nodes and edges.

Once the relevant data is ready, the final step is to draw the tree to the UI. This is a straightforward task that can be accomplished using some available libraries. However, one important thing to note here is that when rendering the nodes to the UI, we must also specify the coordinates of each node so that each node is placed correctly w.r.t its parent node. A new component can be defined in www/src/components/readme and imported to index.jsx to render the graph in a modal. The design mockup for this implementation can be viewed here.
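
To make the coordinate requirement concrete, here is a minimal layered-layout sketch. It is not the layout algorithm of any particular library: each node is placed on a row given by its breadth-first depth from the root, and the nodes in a row are spread evenly around x = 0. The function name `layout` and the spacing options `dx` and `dy` are hypothetical.

```javascript
'use strict';

// Assigns { x, y } coordinates to every package reachable from `root`.
// y grows with depth; nodes at the same depth are centered horizontally.
function layout( mapping, root, options ) {
    const dx = ( options && options.dx ) || 200; // horizontal gap between siblings
    const dy = ( options && options.dy ) || 100; // vertical gap between levels
    const depth = new Map( [ [ root, 0 ] ] );
    const queue = [ root ];
    const levels = [];

    // Breadth-first traversal: the first time a package is seen fixes its depth.
    while ( queue.length > 0 ) {
        const pkg = queue.shift();
        const d = depth.get( pkg );
        if ( !levels[ d ] ) {
            levels[ d ] = [];
        }
        levels[ d ].push( pkg );
        for ( const dep of mapping[ pkg ] || [] ) {
            if ( !depth.has( dep ) ) {
                depth.set( dep, d + 1 );
                queue.push( dep );
            }
        }
    }

    const positions = {};
    levels.forEach( ( level, d ) => {
        const offset = ( ( 1 - level.length ) * dx ) / 2;
        level.forEach( ( pkg, i ) => {
            positions[ pkg ] = { x: offset + ( i * dx ), y: d * dy };
        } );
    } );
    return positions;
}

// Usage with made-up package names:
const example = layout( { 'a': [ 'b', 'c' ], 'b': [ 'd' ], 'c': [ 'd' ], 'd': [] }, 'a' );
console.log( example );
```

A row with many nodes grows wide under this scheme, which is why a more refined layout algorithm is worth the time set aside for it in the schedule.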

Interactivity & UX: I aim to add some interactivity to the generated graph, such as navigating to a particular package by clicking the respective node. The graph will be rendered in a modal that opens on a button click. Relevant icons will also be shown to indicate whether a node is a package or a module/sub-module.

[Mockup images: Frame 5, Frame 6, explanation]

Note: For demonstration purposes, the icons are shown only for 2 nodes, but during the implementation, it will be shown for all the nodes.

Since this UI is rendered after a button click and the graph also supports user interaction, generating a static asset at build time might not be the best option. Instead, we could support SSR that returns baked HTML/CSS on the button click.

Why this project?

As I mentioned in the earlier section, several projects were interesting, but this project particularly caught my eye for several reasons:

First off, building a dependency explorer for different stdlib packages is a worthwhile project since it shows the complex dependencies and relationships between distinct parts visually. Comprehending these dependencies is essential for developers utilizing stdlib, as it facilitates improved project administration and code optimization.

There's an additional level of interest while working on an exploration tool based on graphs. Graphs provide a clear and insightful means of understanding intricate dependencies. Developers can have a better grasp of the library's structure and spot possible areas for optimization or enhancement by identifying the relationships between stdlib packages.

Moreover, this initiative also fits in with the larger trend of improving developer experience and tooling. A dependency explorer can help troubleshoot dependencies-related issues, greatly enhance the onboarding process for new engineers/developers, and make projects more maintainable overall.

In conclusion, I find it interesting to be able to contribute to a project that aims to create a dependency explorer for stdlib packages since it blends the practical aspects of enhancing developer processes with the difficult task of producing a useful and aesthetically pleasing tool.

Qualifications

I believe I am well-equipped to work on this project as my skills and expertise align with the project's requirements. I have industry experience working on all the tools and technologies required to implement this project, including HTML/CSS, JavaScript, TypeScript, React/JSX, etc.

Apart from the core technologies, I also have experience with other tools like Git, VSCode, and GitHub Actions, which are necessary to implement the solution effectively.

Further, as a proof of concept, I've also built a rough implementation of this project. You may check the demo here.

Prior art

Upon investigating the project's goals, I have discovered other cases in which comparable objectives were accomplished with different methods and tools. The API dependency explorer is very similar to the dependency tree generated as a lock file in node-based projects.

I thoroughly reviewed relevant articles, forums, and community discussions to obtain insights into current implementations. I came across some articles and blog posts that explore related projects and provide insight into different approaches, optimal procedures, and possible pitfalls. The visualization structure was inspired by this article which is based on this tool.

These materials, though not exactly alike, have been shown to help shape the design decisions and approaches taken on related projects. They provide an extensive amount of information, enabling a better informed and comprehensive approach to the current project.

Commitment

As I'm working as a full-time software engineer, I will dedicate an hour or two on weekdays and 7-8 hours over the weekends, which accounts for an average of ~20 hours/week. I have no other commitments during the GSoC period. However, if any unforeseen circumstances arise, I will inform the mentors in advance and adjust my timeline accordingly to meet the deadlines.

Schedule

During the Community Bonding period, I intend to:

Phase 1

The goal of Phase 1 is to implement a program that generates the base dependency mapping and the complete dependency mapping, and to write tests for both.

Week 1 (May 27 - Jun 2)

Week 2 & 3 (Jun 3 - Jun 16)

Week 4 (Jun 17 - Jun 23)

Phase 2

The goal of Phase 2 is to write utility functions, integrate different parts & write tests. PRs will be sent for review regularly.

Week 5 (Jun 24 - Jun 30)

Week 6 & 7 (Jul 1 - Jul 14)

Week 8 (Jul 15 - Jul 21)

Phase 3 - Final Phase

The goal of the final phase is to improve the layout algorithm, perform dev/sanity testing, and complete the project. PRs will be sent for review regularly.

Week 9 & 10 (Jul 22 - Aug 5)

Week 11 (Aug 6 - Aug 11)

Week 12 (Aug 12 - Aug 18)

Notes:

Related issues

No response

Checklist

kgryte commented 3 months ago

A few comments:

  1. We will not need to publish a separate package to npm just to handle exploring dependencies. We're more likely to include it as part of project "tools". See https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/_tools.
  2. Currently, when building the documentation website, we generate static assets for each package. For example, see https://github.com/stdlib-js/www/tree/master/public/docs/api/latest/%40stdlib/array/buffer. We're more likely to simply generate a static dependency tree during the documentation build process. Accordingly, rather than querying a database or S3, we'd simply request a static file from API docs server (note: the docs server may need to be updated to support a dedicated route for retrieving this asset).
  3. Will the dependency explorer for each package have a dedicated URL?
  4. What types of UI interactions do you anticipate supporting? E.g., beyond pan/zoom, will tree nodes be collapsible? will tree nodes be links out to package documentation?
  5. Do you have any wireframes or mockups to help convey your vision for the explorer?

prajwalkulkarni commented 3 months ago

Thank you for the suggestions. I've revised my proposal and made changes accordingly.

We will not need to publish a separate package to npm just to handle exploring dependencies. We're more likely to include it as part of project "tools". See https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/_tools.

Yes, we can define the utilities to generate the dependency mapping by including them as "tools".

Currently, when building the documentation website, we generate static assets for each package. For example, see https://github.com/stdlib-js/www/tree/master/public/docs/api/latest/%40stdlib/array/buffer. We're more likely to simply generate a static dependency tree during the documentation build process. Accordingly, rather than querying a database or S3, we'd simply request a static file from API docs server (note: the docs server may need to be updated to support a dedicated route for retrieving this asset).

The graph will not be shown on the screen by default. Instead, the user can click a button to see the tree in a modal. Hence, I don't think it would be possible to generate static assets at build time. However, we can add support for SSR that would return baked HTML/CSS on the button click.

Will the dependency explorer for each package have a dedicated URL?

As we will most likely be taking an SSR approach, I don't think there would be a dedicated URL for each package.

What types of UI interactions do you anticipate supporting? E.g., beyond pan/zoom, will tree nodes be collapsible? will tree nodes be links out to package documentation?

Beyond pan/zoom, the user will be able to navigate to the documentation of a particular dependency by clicking on its node. Collapsing/expanding is a nice feature that could be supported, but I have not yet worked out how to go about it. I would like to leave it for the end, or perhaps implement it after the GSoC period.

Do you have any wireframes or mockups to help convey your vision for the explorer?

Yes, I have created a mockup. You can check it here.

kgryte commented 3 months ago

@prajwalkulkarni I'm still not understanding why a modal is better than an integrated view similar to how one can view tests and benchmarks (e.g., https://stdlib.io/docs/api/latest/@stdlib/assert/contains/tests). If we use a modal, as shown in your mockup, users must first close the modal before being able to use the side navigation pane. And further, as mentioned previously, a modal kind of precludes having a static URL for accessing the dependency graph. In general, my preference would be to use consistent design patterns, and currently, the only modal we support is for settings, which makes more sense.

kgryte commented 3 months ago

I would also think that the graph should be accessible from the top-navigation bar, rather than on the page, as displayed in your mockup.

prajwalkulkarni commented 3 months ago

Since the dependency graph is not a primary piece of information to show when a user visits a documentation page, I recommended displaying the graph on a modal only when the user specifically requests it by clicking on a button. This would ensure there is no cognitive overload on the user.

Alternatively, I'm ok with showing the graph on a separate dedicated route similar to how one can view tests and benchmarks.

I would also think that the graph should be accessible from the top-navigation bar, rather than on the page, as displayed in your mockup.

Works for me. I've updated both the design and the proposal.

steff456 commented 3 months ago

Hi @prajwalkulkarni, thanks for opening your draft proposal!

I have a couple of questions for you,

  1. How are you going to show the tree when there's more than 10 entries in the same level?
  2. After a new package is created, how will the dependency tree be updated?

I think one of the most complex parts of this project is to generate a graph that is readable and not overwhelming to the users that see it.

prajwalkulkarni commented 3 months ago

Hello @steff456. Thank you for reviewing my proposal.

How are you going to show the tree when there's more than 10 entries in the same level?

Nodes at the same depth are rendered side by side, meaning that if a module or package has n direct dependencies or sub-modules, the n nodes are placed next to each other. For modules with a large number of sub-modules as direct children (e.g., math/base/assert), a large number of nodes would be rendered in a single row, which would require the user to scroll horizontally to view them all.

After a new package is created, how will the dependency tree be updated?

Just as different parts of the documentation site are built by executing programs located at https://github.com/stdlib-js/www/blob/master/tools/scripts/api-docs/, a similar program (described in steps 1 and 2 of my proposal) will be executed at build time and will write the updated dependency mapping to a given path. Thus, whenever a new package is added, we can simply run this program to generate an updated dependency mapping. Further, a small shell script can be written to ease the process.
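
A build-time step of this kind might look roughly like the sketch below, which writes one JSON file per package. The function name `writeDependencyAssets`, the file name `dependencies.json`, and the output directory layout are all hypothetical; the actual location would depend on how the documentation build organizes its assets.

```javascript
'use strict';

const fs = require( 'fs' );
const os = require( 'os' );
const path = require( 'path' );

// Writes `<outDir>/<package path>/dependencies.json` for every package in
// `mapping` and returns the list of written file paths.
function writeDependencyAssets( mapping, outDir ) {
    const written = [];
    for ( const pkg of Object.keys( mapping ) ) {
        // '@stdlib/array/buffer' -> <outDir>/@stdlib/array/buffer
        const dir = path.join( outDir, ...pkg.split( '/' ) );
        fs.mkdirSync( dir, { recursive: true } );
        const file = path.join( dir, 'dependencies.json' );
        const data = { package: pkg, dependencies: mapping[ pkg ] };
        fs.writeFileSync( file, JSON.stringify( data, null, 2 ) + '\n' );
        written.push( file );
    }
    return written;
}

// Usage: write into a temporary directory (a stand-in for the real build output).
const outDir = fs.mkdtempSync( path.join( os.tmpdir(), 'stdlib-deps-' ) );
const written = writeDependencyAssets( { '@stdlib/array/buffer': [] }, outDir );
console.log( written );
```

Rerunning the step after a package is added regenerates every file, so no incremental bookkeeping is needed.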

I think one of the most complex parts of this project is to generate a graph that is readable and not overwhelming to the users that see it.

Agreed. That is why I'm actively looking into ways to improve the layout algorithm and have dedicated 1-2 weeks in the project schedule to it. As Athan pointed out above, we can also add support for collapsing and expanding sub-trees, thereby making the graph less overwhelming. Although I'm interested in adding this feature, I have not mentioned it in my proposal, as my primary goal is to get the basic functionality running. After that, I can estimate the effort and work on it, even if it falls outside the GSoC scope.