stdlib-js / google-summer-of-code

Google Summer of Code resources.
https://github.com/stdlib-js/stdlib

[RFC]: build a developer dashboard for tracking repository build status #67

Closed naveen1m closed 2 months ago

naveen1m commented 3 months ago

Full name

Naveen Kumar

University status

Yes

University name

Siksha O Anusandhan

University program

Bachelor of Technology in Computer Science and Information Technology

Expected graduation

2025

Short biography

I am a third-year B.Tech. undergraduate at Siksha O Anusandhan, Bhubaneswar, majoring in Computer Science and Information Technology. I also hold a certificate from my college's Applied Machine Learning program.

I have a strong passion for web application development and have been actively involved in such projects for the past two years. I have a good command of JS and web development, particularly the MERN stack. I participated in a symposium organized at my college, where I submitted a research-based project called Predicting Hospital Acquired Infection (HAI), and my paper placed in the top 20. I was also among the 106 students selected from all over India to contribute to C4GT, an open source program, where I developed an npm package in JS utilizing the Bhashini API and completed the project within the given timeline. I have also done freelance work on Upwork, where I built YourRadio, an Android social media app, for a client using React Native and Node.js.

My interest in computer science, particularly web development, took root during my intermediate school years. Since then, I have worked hard to turn this interest into real-world experience. I see this opportunity as a stepping stone in my professional growth.

Timezone

Asia/Kolkata (UTC+05:30)

Contact details

Email: navstr10@gmail.com, GitHub: https://github.com/naveen1m, Gitter: @naveen1m:gitter.im

Platform

Mac

Editor

VSCode. It is simple and lightweight and supports almost all programming languages. It also has numerous extensions that enhance its functionality even further.

Programming experience

I have more than two years of programming experience. My first project was Image-to-Ascii-Art in Java, and since then I have built many web development projects: frontends using HTML, CSS, and JS; backends using Node.js, Express, and MongoDB; and a few full-stack apps, including a deployed blog app and mern-auth. I have also built one mobile app, YourRadio, using React Native. In a few projects I wrote React.js code along with FastAPI endpoints that serve ML models; these include HAI, for which I also developed the ML models that predict HAI, and Namami-gange-guide. During the C4GT open-source programme, I created an npm package in JS utilizing the Bhashini API and wrote its documentation using JSDoc.

JavaScript experience

I have been working with JavaScript since my first year of college. I learned it from YouTube and built several beginner and intermediate-level projects with it; you can view all my vanilla-js projects here. They include a digital clock, a calculator, a food app (using an API), and more. I love this language because I can use it on both the frontend and backend of the web as well as in mobile app development. It also has many libraries and a large community.

My favorite features include:

My least favorite features include:

Node.js experience

I learned Node.js before React.js and used it with the Express.js library. Most of the time I used MongoDB as the database, but recently I used PostgreSQL through the pg library to develop dashboard-demo. I have written server code for a blog app, Mern-auth, and my freelance project YourRadio.

C/Fortran experience

I studied C programming during my 6th semester and then solved a few DSA problems in it. Implementing these basic algorithms strengthened my understanding of core C concepts.

Interest in stdlib

stdlib is a project that is pushing the boundaries of what JavaScript can do, and beyond that, it is a community-driven initiative. Who wouldn't be excited to be part of such a project? Moreover, it's not just about writing code; it's about innovation and finding creative solutions to real-world problems. This is exactly what drives me. The opportunity to work under Athan Reines also fills me with excitement. The way he has taken stdlib from a personal project to one with over 2000 functionalities in less than a decade inspires and motivates me. It would be an honor to contribute to this project under him.

I had also been looking for an open source organization whose requirements align with my skills and interests; thanks to GSoC, my search ended here.

Version control

Yes

Contributions to stdlib

Goals

Project goals: Develop a Node.js backend to query a PostgreSQL database, and construct a frontend interface using React.js along with other technologies, incorporating the following features:

Features:

  1. Repository list view: List of all repositories under the stdlib project, with pagination or lazy loading to handle the large number of repositories efficiently.
  2. Filtering and Search: Filtering and search capabilities to quickly find specific repositories by name, description, build status or other metadata.
  3. Visual build status indicators: Visual indicators to represent the latest build status of each repository at a glance.
  4. Build history and trends: Display historical build data for each repository, allowing developers to view and analyze past build failures and trends over time.
  5. Access resources and build artifacts: Easy access to repository resources and build artifacts for seamless navigation and utilization.
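A rough sketch of how the latest-build-status indicator (features 3 and 4) could be derived from workflow-run rows, assuming each row carries the `name`, `run_number`, and `status` fields selected in the sample queries. `latestRuns`, `indicatorFor`, and the status-to-color mapping are hypothetical; the real status values depend on the `workflow_run` data.

```javascript
// Hypothetical mapping from build status strings to indicator colors;
// the actual status values depend on the stdlib_github.workflow_run data.
const INDICATORS = new Map([
    [ 'success', 'green' ],
    [ 'failure', 'red' ],
    [ 'in_progress', 'yellow' ]
]);

/**
* Returns the latest workflow run (highest run_number) for each repository.
*
* @param {Array<Object>} runs - runs with `name`, `run_number`, and `status` fields
* @returns {Map<string, Object>} map from repository name to its latest run
*/
function latestRuns( runs ) {
    const latest = new Map();
    for ( const run of runs ) {
        const current = latest.get( run.name );
        if ( current === void 0 || run.run_number > current.run_number ) {
            latest.set( run.name, run );
        }
    }
    return latest;
}

/**
* Returns an indicator color for a build status, or 'gray' for unknown statuses.
*
* @param {string} status - build status
* @returns {string} indicator color
*/
function indicatorFor( status ) {
    return INDICATORS.get( status ) || 'gray';
}
```

Picking the latest run per repository on the client keeps the indicator to one entry per repository even when the joined rows contain many runs per repository.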

Proposed Features:

  1. Interactive data visualization: Incorporate interactive charts or graphs to drill down into specific data points or time periods, and generate reports and analytics on build statuses, failure rates, and other relevant metrics across the stdlib ecosystem (an extension suggested in the issue).
  2. Alerts and notifications: Configurable notification preferences based on repository and severity, with alerts by email or on the website (toast) about critical build failures or issues.
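A minimal sketch of how configurable notification preferences might be evaluated, assuming each preference names a repository (or `'*'` for all repositories) and a minimum severity. The severity levels, object shapes, and the `shouldNotify` name are hypothetical.

```javascript
// Hypothetical severity levels, ordered from least to most severe.
const SEVERITY = [ 'info', 'warning', 'critical' ];

/**
* Decides whether a build event should trigger a notification.
*
* @param {Object} event - build event with `repository` and `severity` fields
* @param {Array<Object>} prefs - preferences with `repository` ('*' matches all) and `minSeverity`
* @returns {boolean} true if any preference matches the event
*/
function shouldNotify( event, prefs ) {
    const level = SEVERITY.indexOf( event.severity );
    if ( level === -1 ) {
        return false; // unknown severity: never notify
    }
    return prefs.some( ( pref ) => {
        const min = SEVERITY.indexOf( pref.minSeverity );
        const repoMatches = ( pref.repository === '*' || pref.repository === event.repository );
        return repoMatches && min !== -1 && level >= min;
    });
}
```

The same check could decide between channels (for example, a toast for warnings and an email for critical failures) by keeping separate preference lists per channel.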

Technology Stack

Frontend

React + Vite (JavaScript)
Vite (React): Vite is known for its very fast development server. It only updates the modules that have changed (hot module replacement), resulting in faster reloads during development.
react-router-dom: Provides routing for React applications, allowing a single-page application with multiple views and navigation without full page reloads.
Tailwind CSS: A utility-first CSS framework that provides a set of predefined utility classes, which helps in building UIs faster.
react-virtualized: Displays large lists and tables (with headers and scrolling) efficiently by rendering only the rows currently in view.
axios: Provides a simple and consistent interface for making HTTP requests from the browser and Node.js, with features like automatic JSON data transformation and request/response interceptors.

Backend

Express.js
Express.js: A popular, lightweight, and flexible web application framework for building server-side applications in Node.js.
pg: node-postgres, or pg, is a non-blocking PostgreSQL client for Node.js. Essentially, it is a collection of Node.js modules for interfacing with a PostgreSQL database.

Testing

Jest
Jest is a JavaScript testing framework used to test code written in Node.js and React.js. It has a built-in test runner that can run test files in parallel, which makes testing faster.

Implementation Details Workflow

  1. UI/UX design
  2. API design and a Node.js backend for querying the PostgreSQL database
  3. Dashboard frontend in React.js that interfaces with the backend
  4. Testing to ensure the application works as expected
  5. Deployment

See the database schema, which I designed to understand the provided schema and to write queries against it.

UI/UX design. Note: the design prototypes shared here will not be the final design for the end product. Elements will be added and the design will be modified based on the mentor's suggestions, and under the mentor's guidance the UI will be finalized during the first week.

Figma design screenshots (I learned Figma for this proposal, which is why it took me some time to upload the design).

Index page: This is the first page users will see upon visiting the stdlib-developer-dashboard domain. It contains an infinitely scrolling table with information about the repositories managed by the stdlib organization.

[Figure: dashboard page]

[Figure: dashboard page on x-y scroll]

Analytics page: used to analyze build status, downloads, and merged PRs over a period of time.
[Figure: analytics page]

[Figure: analytics page, bottom view]
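One metric for the analytics page is build status over a period of time. A minimal sketch that computes a daily failure rate from workflow runs, assuming each run has an ISO-8601 `created_at` timestamp and a `status` field where `'failure'` marks a failed build. Both field names and the `dailyFailureRate` helper are assumptions, not confirmed columns of the provided schema.

```javascript
/**
* Computes the fraction of failed runs per UTC day.
*
* @param {Array<Object>} runs - runs with `created_at` (ISO-8601 string) and `status`
* @returns {Array<Object>} `{ day, total, failed, rate }` entries sorted by day
*/
function dailyFailureRate( runs ) {
    const byDay = new Map();
    for ( const run of runs ) {
        // The first 10 characters of toISOString() are the UTC date (YYYY-MM-DD).
        const day = new Date( run.created_at ).toISOString().slice( 0, 10 );
        const entry = byDay.get( day ) || { day, total: 0, failed: 0 };
        entry.total += 1;
        if ( run.status === 'failure' ) {
            entry.failed += 1;
        }
        byDay.set( day, entry );
    }
    return Array.from( byDay.values() )
        .sort( ( a, b ) => ( ( a.day < b.day ) ? -1 : ( a.day > b.day ) ? 1 : 0 ) )
        .map( ( e ) => ({ ...e, rate: e.failed / e.total }) );
}
```

The resulting array has one point per day, which maps directly onto a line or bar chart for the analytics view.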

Features Implementation

1. Displaying data on the dashboard: Data on the dashboard will be sorted first by build status, allowing maintainers and developers to identify which repositories need support and which ones have the latest build.

Users will be able to view updates to build artifacts on a single page through infinite scrolling, sorting, and searching. Steps involved in displaying the dashboard: since stdlib has more than 4k repositories, fetching all of them from the backend at once would not be ideal. Therefore, to display such a large number of rows, infinite scrolling backed by pagination will be used, fetching only the data that needs to be displayed on screen at a given time (the sample query in Step 1 uses LIMIT/OFFSET; cursor-based pagination is an alternative). I have designed a data schema to visualize the database and to use while writing PostgreSQL queries.
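Cursor (keyset) pagination, an alternative to LIMIT/OFFSET, works by having the client send the sort key of the last row it received and the server return rows strictly after it. In PostgreSQL this is typically a row-value comparison such as `WHERE ( w_r.status, r.id ) > ( $1, $2 ) ORDER BY w_r.status, r.id LIMIT $3`. The sketch below models the same logic over an in-memory array already sorted by `(status, id)`; `compareKey` and `pageAfter` are hypothetical names, and JavaScript string ordering may differ from the database collation.

```javascript
/**
* Compares two rows by the (status, id) sort key.
*
* @param {Object} a - row with `status` and `id`
* @param {Object} b - row with `status` and `id`
* @returns {number} negative, zero, or positive
*/
function compareKey( a, b ) {
    if ( a.status < b.status ) {
        return -1;
    }
    if ( a.status > b.status ) {
        return 1;
    }
    return a.id - b.id;
}

/**
* Returns the page of rows that comes strictly after `cursor` in (status, id) order.
*
* @param {Array<Object>} rows - rows sorted by (status, id)
* @param {(Object|null)} cursor - sort key of the previous page's last row, or null for the first page
* @param {number} limit - maximum number of rows to return
* @returns {Object} `{ rows, nextCursor }`, where nextCursor is null on the last page
*/
function pageAfter( rows, cursor, limit ) {
    const start = ( cursor === null ) ? 0 : rows.findIndex( ( row ) => compareKey( row, cursor ) > 0 );
    if ( start === -1 ) {
        return { rows: [], nextCursor: null };
    }
    const page = rows.slice( start, start + limit );
    const last = page[ page.length - 1 ];
    const nextCursor = ( start + limit < rows.length ) ? { status: last.status, id: last.id } : null;
    return { rows: page, nextCursor };
}
```

Unlike OFFSET, a cursor does not skip or repeat rows when rows are inserted ahead of it, and with an index on the sort key the database can usually seek directly to the cursor instead of scanning past all the offset rows.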

Sample code: I have read and understood the stdlib style guides and will follow them while writing code.

Step 1: Retrieving data from the database. I will use the pg library with Node.js to fetch a limited amount of data at a time.

const query = `
    SELECT
        r.name,
        t.tag,
        n_p.version,
        n_p.node_version,
        n_p.published_at,
        n_p.tarball_size,
        n_p.license,
        n_r_v_d_c.count AS downloads,
        w_r.status,
        w_r.run_number,
        w_r.run_attempt,
        r.owner,
        -- Job duration in seconds (completion time minus start time):
        EXTRACT( EPOCH FROM ( w_j.completed_at - w_j.started_at ) ) AS duration
    FROM
        stdlib_github.repository r
        -- LEFT JOINs keep every repository even when related rows are missing.
        -- NOTE: each join is one-to-many on repository_id, so rows multiply;
        -- restricting each table to its latest row per repository would still be needed.
        LEFT JOIN stdlib_github.tag t ON r.id = t.repository_id
        LEFT JOIN stdlib_github.npm_publish n_p ON r.id = n_p.repository_id
        LEFT JOIN stdlib_github.npm_rolling_version_download_count n_r_v_d_c ON r.id = n_r_v_d_c.repository_id
        LEFT JOIN stdlib_github.workflow_run w_r ON r.id = w_r.repository_id
        LEFT JOIN stdlib_github.workflow_job w_j ON r.id = w_j.repository_id
    ORDER BY
        w_r.status, r.id
    LIMIT
        $1
    OFFSET
        $2
`;
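`EXTRACT( EPOCH FROM ( completed_at - started_at ) )` yields a job's duration in seconds. If the raw timestamps are sent to the client instead, an equivalent computation plus a display format might look like this; `jobDurationSeconds` and `formatDuration` are hypothetical helpers.

```javascript
/**
* Returns a job's duration in seconds, or null if either timestamp is missing.
*
* Mirrors `EXTRACT( EPOCH FROM ( completed_at - started_at ) )` in PostgreSQL.
*
* @param {(string|null)} startedAt - ISO-8601 start timestamp
* @param {(string|null)} completedAt - ISO-8601 completion timestamp
* @returns {(number|null)} duration in seconds
*/
function jobDurationSeconds( startedAt, completedAt ) {
    if ( !startedAt || !completedAt ) {
        return null; // job not started or still running
    }
    return ( Date.parse( completedAt ) - Date.parse( startedAt ) ) / 1000;
}

/**
* Formats a duration in seconds as "<minutes>m <seconds>s".
*
* @param {(number|null)} seconds - duration in seconds
* @returns {string} formatted duration, or '-' if unknown
*/
function formatDuration( seconds ) {
    if ( seconds === null ) {
        return '-';
    }
    const total = Math.round( seconds );
    const m = Math.floor( total / 60 );
    const s = total % 60;
    return `${ m }m ${ s }s`;
}
```

For example, a job started at 10:00:00 and completed at 10:02:30 gives 150 seconds, displayed as "2m 30s".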

Step 2: Sending the retrieved data to the frontend

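app.get( '/api/v1/repository-data', async ( req, res ) => {
    // Query-string values are strings; `pageSize` matches the name sent by the frontend.
    const pageSize = Number( req.query.pageSize );
    const page = Number( req.query.page );
    const offset = ( page - 1 ) * pageSize;
    try {
        const { rows } = await client.query( query, [ pageSize, offset ] );

        // Send the query result as a JSON response, following the response skeleton:
        res.json({ success: true, data: rows });
    } catch ( err ) {
        // Express 4 does not catch rejected promises, so report errors explicitly:
        res.status( 500 ).json({ success: false, error: 'Failed to fetch repository data' });
    }
});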
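Query-string values arrive as strings (or are missing entirely), so validating them before they reach LIMIT/OFFSET avoids `NaN` or negative offsets and unbounded page sizes. A minimal sketch; `parsePagination`, its defaults, and the 500-row cap are hypothetical choices.

```javascript
/**
* Parses `page` and `pageSize` query-string values into safe integers.
*
* @param {Object} query - parsed query string, e.g. `req.query`
* @param {Object} [defaults] - hypothetical fallback values and upper bound
* @returns {Object} `{ page, pageSize, offset }`
*/
function parsePagination( query, defaults = { page: 1, pageSize: 50, maxPageSize: 500 } ) {
    const page = parseInt( query.page, 10 );
    const pageSize = parseInt( query.pageSize, 10 );

    // Fall back to defaults for missing, non-numeric, or non-positive values.
    const safePage = ( Number.isInteger( page ) && page >= 1 ) ? page : defaults.page;
    let safeSize = ( Number.isInteger( pageSize ) && pageSize >= 1 ) ? pageSize : defaults.pageSize;

    // Cap the page size so a single request cannot ask for every row.
    safeSize = Math.min( safeSize, defaults.maxPageSize );
    return {
        page: safePage,
        pageSize: safeSize,
        offset: ( safePage - 1 ) * safeSize
    };
}
```

Inside the route handler this would be used as `const { pageSize, offset } = parsePagination( req.query );` before calling `client.query( query, [ pageSize, offset ] )`.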

Response skeleton

{
    "success": true,
    "data": [
        {
            "name": string,
            "tag": string,
            "version": string,
            ...
        }
    ]
}

Step 3: Fetching data from the backend and displaying it on the frontend. React.js and the react-virtualized library will be used for this, and axios will be used for HTTP requests.

Fetching data using axios:

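useEffect( () => {
    const fetchData = async () => {
        try {
            const response = await axios.get( API_URL, {
                params: { pageSize, page }
            });
            // The backend wraps rows as `{ success, data }` (see the response skeleton):
            const newData = response.data.data;
            setData( prevData => prevData.concat( newData ) );
        } catch ( err ) {
            console.error( 'Failed to fetch repository data:', err );
        }
    };
    fetchData();
}, [ page, pageSize ] );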
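When pages are appended with `prevData.concat( newData )`, a page that is fetched twice (for example, because React's StrictMode runs effects twice in development) shows up as duplicate rows. A small sketch of an append that skips rows already present, assuming one row per repository keyed by `name`; `appendUnique` is a hypothetical helper.

```javascript
/**
* Appends rows from `next` to `prev`, skipping rows whose key is already present.
*
* @param {Array<Object>} prev - rows already displayed
* @param {Array<Object>} next - newly fetched rows
* @param {string} [key='name'] - field assumed to uniquely identify a row
* @returns {Array<Object>} new array; the inputs are not mutated
*/
function appendUnique( prev, next, key = 'name' ) {
    const seen = new Set( prev.map( ( row ) => row[ key ] ) );
    const out = prev.slice();
    for ( const row of next ) {
        if ( !seen.has( row[ key ] ) ) {
            seen.add( row[ key ] );
            out.push( row );
        }
    }
    return out;
}
```

In the effect, the state update would then read `setData( ( prevData ) => appendUnique( prevData, newData ) );`.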

Displaying rows using a react-virtualized Grid:

import { AutoSizer, InfiniteLoader, Grid } from 'react-virtualized';

const cellRenderer = ({ columnIndex, key, rowIndex, style }) => {
    const repository = repositories[ rowIndex ];

    // Rows that have not been fetched yet render a placeholder.
    if ( !repository ) {
        return <div key={ key } style={ style }>Loading...</div>;
    }
    let content = '';
    switch ( columnIndex ) {
        case 0:
            content = repository.name;
            break;
        case 1:
            content = repository.tag;
            break;
        // Add cases for other columns
        default:
            content = '';
    }
    return (
        <div key={ key } style={ style }>
            { content }
        </div>
    );
};

Main render function:

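// `isRowLoaded` is called per row index, and `loadMoreRows` must return a Promise.
// Grid reports rendered sections, so its row indices are mapped to the
// `{ startIndex, stopIndex }` shape that InfiniteLoader's `onRowsRendered` expects.
// AutoSizer passes the measured width to its child function.
return (
    <InfiniteLoader
        isRowLoaded={ ({ index }) => index < repositories.length }
        loadMoreRows={ loadMoreRows }
        rowCount={ repositories.length + 1 }
    >
        {({ onRowsRendered, registerChild }) => (
            <AutoSizer disableHeight>
                {({ width }) => (
                    <Grid
                        ref={ registerChild }
                        onSectionRendered={ ({ rowStartIndex, rowStopIndex }) => onRowsRendered({
                            startIndex: rowStartIndex,
                            stopIndex: rowStopIndex
                        }) }
                        cellRenderer={ cellRenderer }
                        columnCount={ 11 }
                        columnWidth={ 30 }
                        height={ 300 }
                        rowCount={ repositories.length }
                        rowHeight={ 50 }
                        width={ width }
                    />
                )}
            </AutoSizer>
        )}
    </InfiniteLoader>
);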
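react-virtualized keeps the number of DOM nodes small by rendering only the rows in, or near, the viewport. A simplified sketch of that windowing calculation for fixed-height rows (such as the 50px rows in a 300px-tall grid above). It illustrates the idea and is not the library's internal code; `visibleRange` and the default overscan of 2 are hypothetical.

```javascript
/**
* Computes which rows of a fixed-row-height list should be rendered.
*
* @param {number} scrollTop - vertical scroll offset in pixels
* @param {number} viewportHeight - height of the scroll container in pixels
* @param {number} rowHeight - height of every row in pixels
* @param {number} rowCount - total number of rows
* @param {number} [overscan=2] - extra rows rendered above and below the viewport
* @returns {Object} inclusive `{ start, stop }` row indices (stop < start when there are no rows)
*/
function visibleRange( scrollTop, viewportHeight, rowHeight, rowCount, overscan = 2 ) {
    if ( rowCount === 0 ) {
        return { start: 0, stop: -1 };
    }
    // First row whose top edge is at or above the viewport's top edge.
    const first = Math.floor( scrollTop / rowHeight );

    // Last row that is at least partially visible.
    const last = Math.ceil( ( scrollTop + viewportHeight ) / rowHeight ) - 1;
    return {
        start: Math.max( 0, first - overscan ),
        stop: Math.min( rowCount - 1, last + overscan )
    };
}
```

With 4,000 rows, 50px rows, and a 300px viewport, only about ten rows (six visible plus overscan) exist in the DOM at any time, regardless of how far the user has scrolled.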

2. Sorting: Rows can be sorted in ascending or descending order by any column, using a PostgreSQL query.

Route: GET /api/v1/sort

Query parameters: { column, direction }

// ASC/DESC cannot be a bound parameter, so whitelist the direction before interpolating it.
const dir = ( direction === 'desc' ) ? 'DESC' : 'ASC';

const sortQuery = `SELECT
    r.name,
    t.tag,
    n_p.version,
    n_p.node_version,
    n_p.published_at,
    n_p.tarball_size,
    n_p.license,
    n_r_v_d_c.count AS downloads,
    w_r.status,
    w_r.run_number,
    w_r.run_attempt,
    r.owner,
    EXTRACT(EPOCH FROM (w_j.completed_at - w_j.started_at)) AS duration
FROM
    stdlib_github.repository r
LEFT JOIN
    stdlib_github.tag t ON r.id = t.repository_id
LEFT JOIN
    stdlib_github.npm_publish n_p ON r.id = n_p.repository_id
LEFT JOIN
    stdlib_github.npm_rolling_version_download_count n_r_v_d_c ON r.id = n_r_v_d_c.repository_id
LEFT JOIN
    stdlib_github.workflow_run w_r ON r.id = w_r.repository_id
LEFT JOIN
    stdlib_github.workflow_job w_j ON r.id = w_j.repository_id
ORDER BY
    -- One CASE per datatype, because all branches of a CASE must share a type.
    -- $1 is the column name, cast to text so PostgreSQL can infer its type.

    -- TEXT datatype
    CASE $1::text
        WHEN 'name' THEN r.name
        WHEN 'tag' THEN t.tag
        WHEN 'version' THEN n_p.version
        WHEN 'node_version' THEN n_p.node_version
        WHEN 'license' THEN n_p.license
        WHEN 'status' THEN w_r.status
        WHEN 'owner' THEN r.owner
    END ${ dir },

    -- timestamp datatype
    CASE $1::text
        WHEN 'published_at' THEN n_p.published_at
    END ${ dir },

    -- double precision datatype
    CASE $1::text
        WHEN 'downloads' THEN n_r_v_d_c.count
    END ${ dir },

    -- integer datatype
    CASE $1::text
        WHEN 'run_number' THEN w_r.run_number
        WHEN 'run_attempt' THEN w_r.run_attempt
    END ${ dir },

    -- bigint datatype
    CASE $1::text
        WHEN 'tarball_size' THEN n_p.tarball_size
    END ${ dir },

    -- numeric datatype
    CASE $1::text
        WHEN 'duration' THEN EXTRACT(EPOCH FROM (w_j.completed_at - w_j.started_at))
    END ${ dir },

    w_r.status -- Default sorting column
`;
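If sorting is instead done on the client over rows that have already been fetched, which keeps the load off the server, a comparator needs to handle the sort direction and the NULL values that the joins can produce. A minimal sketch; `sortRows` and the nulls-last policy are hypothetical choices.

```javascript
/**
* Returns a sorted copy of `rows` ordered by `column`.
*
* Null/undefined values are placed last regardless of direction.
*
* @param {Array<Object>} rows - rows to sort
* @param {string} column - column name, e.g. 'name' or 'downloads'
* @param {string} [direction='asc'] - 'asc' or 'desc'
* @returns {Array<Object>} sorted copy; the input is not mutated
*/
function sortRows( rows, column, direction = 'asc' ) {
    const sign = ( direction === 'desc' ) ? -1 : 1;
    return rows.slice().sort( ( a, b ) => {
        const x = a[ column ];
        const y = b[ column ];
        const xMissing = ( x === null || x === void 0 );
        const yMissing = ( y === null || y === void 0 );
        if ( xMissing || yMissing ) {
            // Missing values go last in both directions.
            return ( xMissing === yMissing ) ? 0 : ( xMissing ? 1 : -1 );
        }
        if ( typeof x === 'string' && typeof y === 'string' ) {
            return sign * x.localeCompare( y );
        }
        return sign * ( ( x < y ) ? -1 : ( x > y ) ? 1 : 0 );
    });
}
```

For example, `sortRows( rows, 'downloads', 'desc' )` returns the most-downloaded repositories first, with repositories that have no download count at the end.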

3. Searching: For v1 of the dashboard, search will be supported only on the name, tag, and build status fields. It can be extended later using PostgreSQL indexing.

Route: GET /api/v1/search

Query parameters: { q: query, column }

const searchQuery = `SELECT
    r.name,
    t.tag,
    n_p.version,
    n_p.node_version,
    n_p.published_at,
    n_p.tarball_size,
    n_p.license,
    n_r_v_d_c.count AS downloads,
    w_r.status,
    w_r.run_number,
    w_r.run_attempt,
    r.owner,
    EXTRACT(EPOCH FROM ( w_j.completed_at - w_j.started_at )) AS duration
FROM
    stdlib_github.repository r
LEFT JOIN
    stdlib_github.tag t ON r.id = t.repository_id
LEFT JOIN
    stdlib_github.npm_publish n_p ON r.id = n_p.repository_id
LEFT JOIN
    stdlib_github.npm_rolling_version_download_count n_r_v_d_c ON r.id = n_r_v_d_c.repository_id
LEFT JOIN
    stdlib_github.workflow_run w_r ON r.id = w_r.repository_id
LEFT JOIN
    stdlib_github.workflow_job w_j ON r.id = w_j.repository_id
WHERE
    -- $1 is the search pattern (e.g. '%term%'), passed as a bound parameter
    -- rather than interpolated into the SQL string, which would allow SQL injection.
    r.name ILIKE $1 OR
    t.tag ILIKE $1 OR
    w_r.status ILIKE $1
`;
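When the search term is supplied as a bound parameter rather than interpolated into the SQL string, it has to be turned into an ILIKE pattern in JavaScript first. A minimal sketch, assuming PostgreSQL's default LIKE escape character (backslash); `toIlikePattern` is a hypothetical helper. Escaping `%` and `_` makes them match literally instead of acting as wildcards.

```javascript
/**
* Converts a raw search term into a "contains" pattern for ILIKE.
*
* Backslash, `%`, and `_` are escaped so they match literally
* (PostgreSQL's default LIKE escape character is a backslash).
*
* @param {string} term - raw user input
* @returns {string} pattern such as '%term%'
*/
function toIlikePattern( term ) {
    const escaped = String( term ).replace( /[\\%_]/g, ( ch ) => '\\' + ch );
    return '%' + escaped + '%';
}
```

The result would be supplied as the value of a `$1` placeholder in a clause such as `WHERE r.name ILIKE $1`, so the user's input never becomes part of the SQL text itself.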

Why this project?

This project is being developed so that stdlib developers can see the build status of all repositories at once. The skills required for this project align with my skills and passion, so I am very excited to be part of the team and contribute to its development. I have always wanted to contribute to open source projects, and since this project will benefit the entire JavaScript community, my motivation is even higher. This project has already taught me PostgreSQL and how to use it with Node.js in less than a week, and I am still learning more advanced PostgreSQL concepts to optimize my solution. JavaScript is used extensively in web applications, server-side environments, and beyond, yet it has historically lacked a robust standard library for data science comparable to those of Python or R. stdlib aims to fill this gap by offering a wide range of functionality, from mathematical operations and data structures to utilities for scientific computing, data visualization, machine learning, and beyond. By helping developers and maintainers spot and address issues with their projects, this dashboard will save them considerable time and energy. Moreover, since the dashboard must retrieve and query more than 4k rows and display them using infinite scrolling, it will be a good platform for me to enhance my knowledge and skills.

This project provides me an opportunity to learn from mentors and experienced contributors. It will also make me more familiar with the vast open source community. I see myself as a better full stack developer after this project.

Qualifications

I possess the technical skills and experience the project requires. The project uses React.js, Node.js, and a PostgreSQL database, which are technologies I already know, so the learning curve is low for me and more time can go into building and improving the project rather than learning the technology stack. Since I have been developing websites in React.js and Node.js for the last 2 years, my experience will help me complete this project on time and with minimal errors. I have also learned PostgreSQL and how to connect it with Node.js using the ‘pg’ library, and I recently built a demo-dashboard project for this using PostgreSQL, Node.js, React.js, and TailwindCSS. I believe that I can complete this project within the given timeline.

Prior art

I reviewed the code of the npm statusboard shared in the issue, which tracks the status of npm projects daily and displays their operational status. After cloning its repository, I found that the frontend uses HTML, CSS, Bootstrap, and JS, and the backend uses Node.js for the API.

Because the requirements here go beyond those of the npm statusboard, I have read a few blogs and watched some videos on YouTube. I also read the node-postgres docs and used AI chatbots to understand concepts quickly. I read blogs [ link1, link2 ] to understand how to display long lists in React.js, and learned that react-virtualized can be used for this.

Commitment

I would be able to devote approximately 40-50 hours every week to GSoC. In mid-May I will have my end-session examinations, so during that period I will give 2-3 hours per day. I will try to complete the project before the GSoC period ends. I do not have any other summer internship or job and have no obligations after May, so I will devote all my time to GSoC.

I will strive to be regular and sincere with my scrum and daily updates, as I understand that selection for this project requires serious commitment and 100% dedication on my part.

I would also love to work on its version 2.0 because I know it will take my potential to another level. My involvement with this organization and the stdlib-js project goes beyond just a summer program.

Schedule

Assuming a 12-week schedule. Note: this can be revised based on mentor suggestions.

Community Bonding Period:

Week 1 & 2:

Week 3 & 4:

Week 5 & 6: (midterm)

Week 7, 8 & 9:

Week 10 & 11:

Week 12 :

Final Week:

Related issues

No response

Checklist

Planeshifter commented 3 months ago

Thanks for your proposal, @naveen1m. The proposal clearly shows that you spent a decent amount of time researching the project and are learning relevant technologies. Discussing some technical challenges such as implementing features like infinite scrolling or sorting is good.

I also like some of the feature suggestions such as Alert and Notifications for build failures, as they are highly relevant.

For some of the proposed technologies such as axios, it would be good to highlight their benefits over say using the native fetch web API to justify their usage. Overall, it's a solid set of suggested technologies that should be well suited for the task at hand. You may want to check out TanStack Table, which has become the de-facto choice for rendering complex, interactive tables in React.

Given the scope of the project, it may be useful to indicate priorities for the features and discuss the potential challenges or risks and how they would be addressed.

P.S. There is already an existing (currently private) Fastify backend, and access to it would be provided should this project be selected for GSoC.

kgryte commented 3 months ago

@naveen1m Thanks for sharing a draft of your proposal. A few comments:

naveen1m commented 3 months ago

Given the scope of the project, it may be useful to indicate priorities for the features and discuss the potential challenges or risks and how they would be addressed.

@Planeshifter Thanks for your suggestion! We can use TanStack Table here, as it is lighter and more optimized than react-virtualized, which I had proposed earlier. I read this blog to understand the differences between them. I will study the TanStack Table documentation further and implement it accordingly.

naveen1m commented 3 months ago

@naveen1m Thanks for sharing a draft of your proposal. A few comments:

  • For sorting, you provide a Postgres query for returning sorted results. I am somewhat leery about this, as it seems to me that sorting could be done client-side. Our hosted server is somewhat compute-constrained, so, if possible, I'd prefer if we can offload as much as possible to the client.
  • While 4K rows is expensive from a UI perspective (i.e., many DOM nodes), 4K rows of JSON is not. So, I think it should be fine to simply ship a single JSON blob to the client, but only display a subset of rows at a time to minimize DOM nodes.
  • My personal preference is using fastify for the backend server, rather than expressjs, as that is what we are already familiar with.
  • You mention the metrics page in weeks 7-9. Would you be able to comment a bit more here? What are you thinking you'd show in this page? How would navigation from the table to drill down work?
  • How would search work? In particular, what types of queries would search support? Is it just filtering the list of packages? Or would we be able to support more complex queries (e.g., "packages whose builds have failed over the last week")?

@kgryte Thanks for the review! To address the issues mentioned above, I think the possible solutions could be:

naveen1m commented 3 months ago

@kgryte @Planeshifter Thanks for your comment! I finally submitted my proposal with suggested changes.