reposense / RepoSense

Contribution analysis tool for Git repositories
https://reposense.org
MIT License

Walk users through creating and deploying their first report #1430

Open dcshzj opened 3 years ago

dcshzj commented 3 years ago

What feature(s) would you like to see in RepoSense?

There should be a way to walk users through creating and deploying their first RepoSense report. This includes automatically forking the publish-RepoSense repository as well as making the necessary changes to the configuration files to match the user's requirements.

These requirements can be specified in an interface (such as a website) so that the user can just answer some questions and have the configuration files populated automatically.

Is the feature request related to a problem?

Currently, new users of RepoSense need to read through the user guide and understand the application before they can create their first RepoSense report. As the user guide is meant to be comprehensive, the learning curve is steep: new users are presented with information that is not relevant to their first report.

This feature request aims to reduce the amount of learning required, allowing users to dive directly into making their first report.

If possible, describe the solution

A static webpage is a feasible solution where the user can fill in a form and answer some of the questions. Additional configuration options can be hidden from view unless explicitly selected by the user (such as the start date, until date, time zone, etc.).

Ideally, the webpage can authenticate as the user on GitHub and perform the actions seamlessly using the user's browser via API calls (forking the repository, uploading the configuration files, building and publishing the report, etc). When the form is submitted, a loading icon can appear for a while before the user is given the link to see their generated RepoSense report as well as the GitHub repository created.

The webpage should also be responsive to errors and detect whether certain steps (such as forking the repository) have already been performed, so that steps are not unnecessarily duplicated, even across sessions.
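One way to avoid duplicating steps across sessions is to derive the remaining work from what already exists on GitHub rather than from local session state. The sketch below is only illustrative: the step names and the `SetupState` shape are hypothetical and are not taken from any existing wizard code.

```typescript
// Hypothetical step names; a real wizard may split the work differently.
type Step = "forkRepository" | "uploadConfig" | "triggerBuild";

// What the wizard has observed about the user's account, for example by
// querying the GitHub API when the page loads (assumed, not shown here).
interface SetupState {
  forkExists: boolean;
  configUploaded: boolean;
  reportPublished: boolean;
}

// Returns only the steps that still need to run, in execution order.
function remainingSteps(state: SetupState): Step[] {
  const steps: Step[] = [];
  if (!state.forkExists) {
    steps.push("forkRepository");
  }
  if (!state.configUploaded) {
    steps.push("uploadConfig");
  }
  // A fresh config upload always needs a rebuild, even if an older report exists.
  if (!state.configUploaded || !state.reportPublished) {
    steps.push("triggerBuild");
  }
  return steps;
}
```

Because the state is recomputed from GitHub each time, a user who returns in a new session with an existing fork would skip straight to the later steps.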

Engineering Challenges

Engineering challenge 1: Authentication with GitHub

We choose to authenticate with GitHub as we assume that the report will be published on GitHub Pages (there are other providers out there, but we chose GitHub Pages for simplicity in publishing). Hence, we need to interact with the GitHub API to authenticate the user using the OAuth 2.0 protocol.

Unfortunately, GitHub's OAuth flow does not support the implicit grant type. This means that there needs to be an underlying server that handles the request made when the user is redirected back after the first step of authentication. This server needs to send the authorization code provided in the redirect request to GitHub in exchange for an access token.
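To make the first step of this flow concrete, here is a minimal sketch of how a client could build the GitHub authorization URL and then read the authorization code from the redirect. The function names, the `public_repo` scope choice, and the example callback URL are assumptions for illustration; only the authorize endpoint and its query parameters come from GitHub's OAuth web application flow.

```typescript
const AUTHORIZE_ENDPOINT = "https://github.com/login/oauth/authorize";

// Builds the URL the user is sent to in step one of the web application flow.
// `state` should be an unguessable value the wizard remembers for the callback.
function buildAuthorizeUrl(clientId: string, redirectUri: string, state: string): string {
  const params = new URLSearchParams({
    client_id: clientId,
    redirect_uri: redirectUri,
    scope: "public_repo", // assumed scope; the wizard may need a different one
    state,
  });
  return `${AUTHORIZE_ENDPOINT}?${params.toString()}`;
}

// Extracts the authorization code from the redirect back to the wizard,
// rejecting the callback if the state does not match what was sent.
function readAuthorizationCode(callbackUrl: string, expectedState: string): string {
  const params = new URL(callbackUrl).searchParams;
  if (params.get("state") !== expectedState) {
    throw new Error("State mismatch: possible forged callback");
  }
  const code = params.get("code");
  if (code === null) {
    throw new Error("Callback did not include an authorization code");
  }
  return code;
}
```

The code returned by `readAuthorizationCode` is short-lived and is not itself an access token, which is why a component holding the client secret is still needed to complete the exchange.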

The difficulty lies in deciding whether we should run an underlying server to exchange the authorization code for an access token. Having such a server simplifies implementation, but the running costs may be rather high for very little benefit.

An alternative implementation is to rely on users to obtain a personal access token themselves and provide it to the setup wizard, which then executes the steps on their behalf. This would require an additional field in the setup wizard where the user pastes their personal access token. It also has the added benefit of keeping the setup wizard entirely client-side, which allows it to be hosted on static site hosts such as GitHub Pages at no extra cost.
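As a rough illustration of the client-only approach, the sketch below builds the request a browser could send to fork a repository using a pasted personal access token. The `ApiRequest` shape and the `buildForkRequest` name are hypothetical, and the `reposense/publish-RepoSense` owner and repository name in the usage note are an assumption based on the repository mentioned above. The endpoint and headers follow the GitHub REST API for creating a fork.

```typescript
// Plain description of an HTTP request, so it can be inspected before sending.
interface ApiRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
}

// Builds the "create a fork" request for the given repository.
function buildForkRequest(token: string, owner: string, repo: string): ApiRequest {
  if (token.trim() === "") {
    throw new Error("A personal access token is required");
  }
  return {
    method: "POST",
    url: `https://api.github.com/repos/${owner}/${repo}/forks`,
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token.trim()}`,
    },
  };
}
```

In the browser, the result of `buildForkRequest(token, "reposense", "publish-RepoSense")` could be passed to `fetch(request.url, { method: request.method, headers: request.headers })`. The token never leaves the user's browser except in calls to GitHub.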

Engineering challenge 2: Lack of a schema for configuration files

RepoSense has always been configured using various CSV and JSON configuration files. However, it is not very user-friendly, especially for new users of RepoSense, to create these CSV files in order to generate the report (in addition to figuring out how to set up RepoSense itself). Hence, it is crucial that the setup wizard is able to abstract this information away from the user and handle the creation of these configuration files for them.

However, there is currently no pre-defined schema for these configuration files. This makes it difficult for the setup wizard to implement validation logic, as it is unclear exactly what is valid. It is trivial to put in some logic into the setup wizard, but the ideal implementation is to develop a proper schema that is stored in the RepoSense repository, and the setup wizard will just need to implement the functionality to validate the user's input against this schema.

Nothing has been done so far in this aspect, and some research may be needed to understand how a proper schema should be implemented.
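To give a sense of what validating against a shared schema could look like, here is a small sketch. Everything in it is hypothetical: the `FieldSpec` shape, the field names, and the two example columns are placeholders, and the real column headers and rules would need to come from RepoSense itself.

```typescript
type FieldType = "string" | "boolean" | "url";

// One entry per configuration column; this shape is an assumption.
interface FieldSpec {
  name: string;
  header: string; // column header in the generated CSV (placeholder values below)
  type: FieldType;
  required: boolean;
}

// Illustrative schema only; not the actual RepoSense configuration format.
const REPO_CONFIG_SCHEMA: FieldSpec[] = [
  { name: "location", header: "Repository's Location", type: "url", required: true },
  { name: "branch", header: "Branch", type: "string", required: false },
];

// Returns a list of human-readable problems; an empty list means the input is valid.
function validateAgainstSchema(schema: FieldSpec[], input: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const field of schema) {
    const value = input[field.name];
    if (value === undefined || value === "") {
      if (field.required) {
        errors.push(`${field.name} is required`);
      }
      continue;
    }
    if (field.type === "boolean") {
      if (typeof value !== "boolean") {
        errors.push(`${field.name} must be true or false`);
      }
    } else if (typeof value !== "string") {
      errors.push(`${field.name} must be text`);
    } else if (field.type === "url" && !/^https?:\/\/\S+$/.test(value)) {
      errors.push(`${field.name} must be an http(s) URL`);
    }
  }
  return errors;
}
```

With this split, the wizard only needs the generic `validateAgainstSchema` logic, while the schema data itself could live in the RepoSense repository.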

Other considerations

  1. Handling exceptions such as having an existing repository, insufficient permissions to create repository in an organization, or any other errors coming from the GitHub API.
  2. Allowing power users to upload their configuration files directly into the setup wizard.
  3. Possibly adding support for modifying an existing repository's configuration file and regenerating the report.

gerhean commented 3 years ago

Started work on 1430_feature_branch

dcshzj commented 3 years ago

This looks like a good example that we can reference: https://cauldron.io/projects/new

vvidday commented 1 year ago

Regarding (1), one potential alternative is to use serverless functions to perform the exchange. These seem to have a much more generous free tier (e.g. Cloudflare Workers and Deno Deploy have free tiers that provide 100k requests/day), which should be more than sufficient for the wizard.
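For illustration only, and not taken from any demo linked in this thread, the core of such a function could look roughly like the sketch below. The `exchangeCodeForToken` name and the injected `post` helper are hypothetical; the token endpoint and the `client_id`, `client_secret`, and `code` parameters are those of GitHub's OAuth web application flow.

```typescript
const TOKEN_ENDPOINT = "https://github.com/login/oauth/access_token";

// Shape of the JSON that GitHub returns when asked for a JSON response.
interface TokenResponse {
  access_token?: string;
  error?: string;
}

// Hypothetical helper that POSTs the parameters and parses the JSON reply.
// Injecting it keeps the exchange logic independent of the hosting platform.
type PostForm = (url: string, body: Record<string, string>) => Promise<TokenResponse>;

// Exchanges the authorization code for an access token. The client secret
// stays inside the serverless function and is never sent to the browser.
async function exchangeCodeForToken(
  code: string,
  clientId: string,
  clientSecret: string,
  post: PostForm,
): Promise<string> {
  const result = await post(TOKEN_ENDPOINT, {
    client_id: clientId,
    client_secret: clientSecret,
    code,
  });
  if (!result.access_token) {
    throw new Error(result.error ?? "No access token returned");
  }
  return result.access_token;
}
```

On a platform such as Cloudflare Workers, `post` would typically wrap `fetch` with an `Accept: application/json` header, and the handler would return the token to the wizard's origin only.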

I've set up a basic demo for the OAuth authentication with Cloudflare Workers:

I feel that requiring the user to set up a personal access token would worsen the UX, especially since they might make a mistake in configuring the required permissions of the token itself. Hence I think that we should try to avoid it if implementing GitHub authentication is feasible.

For (2), @MarcusTXK was exploring the possibility of a similar schema in the RepoSense repository, and also potentially using it to generate the form on the wizard. Let's discuss more about this here!

@dcshzj Regarding the admin/ownership part, since the wizard is a standalone app, should we create it as a separate repo instead of part of the monorepo like in 1430_feature_branch? If you approve of the above I'll proceed with transferring the ownership of the two repositories & the OAuth app to the RepoSense organization. Thanks!

vvidday commented 1 year ago

The stack of https://github.com/vvidday/RepoSense-wizard is as suggested on Slack - Vue 3/TypeScript. I've also tried to keep things similar to RepoSense's frontend, like using Pug as the templating engine and Cypress for e2e testing.

The differences are:

I've tried to break down the development into some tasks, @ckcherry23 @sikai00 @MarcusTXK let me know what you guys think? And feel free to add anything that I missed

Wizard

Auth

Implement functionality to:

Form

Others

DevOps

Misc

dcshzj commented 1 year ago

@vvidday Yup, I think we should do this in a separate repository. Please do move the repositories over to the reposense organization, I think you should have sufficient permissions to do so.

vvidday commented 1 year ago

Just transferred ownership of the two repos (wizard and worker)

I've started the transfer process of the OAuth app, but I think it needs to be approved from the other side.

dcshzj commented 1 year ago

@vvidday Thanks! I have just approved it.

MarcusTXK commented 1 year ago

> For (2), @MarcusTXK was exploring the possibility of a similar schema in the RepoSense repository, and also potentially using it to generate the form on the wizard. Let's discuss more about this here!

A possible design I have in mind is for both the wizard and the backend to share a single source of truth for the arguments/configs. This way, the wizard can dynamically generate an up-to-date form from the relevant arguments that the backend accepts.

The benefit of doing it this way is that this can prevent the wizard from becoming outdated and prevent a developer from having to manually add a new field to the setup wizard each time a new argument is added in the backend.

The drawback is that refactoring has to take place on the backend to support this. Another caveat is that the backend is in RepoSense, while the wizard will be in a separate repository, so we might have to import RepoSense as a submodule in the wizard or find some other way of sharing this file.

Alternatively, another approach we could take is to just set up the wizard to be dynamically generated based on a config file (separate from the RepoSense repository). That way it can be easily updated, and an up-to-date form can be generated each time a new flag/config is added.

The main reason I prefer a dynamically generated site based on a config (be it for the first or second approach) is easier maintainability compared with hardcoding the entire form in the front end.
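A minimal sketch of the config-driven idea, under the assumption that each form field is described by a small record: the `WizardField` shape and the example entries below are hypothetical and only loosely based on the options mentioned earlier in this issue (start date, until date, time zone).

```typescript
// Description of one form field; adding a new option means adding one entry here.
interface WizardField {
  key: string;
  label: string;
  advanced: boolean; // advanced fields stay hidden unless the user opts in
  defaultValue: string;
}

// Illustrative config; in practice this could be loaded from a shared file.
const WIZARD_FIELDS: WizardField[] = [
  { key: "repos", label: "Repository URLs", advanced: false, defaultValue: "" },
  { key: "since", label: "Start date", advanced: true, defaultValue: "" },
  { key: "until", label: "Until date", advanced: true, defaultValue: "" },
  { key: "timezone", label: "Time zone", advanced: true, defaultValue: "UTC" },
];

// Fields the form should currently render.
function visibleFields(fields: WizardField[], showAdvanced: boolean): WizardField[] {
  return fields.filter((field) => showAdvanced || !field.advanced);
}

// Initial form model, keyed by field, that a Vue component could bind to.
function initialFormValues(fields: WizardField[]): Record<string, string> {
  const values: Record<string, string> = {};
  for (const field of fields) {
    values[field.key] = field.defaultValue;
  }
  return values;
}
```

A component would then loop over `visibleFields(WIZARD_FIELDS, showAdvanced)` to render inputs, so the template itself does not change when a field is added to the config.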