[SPIKE] JSON and Github as our Data Store

pandevim commented 4 years ago

Following up from #21. Team: @esu2020 @pandevim @justonsky

After investigating possible spreadsheet-based solutions in #26 and further discussion with @Zenahr, I spiked using a JSON file to store data in the same GitHub repo as the SoS website. This solution avoids the limitations of the other methods and emphasizes an automated approach to building the website using CI/CD. Things to consider when creating the pipeline configuration files:

Every PR title should match the commit spec.
Every PR should only edit the ~data.js~ data.json file.
These commits from teams should only include ✔️ insertions and not ❌ deletions.
Head branch of PR should be deleted after merge.
Each merge should trigger build and deploy of the website. Can be Github pages or Netlify, etc.
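
A rough sketch of how the "only data.json" and "insertions only" rules could be checked in a pipeline step. It assumes the step receives the output of `git diff` as text; the function name `check_diff` and the messages are made up for illustration and are not part of the demo project.

```python
def check_diff(diff_text, allowed_file="data.json"):
    """Return a list of rule violations found in `git diff` output."""
    problems = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            # A new file section starts; header lines follow.
            in_hunk = False
        elif line.startswith("@@"):
            # Hunk header: the lines after it are content lines.
            in_hunk = True
        elif not in_hunk and line.startswith("+++ "):
            # "+++ b/<path>" names the file being changed.
            path = line[4:].strip()
            if path.startswith("b/"):
                path = path[2:]
            if path != allowed_file:
                problems.append(f"unexpected file changed: {path}")
        elif in_hunk and line.startswith("-"):
            # Inside a hunk, a leading "-" marks a removed line.
            problems.append(f"deletion not allowed: {line[1:].strip()}")
    return problems
```

One thing to keep in mind: appending an object to a JSON array usually means adding a comma to the previous line, which a diff reports as one deletion plus one insertion, so a strict insertions-only rule may need to tolerate that case.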

Potential approaches to setup this method within Github:

Create or use existing Github bots. e.g. Probot
Use existing 3rd party Github Integrations. e.g. Mergify
Create or use existing Github Actions workflow.

A normal workflow would look something like:

Create a new branch named after the team, i.e. team_name
Update the ~data.js~ data.json file with your team details
Make a PR on SoS Showcase website

Pros

One time setup.
Removes manual entry of data. Each team would make a PR themselves with their team details.
Build, test and deploy on the same platform using minimal config files.

Cons

Exposes our data publicly. This can cause problems such as exposing people's emails, leading to spam, etc.

I created a demo project using Mergify: https://github.com/pandevim/719234cbd04995b37e784861dd08daf96ad9cfa40c877f9d519f5a6be3d34364/ which is deployed on Github pages: https://pandevim.github.io/719234cbd04995b37e784861dd08daf96ad9cfa40c877f9d519f5a6be3d34364/. You can test it by creating a PR on this repo.

Zenahr commented 4 years ago

Perhaps we should also scan the newly added entries for sensitive data when a PR comes flying in.

A regex for e-mail addresses, for example.

But perhaps it'd be overkill for now and a short and simple manual check by the repo-owners is acceptable.
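
A minimal sketch of such a scan, assuming the new entries are available as parsed JSON. The pattern is deliberately simple and will not catch every valid address; `find_emails` is a hypothetical helper name.

```python
import re

# Loose pattern for email-like strings; not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(value):
    """Recursively collect email-like strings from parsed JSON values."""
    if isinstance(value, str):
        return EMAIL_RE.findall(value)
    if isinstance(value, dict):
        return [hit for v in value.values() for hit in find_emails(v)]
    if isinstance(value, list):
        return [hit for v in value for hit in find_emails(v)]
    # Numbers, booleans and null cannot contain an address.
    return []
```

A check could fail the PR, or just leave a comment for the reviewer, whenever the returned list is not empty.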

justonsky commented 4 years ago

This is awesome! And for many this may also introduce them to making their first pull requests as well, I'd imagine. One issue I can see though is that editing the data.js file may be a tedious process, and there may be mistakes in how someone submits their formatted JSON. To the latter point, maybe there could be a bot involved in the process to validate their input? Or we could even use the brand new Lint Action by Github.

Alternatively, maybe a simple front-end tool/form could be filled out that would generate that JSON, to be copy/pasted into the file and then PR'd?

pandevim commented 4 years ago

@Zenahr yes, by checking emails we can confirm it's from a SoS team only. We can add that to our list of potential features. As for your second point, I updated the demo project's pipeline configuration so that it now requires one approving review (approved-reviews-by>=1), after which the PR is automatically merged to master! You can see it working in pandevim/SoS/pull/8.

@justonsky Definitely, it would be data.json in the main project. I created data.js only for my convenience. Also, how about separate files like data/project.json and data/person.json? In person.json we can give each Person object a unique user ID (UID), so that instead of writing the whole object we just have to mention their u_id in the project.json file.

   {
      "name":"SoS Project 6",
      "categories":[
         "AI"
      ],
      "techUsed":[
         "React"
      ],
      "openToContributors":false,
      "logo":"https://thisisanexample.com/logo.png",
      "status":"shipped",
      "statusLastUpdated":"2020-06-09 (whatever timestamp)",
      "shortDesc":"This is the project that powers this website",
      "longDesc":"SoS is a project that blah blah blah .... more stuff, blah",
      "projectURL":"https://github.com/phil-ociraptor/sos-landing",
      "pointPerson": "77fe6eda-b440-11ea-b3de-0242ac130004",
      "mentors": [
          "55b70062-b440-11ea-b3de-0242ac130004",
          "840d0722-b440-11ea-b3de-0242ac130004"
      ],
      "collaborators": [
          "88447b54-b440-11ea-b3de-0242ac130004",
          "8d44ff7a-b440-11ea-b3de-0242ac130004",
          "91be1168-b440-11ea-b3de-0242ac130004"
      ]
   }
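
To illustrate how the site build could turn those IDs back into full objects, here is a small sketch. It assumes person.json holds a list of objects that each carry a `u_id` key; the `name` field in the usage example and the helper name `resolve_people` are hypothetical.

```python
def resolve_people(project, persons):
    """Return a copy of `project` with person IDs replaced by person objects.

    Raises KeyError if the project references an ID missing from `persons`.
    """
    by_id = {p["u_id"]: p for p in persons}
    resolved = dict(project)  # shallow copy, the input is left untouched
    resolved["pointPerson"] = by_id[project["pointPerson"]]
    for key in ("mentors", "collaborators"):
        resolved[key] = [by_id[uid] for uid in project.get(key, [])]
    return resolved
```

For example, with `persons = [{"u_id": "p1", "name": "Ada"}]` and a project whose `pointPerson` is `"p1"`, the returned copy carries the whole Ada object in that field.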

Having a linter built into our pull_request pipeline would be a good idea. We can use regex to enforce correct JSON, or for the time being a person can just review the changes. Maybe we can try this approach first, and if problems still arise we can try the form idea.
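
As a rough idea of what such a lint step might look like, the sketch below uses Python's `json` parser rather than regex, since a parser reports syntax errors such as trailing commas directly. The list of required fields and the assumption that the file is a top-level list of projects are guesses based on the sample entry, not an agreed schema.

```python
import json

# Assumed required fields and their expected types (not an agreed schema).
REQUIRED = {
    "name": str,
    "categories": list,
    "techUsed": list,
    "openToContributors": bool,
    "shortDesc": str,
    "projectURL": str,
}


def lint_projects(text):
    """Return a list of problems found in the JSON text of the data file."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(data, list):
        return ["top level must be a list of projects"]
    errors = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: must be an object")
            continue
        for field, kind in REQUIRED.items():
            if field not in entry:
                errors.append(f"entry {i}: missing {field}")
            elif not isinstance(entry[field], kind):
                errors.append(f"entry {i}: {field} should be {kind.__name__}")
    return errors
```

The pipeline would fail the PR when the list is not empty and could post the messages as a review comment.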

Zenahr commented 4 years ago

> This is awesome! And for many this may also introduce them to making their first pull requests as well, I'd imagine. One issue I can see though is that editing the data.js file may be a tedious process, and there may be mistakes in how someone submits their formatted JSON. To the latter point, maybe there could be a bot involved in the process to validate their input? Or we could even use the brand new Lint Action by Github.
>
> Alternatively, maybe a simple front-end tool/form could be filled out that would generate that JSON, to be copy/pasted into the file and then PR'd?

We should provide a template.

Btw, we shouldn't forget that we could also use yaml instead of json which yields cleaner syntax.

justonsky commented 4 years ago

> Definitely, it would be data.json in the main project. I created data.js only for my convenience. Also, how about separate files like data/project.json and data/person.json? In person.json we can give each Person object a unique user ID (UID), so that instead of writing the whole object we just have to mention their u_id in the project.json file.

I like that idea, and I think the general JSON schema discussed in #8 could be updated to include that. Seems much more efficient and not as much data would have to be returned -- not to mention coincidentally Airtable includes the IDs of rows returned from its API (~~though I haven't modified the Netlify backend created to include that yet~~ nevermind it's already there), so this idea could be provided using whichever solution we settle on in the end.

phil-ociraptor commented 4 years ago

I like this, however, there are some worries for the UX:

For those who know how to submit a PR, this approach is cumbersome. For those who don't, it's intimidating. Could be good to learn, but I wouldn't want them to learn a PR workflow just to effectively submit a form. The increased friction might mean lots of people don't do it, and we might have to add everything ourselves.
Getting the mentor/collaborator ID is going to be very difficult for people.

Pros:

This approach is very simple for us

Cons:

Pushes complexity onto our users, to the point that they might not want to do it. Getting people to add themselves to a Google Sheet is already difficult.

phil-ociraptor / sos-landing

[SPIKE] JSON and Github as our Data Store #31
