mrnabiz / github_user_segmentation

A K-means clustering approach to segment GitHub users
https://github-user-segmentation.onrender.com/
MIT License

Teammate Feedback Issue #7

Open mrnabiz opened 1 year ago

mrnabiz commented 1 year ago

Please go crazy and rip off my project and of course, criticize me!

Mengjun74 commented 1 year ago

The link in your description does not work; it shows a 404.

mrnabiz commented 1 year ago

@Mengjun74 Thanks for your comment. Actually, the problem was with the K-means model running on the backend. Please try again: https://github-user-segmentation.onrender.com/

mrnabiz commented 1 year ago

After doing some data wrangling, I used the K-means clustering algorithm for user segmentation and the PCA method to reduce the dimensionality of the interaction record for visualization purposes with Plotly.
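For anyone who wants to try the idea, here is a minimal sketch of that pipeline. It assumes scikit-learn and uses a synthetic interaction matrix; the event columns, the cluster count of 3, and the generated data are illustrative assumptions, not the project's actual code or dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical event types: one column per kind of interaction.
EVENT_TYPES = ["push", "pull_request", "review", "watch", "release"]

# Synthetic data (assumption): 50 users drawn around each of three
# made-up average event-count profiles.
rng = np.random.default_rng(0)
profiles = np.array([
    [1, 8, 1, 9, 0],   # pull/watch heavy
    [2, 6, 9, 1, 0],   # review/PR heavy
    [9, 1, 1, 0, 7],   # push/release heavy
], dtype=float)
X = np.vstack(
    [rng.poisson(p, size=(50, len(EVENT_TYPES))) for p in profiles]
).astype(float)

# Scale the counts so no single event type dominates the distance metric.
X_scaled = StandardScaler().fit_transform(X)

# Segment users with K-means (k=3 is an assumption for this sketch).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Project to 2 dimensions for plotting only; clustering used all features.
coords = PCA(n_components=2).fit_transform(X_scaled)
```

The `coords` array and `labels` could then be passed to a scatter plot (for example in Plotly), coloring each point by its cluster.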

There are some distinct clusters visible in the data:

  1. 🤓 Puller-Watchers: A group of users who tend to pull after watching changes (probably junior developers)
  2. 🩺 Reviewers - Creators: Another segment of users who had a high frequency of reviewing PRs and commits while creating PRs (most likely QA engineers)
  3. 🚀 Pushers - Releasers: Another segment of users who were dominantly pushing changes and creating releases (most likely integration and deployment teams)

📊 Next, I plotted their behavior pattern with a Sankey diagram, which is usually used to show a flow from one set of values to another. Sankeys are best used to show a many-to-many mapping with multiple paths through a set of stages. In a nutshell, the majority of users start their interaction by committing and pushing, then flow toward creating PRs and reviewing other users' PRs.
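As a rough illustration of how such a flow can be prepared, the sketch below counts transitions between consecutive steps of each user's event sequence. The sequences, the `sankey_links` helper, and the `max_steps` parameter are hypothetical and only stand in for the real data; the returned lists follow the `source` / `target` / `value` index layout that Plotly's `go.Sankey` expects for its `link` argument.

```python
from collections import Counter

# Hypothetical per-user event sequences, ordered in time.
sequences = [
    ["commit", "push", "pull_request", "review"],
    ["commit", "push", "pull_request"],
    ["push", "pull_request", "review"],
]


def sankey_links(sequences, max_steps=3):
    """Count step-to-step transitions across all sequences.

    A node is a (step index, event) pair, so the same event at
    different steps becomes a separate node in the diagram.
    """
    counts = Counter()
    for seq in sequences:
        steps = seq[:max_steps]  # keep only the first max_steps events
        for i in range(len(steps) - 1):
            counts[((i, steps[i]), (i + 1, steps[i + 1]))] += 1

    # Assign a stable integer index to every node that appears in a link.
    nodes = sorted({node for pair in counts for node in pair})
    index = {node: k for k, node in enumerate(nodes)}

    node_labels = [f"{step + 1}: {event}" for step, event in nodes]
    source = [index[a] for a, _ in counts]
    target = [index[b] for _, b in counts]
    value = list(counts.values())
    return node_labels, source, target, value


node_labels, source, target, value = sankey_links(sequences)
```

Keying nodes by step as well as event is a design choice for this sketch: it keeps the diagram flowing left to right even when users repeat the same event later on.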

WilfHass commented 1 year ago

Looks great! I have a few things to note:

  1. When changing the number of steps/number of events per step, it appears I'm only affecting the last graph but all of the graphs reload. Not the biggest deal, but could be cleaner to have only the plot it is affecting reload.
  2. Consider having tabs to show the different visualizations! Structuring it this way allows for multiple modifications in case you want to add more plots at a later date/maybe even an analysis tab. Also gives you more working room to add explanations to the plots.

Fantastic plots and work Nabi, well done!

roanraina commented 1 year ago

I wonder what type of GitHub user I am! Very interesting project. As for suggestions for improvement:

  1. There is a lack of explanation in the app about what is going on. The plots themselves are very information-dense, and it would be beneficial to have some text available to help the user interpret them.
  2. I would echo @WilfHass's comment regarding separate tabs for plots. I think having separate pages for the plots would speed up loading time and prevent the user from feeling information overload.
  3. Also - personally I think having the filters to the sides of the plots would look cleaner.