Better Issue Tracking - Githubissues

scottgonzalez commented 9 years ago

@agcolom requested that I open a single issue with all requests.

[ ] Make the tracking code public
[ ] Monitor all repositories
[ ] Track pull requests and issues
[ ] Track opens and closes, not just total open count
[ ] Track average age
[ ] Push the data into Splunk
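
As a rough illustration of tracking opens and closes rather than a single open count, the sketch below tallies both per UTC day. It assumes each item looks like GitHub's issue JSON, with an ISO 8601 `created_at` and a `closed_at` that is null while the issue is open; the name `countOpensAndCloses` is made up for this example.

```javascript
// Count how many issues were opened and closed on each UTC day.
// Assumption: `issues` is an array shaped like GitHub's issue JSON.
function countOpensAndCloses(issues) {
  const days = {};
  const bump = (timestamp, key) => {
    const day = timestamp.slice(0, 10); // "YYYY-MM-DD" part of the timestamp
    days[day] = days[day] || { opened: 0, closed: 0 };
    days[day][key] += 1;
  };
  for (const issue of issues) {
    bump(issue.created_at, "opened");
    if (issue.closed_at) bump(issue.closed_at, "closed");
  }
  return days;
}
```

Keeping both counters per day would let a dashboard show whether a repository is closing issues faster than it receives them.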

agcolom commented 9 years ago

I'll add a few requirements here as well... I'll edit this comment block as I get new ideas:

[ ] Track age range (as well as average age)
[ ] Track average age at which issues get closed
[ ] Track average age at which PRs get closed
[ ] Track % PRs that get closed as opposed to merged (if possible)
[ ] View tracking by team (i.e. all repos that fall under the chosen team)
[ ] For each repo, view who created PRs over time
[ ] For each repo, view who closed issues
[ ] For each repo, view who closed/landed PRs
[ ] For each repo, track issues that are assigned vs not assigned
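
A minimal sketch of two of these per-repo views, assuming the input is an array of issue or PR objects as returned by the GitHub API (where `assignee` is null for unassigned items and `user.login` is the creator). The function name `summarizeIssues` is hypothetical.

```javascript
// Summarize assigned vs. unassigned items and count items per creator.
// Assumption: each item has `assignee` (object or null) and `user.login`.
function summarizeIssues(issues) {
  const summary = { assigned: 0, unassigned: 0, createdBy: {} };
  for (const issue of issues) {
    if (issue.assignee) {
      summary.assigned += 1;
    } else {
      summary.unassigned += 1;
    }
    const login = issue.user.login;
    summary.createdBy[login] = (summary.createdBy[login] || 0) + 1;
  }
  return summary;
}
```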

arschmitz commented 9 years ago

There are a bunch of services out there that already do at least some of this. Perhaps we should explore existing options before dumping tons of time into our own. I can't find some of the ones I've seen previously, but a couple I found from a very quick Google search are:

https://huboard.com/ https://waffle.io/ http://www.neat.io/bee/github-issues-client.html

Also, a new service, GHTorrent, designed around managing pull requests, emailed me the other day wanting to give us early access and wanting feedback. They sent me this: http://ghtorrent.org/pullreq-perf/jquery-jquery-mobile/ http://ghtorrent.org/pullreq-perf/jquery-jquery-ui/ http://ghtorrent.org/prioritizer/#/display/jquery/jquery-mobile/7d6f850d55/

vidurangaw commented 9 years ago

Hi, I would like to contribute to this project via GSoC this year. Is there a mentor I can discuss it with?

Regards, Viduranga

agcolom commented 9 years ago

@vpowerrc you can discuss this with me.

vidurangaw commented 9 years ago

To do this we need to use the GitHub API, which means we need to use a back-end language as well.

But on the ideas page, the programming language for the back end wasn't mentioned. Can we use any language?

arthurvr commented 9 years ago

> But on the ideas page, the programming language for the back end wasn't mentioned. Can we use any language?

All our tooling is written in Node at the moment, and I would prefer to keep it that way.

arschmitz commented 9 years ago

+1 for Node. Other than a very small bit of PHP for mobile demos, core tests, and our websites (we use WordPress), we use nothing but Node/JS, and I would like to see us stick with Node. We also have an issue on core to remove the PHP dependency and switch to Node: https://github.com/jquery/jquery/issues/1999 . JavaScript is what all of our contributors are familiar with and, I think, would prefer to maintain.

vidurangaw commented 9 years ago

Cool! I looked into the GitHub API. It also supports webhooks. Is this application going to depend on API calls only, or should we maintain a database to store data from GitHub as well? I'm still trying to get my mind around the basic architecture of the application. Sorry for any inconvenience caused.

agcolom commented 9 years ago

I'm thinking that it would be a good idea to store the data as well. I'd like views from @scottgonzalez @gnarf @arschmitz regarding this.

scottgonzalez commented 9 years ago

Using webhooks provides us with realtime data, but also makes it harder to recover from any downtime. If you don't know what you missed, there's no easy way to just grab the missing data.

Using API calls, we can get all the changes since the last time we fetched. As for how often we should fetch data, that I'm not sure about since we don't yet know how many requests that will result in.
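
As a sketch of what "all the changes since the last time we fetched" could look like: GitHub's issue listing accepts a `since` timestamp and returns items updated at or after it. The helper name `issuesSinceUrl` and the choice of `per_page=100` are assumptions for illustration. Note that this listing also includes pull requests, which carry a `pull_request` key.

```javascript
// Build the URL for listing issues (and PRs) updated since the last fetch.
// Assumption: `lastFetched` is a Date recorded by the previous run.
function issuesSinceUrl(owner, repo, lastFetched) {
  const url = new URL(`https://api.github.com/repos/${owner}/${repo}/issues`);
  // Drop milliseconds to match the YYYY-MM-DDTHH:MM:SSZ format.
  const since = lastFetched.toISOString().replace(/\.\d{3}Z$/, "Z");
  url.searchParams.set("state", "all"); // open and closed items
  url.searchParams.set("since", since);
  url.searchParams.set("per_page", "100");
  return url.toString();
}
```

Each run would store its own start time and pass it as `lastFetched` on the next run, so downtime only delays updates instead of losing them.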

scottgonzalez commented 9 years ago

It's also worth mentioning other tools.

Metrics Grimoire is open source and powers the data portion of Bitergia's dashboards (see http://demo.bitergia.com/browser/). However, it's GPL and written in python, so we may not want to even look into it.

jQuery Foundation uses Splunk to store tons of data, but we don't have any of that open to the public right now. While I originally pushed for this data to just go into Splunk since that would provide all the graphing and analysis for us, it would also be nice to have a standalone product that others can use.

vidurangaw commented 9 years ago

@scottgonzalez Yes, it depends on the requirements of the application. If the app doesn't have to be updated in real time, the best thing is to use API calls. Then we can use a cron job to fetch data and update the database at a given interval.

@agcolom I would like to work on designing the database structure. But there are some requirements I don't quite understand. Can you guys help me with them? Here they are:

Make the tracking code public
Monitor all repositories - What do you mean by monitoring? What are the specifics?
Track age range (as well as average age)
View tracking by team (i.e. all repos that fall under the chosen team) - Teams should be assigned by the application itself, right? Since there is no such thing as a team in GitHub.
Push the data into Splunk - @scottgonzalez said that he already pushes them to Splunk. Should I improve it?

Initial DB structure:

teams
repositories
users
pull requests
issues

Please add your comments.

scottgonzalez commented 9 years ago

> Make the tracking code public

It must live in a public repository. The current tracking code only exists on a private server.

> Monitor all repositories - What do you mean by monitoring? What are the specifics?

Monitoring = tracking. It's everything that this task is about. I'm not sure what else to say here, it should track every public repository in the jquery org.
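
To cover every public repository, the tracker would first need the full repository list, which the GitHub API returns in pages. A minimal sketch of following pagination through the `Link` response header is below; it assumes the first request goes to something like `https://api.github.com/orgs/jquery/repos?type=public`, and the helper name `nextPageUrl` is made up.

```javascript
// Extract the URL of the next page from a GitHub `Link` response header.
// Returns null when there is no further page.
function nextPageUrl(linkHeader) {
  if (!linkHeader) return null;
  for (const part of linkHeader.split(",")) {
    const match = part.match(/<([^>]+)>;\s*rel="next"/);
    if (match) return match[1];
  }
  return null;
}
```

The crawler would keep requesting `nextPageUrl(...)` until it returns null, collecting repository names along the way.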

> Track age range (as well as average age)

Track how long issues and PRs have been open.

> View tracking by team (i.e. all repos that fall under the chosen team) - Teams should be assigned by the application itself, right? Since there is no such thing as a team in GitHub.

Well, there are teams in GitHub, but I'm not sure what @agcolom had in mind.

> Push the data into Splunk - @scottgonzalez said that he already pushes them to Splunk. Should I improve it?

When did I say that? If that were done, this project would be half-done.

scottgonzalez commented 9 years ago

> Initial DB structure: teams, repositories, users, pull requests, issues

It's hard to comment on just a list of table names.

arschmitz commented 9 years ago

As for the database, since we will probably pull JSON data, it might make sense to use a JSON document-based database like Couchbase or CouchDB. I actually talked to someone from Couchbase earlier this week about something unrelated, and they seemed interested in working with us.

scottgonzalez commented 9 years ago

I'm not sure how the source data being JSON has anything to do with the data format for the database. The type of database we use should be based on the data we're storing and how we're using that data.

vidurangaw commented 9 years ago

@scottgonzalez Sorry, I misunderstood your statement regarding pushing data into Splunk. I haven't used Splunk before. I hope that won't be a problem for pursuing this project. Am I just to send the data grabbed from GitHub via API calls, or do I need to do some processing before sending it to Splunk?

> It's hard to comment on just a list of table names.

I'm not talking about the relationships between tables. But it looks like these are the basic tables we are going to need to store the data. Am I wrong?

scottgonzalez commented 9 years ago

> Am I just to send the data grabbed from GitHub via API calls, or do I need to do some processing before sending it to Splunk?

You would have to process the data into a format that Splunk can understand. However, as I said, we may not use Splunk. This is something that needs to be determined.
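
One possible shape for that processing, offered only as a sketch: flatten each item into a timestamped line of `key="value"` pairs, a format Splunk can typically extract fields from without extra configuration. The field selection and the name `toKeyValueEvent` are assumptions, and nothing here is settled since Splunk itself may not be used.

```javascript
// Turn one GitHub issue/PR object into a single key="value" log line.
// Assumption: `issue` follows GitHub's issue JSON; `repo` is "owner/name".
function toKeyValueEvent(repo, issue) {
  const fields = {
    repo: repo,
    number: issue.number,
    type: issue.pull_request ? "pull_request" : "issue",
    state: issue.state,
    author: issue.user.login,
  };
  const pairs = Object.keys(fields).map((key) => {
    const value = String(fields[key]).replace(/"/g, '\\"'); // escape quotes
    return `${key}="${value}"`;
  });
  return `${issue.updated_at} ${pairs.join(" ")}`;
}
```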

> I'm not talking about the relationships between tables. But it looks like these are the basic tables we are going to need to store the data. Am I wrong?

Again, without any details that's not a question that can really be answered.

vidurangaw commented 9 years ago

> Again, without any details that's not a question that can really be answered.

Would an ER diagram suffice?

scottgonzalez commented 9 years ago

> Would an ER diagram suffice?

Sure, or just actually providing details about the tables.

sudheesh001 commented 9 years ago

@scottgonzalez @agcolom

I am interested in this project too and would like to work on it. Could you explain a little more about what you mean by making the tracking code public?

Monitor all repositories
Track pull requests and issues
Track opens and closes and not just an open total count

I think using cron would be a good way to keep track of these, probably scanning the jquery/* GitHub repos every 2 minutes to find changes, if any. If the state of a particular issue/PR has changed, a hook would trigger the corresponding operations. At Mozilla, there's Autolander: it checks for PRs or changes in PRs and triggers the required CI services or the Bugzilla services accordingly. It'd be nice to have a look at it since it's built in JavaScript too.

In the application database, we can manage the other content, like tracking the age ranges and their averages. The percentage of PRs can be obtained directly from the open and closed statuses of the PRs per repository.

Waffle.io seems like a very good existing option to learn from to customize the visualisations and tracking dashboard as required by jquery. I believe that using an ORM like Sequelize would simplify the work and not cause much of an issue with changing database schemas and in its migrations, rather than having to migrate the new schema from the SQL file every time an unexpected change occurs to it. Going with a MySQL database with Sequelize as the ORM would be a great step towards building this application.

When it comes to the front end, is it necessary to use a front-end framework like AngularJS, or can it be done in plain HTML/JS/CSS?

As an add-on, it'd be great to have analytics on contributors (their lines of code, commits, etc.) and gamify it with a leaderboard. There could also be an option for showing stats like:

What time of the day do contributors generally commit their code.
How many timezones & countries/places are contributors from.

One thing that should be considered is the API query limit. Would it be better if we used the GitHub Archive instead? It is also available on BigQuery, which might help too.
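
On the query limit: GitHub API responses carry `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers (the latter in UTC epoch seconds), so a crawler can pause instead of failing. A minimal sketch, assuming lower-cased header names as Node's HTTP client provides them; `msUntilNextRequest` is a hypothetical helper.

```javascript
// Decide how long to wait before the next API request.
// Assumption: `headers` uses lower-cased names, `nowMs` is Date.now().
function msUntilNextRequest(headers, nowMs) {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetMs = Number(headers["x-ratelimit-reset"]) * 1000;
  if (remaining > 0) return 0; // quota left, no need to wait
  return Math.max(0, resetMs - nowMs); // wait until the quota resets
}
```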

This seems like a really exciting project and I would love to contribute to this.

vidurangaw commented 9 years ago

@scottgonzalez Here's my initial db design.

[Image: initial database design diagram]

khansrk commented 9 years ago

Hi, I would like to contribute to the jQuery Learning Center project via GSoC. Is there a mentor who can help me start the work? shahrukh

arthurvr commented 9 years ago

@agcolom is mentor, @srk12345. We all hang out in Freenode IRC, #jquery-content. Also be sure to read https://github.com/jquery/gsoc/wiki/Getting-started-for-students.

khansrk commented 9 years ago

Hi @agcolom, I would like to contribute to the jQuery Learning Center project. Please tell me how I can help you with this project.

khansrk commented 9 years ago

Hi @agcolom, please help me regarding the jQuery Learning Center project.

jacquerie commented 9 years ago

Hey everyone. I wrote a small CLI tool in Node that satisfies some of the requirements in the first two comments.

Here's an example of its output:

{
  "repo": "jquery/jquery",
  "total": 2124,
  "openPulls": 11,
  "openPullsAge": {
    "min": 129755,
    "avg": 4160999,
    "max": 12516655
  },
  "openIssues": 99,
  "openIssuesAge": {
    "min": 9629,
    "avg": 7970797,
    "max": 13243015
  },
  "closedPulls": 1773,
  "closedPullsAge": {
    "min": 6,
    "avg": 1408170,
    "max": 52149399
  },
  "closedIssues": 241,
  "closedIssuesAge": {
    "min": 6,
    "avg": 1277130,
    "max": 12439871
  }
}

where the age of a PR or issue is measured in seconds.
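
For readers wondering how figures like these could be derived, here is a minimal sketch. It is not the code of the tool above, just one way to compute min/avg/max ages in seconds from `created_at` and `closed_at`, treating still-open items as aged up to a supplied `now`. The name `ageStats` is made up.

```javascript
// Compute min/avg/max age in seconds for a list of issues or PRs.
// Open items (closed_at === null) are measured up to `now` (a Date).
function ageStats(items, now) {
  if (items.length === 0) return null;
  const ages = items.map((item) => {
    const end = item.closed_at ? new Date(item.closed_at) : now;
    return Math.floor((end - new Date(item.created_at)) / 1000);
  });
  const sum = ages.reduce((a, b) => a + b, 0);
  return {
    min: ages.reduce((a, b) => Math.min(a, b)),
    avg: Math.round(sum / ages.length),
    max: ages.reduce((a, b) => Math.max(a, b)),
  };
}
```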

If I understood this task correctly, the desired outcome of this project is some kind of admin tool (dashboard) that allows for easy outlier detection and progress reporting. In a sense, this tool should measure the "health" of a repository: a project has low health if PRs tend to go stale, issues stay open for long, nothing gets done...

For this reason, I don't think the commercial solutions (Bee, Waffle, and HuBoard) are right for this job, since they mostly deal with assigning/creating issues. Something that comes closer is GHTorrent's pullreq-perf.

Several components are needed to build such a tool.

Something that crawls GitHub's API and outputs some simplified form: I'm currently rolling my own using psunkara/octonode; @scottgonzalez told me on IRC about scottgonzalez/github-request and scottgonzalez/github-export.
Something that schedules the execution of this crawler, or avoids doing it if it determines there's no new data to fetch: this looks doable using one of these, although I have no experience with any of them.
Some persistence layer (CouchDB seems widely favored) and some adapter mechanism to store this data in other backends (Splunk, which I'm not familiar with).
Finally, the actual dashboard, an Express app built using some kind of charting/drawing library.
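
For the "no new data to fetch" check in the second component, one option is GitHub's conditional requests: send the `ETag` from the previous response in an `If-None-Match` header, and a `304 Not Modified` reply means nothing changed. The sketch below only builds the headers and interprets the status; the helper names and the User-Agent string are placeholders.

```javascript
// Build request headers, reusing the ETag saved from the previous fetch.
function buildConditionalHeaders(previousEtag) {
  const headers = {
    "User-Agent": "issue-tracker-sketch", // GitHub requires a User-Agent
    Accept: "application/vnd.github.v3+json",
  };
  if (previousEtag) headers["If-None-Match"] = previousEtag;
  return headers;
}

// Interpret the response status of a conditional request.
function interpretStatus(statusCode) {
  if (statusCode === 304) return "unchanged"; // skip processing this run
  if (statusCode === 200) return "changed"; // new data, store the new ETag
  throw new Error("Unexpected status " + statusCode);
}
```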

An advantage of this design is that the fourth bullet point could be swapped with something else; something that comes to mind is a tool like GHTorrent's PRioritizer, or a tool which can be used by first time contributors to find good/needed/easy work to do.

I'd love to hear feedback on my plan.

jacquerie commented 9 years ago

And here's the code I was talking about: https://github.com/jacquerie/github-issues.

BTW, @agcolom, it's definitely possible to know which PRs were merged using this endpoint: https://developer.github.com/v3/pulls/#get-if-a-pull-request-has-been-merged. On the other hand, this probably fails when the maintainer chooses to cherry-pick a commit and close the PR. Is this acceptable?

agcolom commented 9 years ago

@jacquerie That sounds like a good plan to me. I don't think cherry-picking a commit and closing a PR is something that happens on a regular basis. Thanks also for the link on the Prioritizer. I was not aware of this tool.

scottgonzalez commented 9 years ago

> On the other hand, this probably fails when the maintainer chooses to cherry-pick a commit and close the PR.

That describes about 100% of our PRs.

scottgonzalez commented 9 years ago

> I don't think cherry-picking a commit and closing a PR is something that happens on a regular basis.

@agcolom That's our standard practice. We always rebase, edit the commit messages to include references, then merge. That's effectively a cherry-pick. See http://contribute.jquery.org/repo-maintainers-guide/.

agcolom commented 9 years ago

ok, sorry, I was thinking old fixes. so that means this solution would not work then?

scottgonzalez commented 9 years ago

It just means that we can't track merges.

jacquerie commented 9 years ago

@agcolom only partially. Take for instance these two PRs in jQuery UI: jquery/jquery-ui#1469 and jquery/jquery-ui#1466.

As you can see, the first one was merged in such a way that GitHub API is aware of it, while the other one was not. In fact, now we have:

$ curl -I https://api.github.com/repos/jquery/jquery-ui/pulls/1469/merge
HTTP/1.1 204 No Content
[...]

but

$ curl -I https://api.github.com/repos/jquery/jquery-ui/pulls/1466/merge
HTTP/1.1 404 Not Found
[...]

It should be said, though, that most PRs under the jQuery organization tend to fall in the second case.
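
The same check can be wrapped in a couple of small helpers. This is only a sketch of interpreting the endpoint's 204/404 answers shown above, with made-up function names. As noted, a `false` result here does not mean the change never landed.

```javascript
// URL of the "has this pull request been merged" endpoint.
function mergeCheckUrl(owner, repo, number) {
  return `https://api.github.com/repos/${owner}/${repo}/pulls/${number}/merge`;
}

// 204 means GitHub recorded a merge, 404 means it did not.
function mergeStatusFromResponse(statusCode) {
  if (statusCode === 204) return true;
  if (statusCode === 404) return false;
  throw new Error("Unexpected status " + statusCode);
}
```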

scottgonzalez commented 9 years ago

jquery/jquery-ui#1469 is an anomaly; it was merged without a reference.

jacquerie commented 9 years ago

Yes, rather than "partially" the answer is "almost never".

Another strategy is to use the Issue Events: https://developer.github.com/v3/issues/events/#list-events-for-an-issue. We can listen for a closed event where the commit_id field is not null, as in https://api.github.com/repos/jquery/jquery-ui/issues/1466/events.

This is still not perfect: the closing commit could be completely different than what was originally proposed in the PR, but it looks like it should work in practice.
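
A minimal sketch of that strategy, assuming `events` is the array returned by the Issue Events endpoint (objects with `event`, `commit_id`, and `actor.login`). The name `findClosingCommit` is hypothetical, and the result only says which commit closed the item and who closed it, not whether the PR's own commits were merged.

```javascript
// Find the commit (if any) that closed an issue or PR.
// Returns null when the item was closed without a commit reference.
function findClosingCommit(events) {
  const closing = events.filter(
    (e) => e.event === "closed" && e.commit_id
  );
  if (closing.length === 0) return null;
  const last = closing[closing.length - 1]; // most recent close wins
  return {
    commit: last.commit_id,
    closedBy: last.actor ? last.actor.login : null,
  };
}
```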

scottgonzalez commented 9 years ago

That can tell us who closed the issue/PR, but not if a PR was merged. For example, https://api.github.com/repos/jquery/jquery-ui/issues/1287/events will show that the PR was closed via https://github.com/jquery/jquery-ui/commit/4b017b414f107ed3c1dafc7601b61cbcd76acf61 but the PR was not merged, it was simply closed as a different solution was merged.

PhilippGrulich commented 9 years ago

Hey guys,

I'm also very interested in joining this project. I'm currently attending the Hamburg University of Applied Sciences, Germany, and pursuing a BS degree in Computer Science. I have read your discussion and your ideas, and I think this could be a really interesting web app. In my opinion, AngularJS would be a good choice as a front-end framework. Is this an option? I think the Node part could be very slim: it would only crawl GitHub, store the data in a DB, and deliver the data to the clients, so everything is just simple JSON. As for the database, I also think it would be nice to have a document-oriented DB like MongoDB or CouchDB, but at the moment I have the most experience with MongoDB.

I have created a small prototype, just to test the GitHub API. It is very basic and just shows the repositories and some information about the issues and so on. It also doesn't have a database yet.

Demo: http://jquery.philipp-grulich.de/app/#/
GitHub: https://github.com/PhilippGrulich/issue-tracking-tool

Greez Philipp

agcolom commented 9 years ago

For all those interested in submitting a proposal, please do not forget to submit your proposal at https://www.google-melange.com/gsoc/homepage/google/gsoc2015

g31pranjal commented 9 years ago

I am interested in contributing to this project through Gsoc.

Analyzing the problem, I feel that it can be clearly broken into 3 disjoint segments :

Scraping data from GitHub and pushing it into the DB.
Manipulating and Analyzing the raw data to get clear insights.
Presenting the data in user-friendly format.

Now, I have a few ideas for each of these segments:

Scraping data: This can be done by means of scheduled tasks with cron on Google App Engine or a local server (which should be preferred?). The purpose is to flush new raw data from GitHub into the database. The event to flush data will be triggered only if there is some new data in that transaction.
Manipulation and analysis: I am planning to keep this as a separate segment, because:
1. It can be triggered as an event only when required (when we have new data to evaluate) rather than on every request.
2. More functionality can be added in the future without disturbing the other two parts.
The purpose is to get useful insights from raw data, such as the number of open issues, closed issues, average age, age range, and data for graphical display on the dashboard.
Presentation on dashboard: A simple Angular app, or rather just bootstrapped HTML/CSS, to display the data.

@scottgonzalez , @agcolom , @arschmitz , please help me with these doubts regarding the flow :

Is it fine to do this on Google App Engine ? or should I prefer on local node/js server ?
Will it be fine if I use local DB ?
What is the extent of information to be shown in pull request ?
Will there be some kind of visibility rules based on the teams of each repository ?
Can I research with more measures to provide on Dashboard page ?

scottgonzalez commented 9 years ago

> Is it fine to do this on Google App Engine ? or should I prefer on local node/js server ?

Local

> Will it be fine if I use local DB ?

Local

> What is the extent of information to be shown in pull request ?

I'm not sure what you're asking.

> Will there be some kind of visibility rules based on the teams of each repository ?

No. Everything is public.

> Can I research with more measures to provide on Dashboard page ?

I'm not sure what you're asking.

vidurangaw commented 9 years ago

@scottgonzalez Did you see my database design above? Do you have any comments?

g31pranjal commented 9 years ago

I was planning to track issues on GitHub using crontab. The only problem I face is that there's no option for real-time tracking as and when an issue is created. Is there a need for real-time updating in this project?

Also, what should be the first priority: quick updates to the data or deep analysis of issues?

@scottgonzalez @agcolom insights please..

scottgonzalez commented 9 years ago

Real-time tracking is not important.

scottgonzalez commented 9 years ago

> Did you see my database design above? Do you have any comments?

I'm not sure what commitments are. I'm not sure what data you're tracking. I'm not sure how you're analyzing the data. There's a large chunk of missing information.

gauravparmar commented 9 years ago

Respected mentors, I am full of enthusiasm to work on it as my project for GSoC 2015 as this project sounds really interesting and suits my interest.

I am a student pursuing an M.Tech. in Information Technology, and I have also been working as a freelance full-stack developer and graphic designer for the last few years. I have knowledge of jQuery, JavaScript, Git/GitHub, HTML/HTML5, CSS/CSS3, Bootstrap, PHP, MySQL, AJAX, Python, Adobe Photoshop, etc. I am interested in open source technology and the open source community.

You can also check my Linkedin profile at https://www.linkedin.com/in/gauravbparmar

I have practical experience of working on many projects that include device responsive website designing, web development, graphic designing etc.

After understanding this project I have chopped down this project into these major segments according to my understanding-

Continuous fetching of data from Github corresponding to jQuery's repositories through some way.
Storage of fetched data into a database for future analysis and information retrieval.
Scrutinizing data that is stored in the database and separation of required information related to "issues" from it and further separation of this "issues" related information according to teams that manage different official repositories belonging to jQuery on Github.
Manipulation of the separated information and generation of statistics through it.
Representation of live statistics on the webpage in a user friendly way. Something like a dashboard.

I would like to use this package for designing a device responsive web application that will serve as Tracking System for issues related to jQuery's Github repositories-

jQuery + AJAX + HTML5 + CSS3 + Bootstrap

For graphic designing-

Adobe Photoshop

For back-end I would like to use below package-

PHP + MySQL + Github API, Webhooks, Bicho Tool (https://github.com/MetricsGrimoire/Bicho) or any other tool that does data fetching work for repositories on Github.

I am completely ready for all the website design and development part except one thing where I am getting stuck, that is the way to grab issues related data from Github for jQuery repositories.

As @scottgonzalez said above, there are some problems with using webhooks, so I searched further and found Bicho, which is one of the tools in the MetricsGrimoire toolset (also mentioned by @scottgonzalez above).

@agcolom, @scottgonzalez and other mentors please tell -

your views about Bicho Tool.
is it okay if I use above technology stack?
if we use API calls then how often shall we have to fetch the data? What should be the time interval to do this process?
do you know about any other tool that can aid in this process?
will this tracking system have all the features that are currently there in huboard (https://huboard.com)?

I think under your guidance and team work, we can surely turn this project idea into reality.

Here are some of my past projects to build your trust on me - http://gauravparmar.byethost7.com/myplace http://youthfest.in http://www.globalharmonyfundraising.org/ http://gauravparmar.byethost7.com/amaroo/ http://www.priceshaved.com http://gauravparmar.byethost7.com/project3_ui/

Regards Gaurav Parmar

scottgonzalez commented 9 years ago

> your views about Bicho Tool.

Well, Bicho is written in python. We aim for node and PHP, with a very strong preference toward node.

> is it okay if I use above technology stack?

We'd prefer node over PHP. I don't think we'll use webhooks.

For Bootstrap, I assume you're referring only to the CSS portion?

> if we use API calls then how often shall we have to fetch the data? What should be the time interval to do this process?

That's not an important decision up front. We'll figure that out during development.

> do you know about any other tool that can aid in this process?

https://www.npmjs.com/package/github-request https://github.com/scottgonzalez/github-export (WIP)

> will this tracking system have all the features that are currently there in huboard (https://huboard.com)?

Extremely different. This is in no way a bug tracker/scrum board/kanban board. It's a read-only analysis on issues and PRs.

gauravparmar commented 9 years ago

@scottgonzalez : Since https://github.com/scottgonzalez/github-export is still in progress, do you think https://www.npmjs.com/package/github-request will satisfy all our needs on that matter, or will we have to tweak it to suit our needs?

@arschmitz : I also found you as a mentor for this project on the ideas list page of jQuery. What would you suggest for grabbing "issues" related data from Github so that it can be used in our web application which will make use of it to serve as issue tracking system?

gauravparmar commented 9 years ago

I would also like views on this matter from @agcolom.

gauravparmar commented 9 years ago

@scottgonzalez : Yes, we can work on Node.js as well instead of PHP. Actually it will be better. Great to see that you worked on https://www.npmjs.com/package/github-request

jquery / content

Better Issue Tracking #4