Open scottgonzalez opened 9 years ago
I'll add a few requirements here as well... I'll edit this comment block as I get new ideas:
There are already a bunch of services out there that do at least some of this. Perhaps we should explore existing options before dumping tons of time into our own. I can't find some of the ones I've seen previously, but a couple I found from a very quick Google search are:
https://huboard.com/ https://waffle.io/ http://www.neat.io/bee/github-issues-client.html
Also, a new service designed around managing pull requests, GHTorrent, emailed me the other day wanting to give us early access and asking for feedback. They sent me these: http://ghtorrent.org/pullreq-perf/jquery-jquery-mobile/ http://ghtorrent.org/pullreq-perf/jquery-jquery-ui/ http://ghtorrent.org/prioritizer/#/display/jquery/jquery-mobile/7d6f850d55/
Hi, I would like to contribute to this project via GSoC this year. Is there a mentor I can discuss it with?
Regards, Viduranga
@vpowerrc you can discuss this with me.
To do this we need to use the GitHub API, which means we need to use a back-end language as well.
But on the ideas page the programming language for the back end wasn't mentioned. Can we use any language?
> But on the ideas page the programming language for the back end wasn't mentioned. Can we use any language?

All our tooling is written in Node atm, I would prefer to stay doing so.
+1 for Node. Other than a very small bit of PHP (mobile demos, core tests, and our websites, since we use WordPress), we use nothing but Node/JS, and I would like to see us stick with Node. We also actually have an issue on core to remove the PHP dependency and switch to Node: https://github.com/jquery/jquery/issues/1999. JavaScript is what all of our contributors are familiar with and, I think, would prefer to have to maintain.
Cool! I looked at the GitHub API. It also supports webhooks. Is this application going to depend on API calls only, or should we maintain a database to store data from GitHub as well? I'm still trying to get my mind around the basic architecture of the application. Sorry for any inconvenience caused.
I'm thinking that it would be a good idea to store the data as well. I'd like views from @scottgonzalez @gnarf @arschmitz regarding this.
Using webhooks provides us with realtime data, but also makes it harder to recover from any downtime. If you don't know what you missed, there's no easy way to just grab the missing data.
Using API calls, we can get all the changes since the last time we fetched. As for how often we should fetch data, that I'm not sure about since we don't yet know how many requests that will result in.
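To make the polling idea concrete, here is a minimal sketch in Node. It relies on the `since` parameter of the GitHub REST API's repository issues listing, which limits results to items updated after a given ISO 8601 timestamp. The helper names (`buildIssuesUrl`, `nextCheckpoint`) and the checkpoint scheme are hypothetical, not something decided in this thread.

```javascript
// Sketch of "fetch everything that changed since the last run".
// Function names and the checkpoint scheme are illustrative assumptions.

// Build the URL for issues (this endpoint also returns PRs) updated since `since`.
function buildIssuesUrl(owner, repo, since, page = 1) {
  const params = new URLSearchParams({
    state: "all", // include closed items, not just open ones
    sort: "updated",
    direction: "asc",
    per_page: "100",
    page: String(page),
  });
  if (since) params.set("since", since); // ISO 8601 timestamp
  return `https://api.github.com/repos/${owner}/${repo}/issues?${params}`;
}

// After a fetch, the newest `updated_at` becomes the next checkpoint.
// If nothing changed, the previous checkpoint is kept.
function nextCheckpoint(issues, previous) {
  return issues.reduce(
    (latest, issue) =>
      !latest || issue.updated_at > latest ? issue.updated_at : latest,
    previous
  );
}
```

After downtime, the next run would simply start from the stored checkpoint. Storing results keyed by issue number would make any repeated items harmless.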
It's also worth mentioning other tools.
Metrics Grimoire is open source and powers the data portion of Bitergia's dashboards (see http://demo.bitergia.com/browser/). However, it's GPL and written in python, so we may not want to even look into it.
jQuery Foundation uses Splunk to store tons of data, but we don't have any of that open to the public right now. While I originally pushed for this data to just go into Splunk since that would provide all the graphing and analysis for us, it would also be nice to have a standalone product that others can use.
@scottgonzalez Yes, it depends on the requirements of the application. If the app doesn't have to be updated in real time, the best thing is to use API calls. Then we can use a cron job to fetch data and update the database at a given interval.
@agcolom I would like to work on designing the database structure, but there are some requirements I don't quite understand. Can you guys help me with them? Here they are:
* Make the tracking code public
* Monitor all repositories - What do you mean by monitoring? What are the specifics?
* Track age range (as well as average age)
* View tracking by team (i.e. all repos that fall under the chosen team) - Teams should be assigned by the application itself, right? Since there is no such thing as a team in GitHub.
* Push the data into Splunk - @scottgonzalez said that he already pushes them to Splunk. Should I improve it?
Initial DB structure:
* teams
* repositories
* users
* pull requests
* issues
Please add your comments.
> Make the tracking code public

It must live in a public repository. The current tracking code only exists on a private server.
> Monitor all repositories - What do you mean by monitoring? What are the specifics?

Monitoring = tracking. It's everything that this task is about. I'm not sure what else to say here, it should track every public repository in the jquery org.
> Track age range (as well as average age)

Track how long issues and PRs have been open.
> View tracking by team (i.e. all repos that fall under the chosen team) - Teams should be assigned by the application itself, right? Since there is no such thing as a team in GitHub.

Well, there are teams in GitHub, but I'm not sure what @agcolom had in mind.
> Push the data into Splunk - @scottgonzalez said that he already pushes them to Splunk. Should I improve it?

When did I say that? If that were done, this project would be half-done.
> Initial DB structure: teams, repositories, users, pull requests, issues

It's hard to comment on just a list of table names.
As for a database: since we will probably pull JSON data, it might make sense to use a JSON document-based database like Couchbase or CouchDB. I actually talked to someone from Couchbase earlier this week about something unrelated, and they seemed interested in working with us.
I'm not sure how the source data being JSON has anything to do with the data format for the database. The type of database we use should be based on the data we're storing and how we're using that data.
@scottgonzalez Sorry, I misunderstood your statement regarding pushing data into Splunk. I haven't used Splunk before. I hope that won't be a problem for pursuing this project. Am I just to send the data grabbed from GitHub via API calls, or do I need to do some processing before sending it to Splunk?
> It's hard to comment on just a list of table names.

I'm not talking about the relationships between tables. But it looks like these are the basic tables we are going to need to store the data. Am I wrong?
> Am I just to send the data grabbed from GitHub via API calls, or do I need to do some processing before sending it to Splunk?

You would have to process the data into a format that Splunk can understand. However, as I said, we may not use Splunk. This is something that needs to be determined.
> I'm not talking about the relationships between tables. But it looks like these are the basic tables we are going to need to store the data. Am I wrong?

Again, without any details that's not a question that can really be answered.
> Again, without any details that's not a question that can really be answered.

Would an ER diagram suffice?
> Would an ER diagram suffice?

Sure, or just actually providing details about the tables.
@scottgonzalez @agcolom
I am interested in this project too and would like to work on it. Could you explain a little more about what you mean by "Making the tracking code public"?
* Monitor all repositories
* Track pull requests and issues
* Track opens and closes and not just an open total count
I think using cron would be a good way to keep track of these, probably scanning the jquery/* GitHub repos every 2 minutes to find any changes. If the state of a particular issue or PR has changed, a hook would trigger the corresponding operations. At Mozilla there's Autolander, which checks for PRs or changes in PRs and triggers the required CI or Bugzilla services accordingly. It'd be nice to have a look at it since it's built in JavaScript too.
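As a rough illustration of the "detect a state change, then trigger a hook" idea, here is a small sketch. The snapshot shape (an object mapping issue number to state) and both function names are assumptions made for the example; nothing here is taken from Autolander.

```javascript
// Compare two snapshots of { number: state } and report what changed.
// `previous` and `current` are plain objects such as { 1466: "open" };
// this shape is an assumption made for the example.
function diffStates(previous, current) {
  const changes = [];
  for (const [number, state] of Object.entries(current)) {
    const before = previous[number];
    if (before === undefined) {
      // Item was not in the previous snapshot, so it is new.
      changes.push({ number: Number(number), from: null, to: state });
    } else if (before !== state) {
      changes.push({ number: Number(number), from: before, to: state });
    }
  }
  return changes;
}

// A "hook" here is just a callback invoked once per change.
function runHooks(changes, onChange) {
  changes.forEach((change) => onChange(change));
  return changes.length;
}
```

A scheduled job could call `diffStates` with the stored snapshot and the freshly fetched one, then pass the result to `runHooks`.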
In the application database, we can manage the other content, like tracking the age ranges and their averages. The percentage of PRs can be obtained directly from the Open and Closed statuses of the PRs per repository.
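For what it's worth, counting opens and closes in a time window (rather than only the current open total) could look something like the sketch below. It uses the `created_at`, `closed_at` and `state` fields that the GitHub API returns for issues and PRs; the function name and return shape are made up for illustration.

```javascript
// Count how many items were opened and closed inside a time window,
// instead of only reporting the current open total.
// The window is [from, to): `from` is inclusive, `to` is exclusive.
function opensAndCloses(items, from, to) {
  const start = Date.parse(from);
  const end = Date.parse(to);
  const inWindow = (stamp) => {
    if (!stamp) return false; // e.g. closed_at is null for open items
    const t = Date.parse(stamp);
    return t >= start && t < end;
  };
  const opened = items.filter((item) => inWindow(item.created_at)).length;
  const closed = items.filter((item) => inWindow(item.closed_at)).length;
  const stillOpen = items.filter((item) => item.state === "open").length;
  const percentOpen = items.length === 0 ? 0 : (stillOpen / items.length) * 100;
  return { opened, closed, stillOpen, percentOpen };
}
```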
Waffle.io seems like a very good existing option to learn from to customize the visualisations and tracking dashboard as required by jquery. I believe that using an ORM like Sequelize would simplify the work and not cause much of an issue with changing database schemas and in its migrations, rather than having to migrate the new schema from the SQL file every time an unexpected change occurs to it. Going with a MySQL database with Sequelize as the ORM would be a great step towards building this application.
When it comes to the front end, is it necessary to use a front-end framework like AngularJS, or can it be done in plain HTML/JS/CSS?
As an add-on, it'd be great to have analytics on contributors (their lines of code, commits, etc.) and gamify them onto a leaderboard. There could also be an option for showing stats like
One thing that should be considered is the API query limit we'd be allowed. Would it be better if we used the GitHub Archive instead? They have a BigQuery dataset too, which may be of help.
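On the query limit: GitHub API responses carry `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers, so a fetcher can pace itself. The sketch below is only one possible way to use them. The function name is hypothetical, and it assumes the caller passes headers as a plain object with lower-cased names.

```javascript
// Decide how long to wait before the next request, based on the
// rate-limit headers GitHub sends with API responses.
// `headers` is a plain object with lower-cased header names (an assumption
// about how the caller normalizes them); `nowMs` is injectable for testing.
function msUntilNextRequest(headers, nowMs = Date.now()) {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetSeconds = Number(headers["x-ratelimit-reset"]); // UTC epoch seconds
  if (Number.isNaN(remaining) || Number.isNaN(resetSeconds)) return 0; // headers missing
  if (remaining > 0) return 0; // budget left, no need to wait
  return Math.max(0, resetSeconds * 1000 - nowMs);
}
```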
This seems like a really exciting project and I would love to contribute to this.
@scottgonzalez Here's my initial db design.
Hi, I would like to contribute to the jQuery Learning Center project via GSoC. Is there a mentor who can help me start the work? shahrukh
@agcolom is the mentor, @srk12345. We all hang out in Freenode IRC, #jquery-content. Also be sure to read https://github.com/jquery/gsoc/wiki/Getting-started-for-students.
Hi @agcolom, I would like to contribute to the jQuery Learning Center project. Please tell me how I can help you with this project.
Hi @agcolom, please help me regarding the jQuery Learning Center project.
Hey everyone. I wrote a small CLI tool in Node that satisfies some of the requirements in the first two comments.
Here's an example of its output:
{
  "repo": "jquery/jquery",
  "total": 2124,
  "openPulls": 11,
  "openPullsAge": {
    "min": 129755,
    "avg": 4160999,
    "max": 12516655
  },
  "openIssues": 99,
  "openIssuesAge": {
    "min": 9629,
    "avg": 7970797,
    "max": 13243015
  },
  "closedPulls": 1773,
  "closedPullsAge": {
    "min": 6,
    "avg": 1408170,
    "max": 52149399
  },
  "closedIssues": 241,
  "closedIssuesAge": {
    "min": 6,
    "avg": 1277130,
    "max": 12439871
  }
}
where the age of a PR or issue is measured in seconds.
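For readers wondering how numbers like these could be derived, here is a hedged sketch. It is not the code from the CLI tool, just one plausible way to get min/avg/max ages in seconds from the `created_at` and `closed_at` fields of the GitHub API. The function names are hypothetical.

```javascript
// Compute min/avg/max age in seconds for a list of issues or PRs.
// Open items are aged against `now`; closed items against `closed_at`.
// Field names follow the GitHub API; everything else is illustrative.
function ageStats(items, now = new Date()) {
  const ages = items.map((item) => {
    const end = item.closed_at ? Date.parse(item.closed_at) : now.getTime();
    return Math.floor((end - Date.parse(item.created_at)) / 1000);
  });
  if (ages.length === 0) return { min: null, avg: null, max: null };
  const total = ages.reduce((sum, age) => sum + age, 0);
  return {
    min: ages.reduce((a, b) => Math.min(a, b)),
    avg: Math.round(total / ages.length),
    max: ages.reduce((a, b) => Math.max(a, b)),
  };
}

// The issues endpoint returns PRs too; those carry a `pull_request` key,
// which can be used to split the two groups before computing stats.
const isPull = (item) => Boolean(item.pull_request);
```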
If I understood this task correctly, the desired outcome of this project is some kind of admin tool (dashboard) that allows for easy outlier detection and progress reporting. In a sense, this tool should measure the "health" of a repository: a project has low health if PRs tend to go stale, issues stay open for long, nothing gets done...
For this reason, I don't think the commercial solutions (Bee, Waffle, and HuBoard) are right for this job, since they mostly deal with assigning/creating issues. Something that comes closer is GHTorrent's pullreq-perf.
Several components are needed to build such a tool.
An advantage of this design is that the fourth bullet point could be swapped with something else; something that comes to mind is a tool like GHTorrent's PRioritizer, or a tool which can be used by first time contributors to find good/needed/easy work to do.
I'd love to hear feedback on my plan.
And here's the code I was talking about: https://github.com/jacquerie/github-issues.
BTW, @agcolom, it's definitely possible to know which PRs were merged using this endpoint: https://developer.github.com/v3/pulls/#get-if-a-pull-request-has-been-merged. On the other hand, this probably fails when the maintainer chooses to cherry-pick a commit and close the PR. Is this acceptable?
@jacquerie That sounds like a good plan to me. I don't think cherry-picking a commit and closing a PR is something that happens on a regular basis. Thanks also for the link on the PRioritizer. I was not aware of this tool.
> On the other hand, this probably fails when the maintainer chooses to cherry-pick a commit and close the PR.

That describes about 100% of our PRs.
> I don't think cherry-picking a commit and closing a PR is something that happens on a regular basis.

@agcolom That's our standard practice. We always rebase, edit the commit messages to include references, then merge. That's effectively a cherry-pick. See http://contribute.jquery.org/repo-maintainers-guide/.
OK, sorry, I was thinking of old fixes. So that means this solution would not work, then?
It just means that we can't track merges.
@agcolom only partially. Take for instance these two PRs in jQuery UI:
As you can see, the first one was merged in such a way that GitHub API is aware of it, while the other one was not. In fact, now we have:
$ curl -I https://api.github.com/repos/jquery/jquery-ui/pulls/1469/merge
HTTP/1.1 204 No Content
[...]
but
$ curl -I https://api.github.com/repos/jquery/jquery-ui/pulls/1466/merge
HTTP/1.1 404 Not Found
[...]
It should be said, though, that most PRs under the jQuery organization tend to fall in the second case.
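The same check can be scripted. Below is a minimal sketch: `mergedFromStatus` interprets the status codes shown in the two curl calls, and `isMergedOnGitHub` is a hypothetical wrapper that assumes a runtime with a global `fetch` (Node 18 or later). The User-Agent value is made up. Note that `false` only means GitHub did not record a merge; as discussed, the change may still have landed via a rebased commit.

```javascript
// Interpret the status code of GET /repos/:owner/:repo/pulls/:number/merge.
// 204 means GitHub recorded a merge; 404 means it did not.
function mergedFromStatus(status) {
  if (status === 204) return true;
  if (status === 404) return false;
  throw new Error(`Unexpected status ${status}`);
}

// Network wrapper; assumes a runtime with a global `fetch` (Node 18+).
// It is only defined here, not called, so loading this has no side effects.
async function isMergedOnGitHub(owner, repo, number) {
  const url = `https://api.github.com/repos/${owner}/${repo}/pulls/${number}/merge`;
  const response = await fetch(url, {
    headers: { "User-Agent": "issue-tracking-sketch" }, // placeholder value
  });
  return mergedFromStatus(response.status);
}
```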
jquery/jquery-ui#1469 is an anomaly; it was merged without a reference.
Yes, rather than "partially" the answer is "almost never".
Another strategy is to use the Issue Events: https://developer.github.com/v3/issues/events/#list-events-for-an-issue. We can listen for a closed event where the commit_id field is not null, as in https://api.github.com/repos/jquery/jquery-ui/issues/1466/events.
This is still not perfect: the closing commit could be completely different than what was originally proposed in the PR, but it looks like it should work in practice.
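A small sketch of that lookup, assuming the events array has the shape returned by the Issue Events API (objects with `event` and `commit_id` fields) and is in chronological order. The function name is hypothetical. It only reports which commit closed the item; it cannot tell whether that commit contains the PR's changes.

```javascript
// Given the events of one issue or PR (as returned by the Issue Events API),
// return the commit that closed it, or null if it was closed without one.
function closingCommit(events) {
  const closed = events.filter(
    (event) => event.event === "closed" && event.commit_id != null
  );
  // Use the last such event, in case the item was reopened and closed again.
  return closed.length > 0 ? closed[closed.length - 1].commit_id : null;
}
```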
That can tell us who closed the issue/PR, but not if a PR was merged. For example, https://api.github.com/repos/jquery/jquery-ui/issues/1287/events will show that the PR was closed via https://github.com/jquery/jquery-ui/commit/4b017b414f107ed3c1dafc7601b61cbcd76acf61 but the PR was not merged, it was simply closed as a different solution was merged.
Hey guys,
I'm also very interested in joining this project. I'm currently attending the Hamburg University of Applied Sciences, Germany, and pursuing a BS degree in Computer Science. I have read your discussion and your ideas, and I think this could be a really interesting web app. In my opinion, AngularJS would be a good choice as a front-end framework. Is this an option? I think the Node part could be very slim and only crawl GitHub, store the data in a DB, and deliver the data to the clients. So it's just simple JSON everywhere. As for the database, I also think it would be nice to have a document-oriented DB like MongoDB or CouchDB, but at the moment I have the most experience with MongoDB.
I have created a small prototype, just to test the GitHub API. It is very basic and just shows the repositories and some information about the issues and so on. It also doesn't have a database yet.
Demo : http://jquery.philipp-grulich.de/app/#/ Github: https://github.com/PhilippGrulich/issue-tracking-tool
Greez Philipp
For all those interested in submitting a proposal, please do not forget to submit your proposal at https://www.google-melange.com/gsoc/homepage/google/gsoc2015
I am interested in contributing to this project through GSoC.
Analyzing the problem, I feel that it can be clearly broken into 3 disjoint segments:
Now, I have a few ideas for each of these sets:
@scottgonzalez, @agcolom, @arschmitz, please help me with these doubts regarding the flow:
> - Is it fine to do this on Google App Engine? Or should I prefer a local node/js server?

Local
> - Will it be fine if I use a local DB?

Local
> - What is the extent of information to be shown in a pull request?

I'm not sure what you're asking.
> - Will there be some kind of visibility rules based on the teams of each repository?

No. Everything is public.
> - Can I research more measures to provide on the Dashboard page?

I'm not sure what you're asking.
@scottgonzalez Did you see my database design above? Do you have any comments?
I was planning to track issues on GitHub using crontab. The only problem I face is that there's no option for real-time tracking as and when an issue is created. Is there a need for real-time updating in this project?
Also, what should be the first priority: quick updates to the data or deep analysis of issues?
@scottgonzalez @agcolom insights please.
Real-time tracking is not important.
> Did you see my database design above? Do you have any comments?

I'm not sure what commitments are. I'm not sure what data you're tracking. I'm not sure how you're analyzing the data. There's a large chunk of missing information.
Respected mentors, I am enthusiastic about working on this as my project for GSoC 2015, as it sounds really interesting and suits my interests.
I am a student pursuing an M.Tech. in Information Technology, and I have also been working as a freelance full-stack developer and graphic designer for the last few years. I have knowledge of jQuery, JavaScript, Git/GitHub, HTML/HTML5, CSS/CSS3, Bootstrap, PHP, MySQL, AJAX, Python, Adobe Photoshop, etc. I am interested in open source technology and the open source community.
You can also check my Linkedin profile at https://www.linkedin.com/in/gauravbparmar
I have practical experience of working on many projects that include device responsive website designing, web development, graphic designing etc.
After studying this project, I have broken it down into these major segments, according to my understanding:
I would like to use this stack for designing a device-responsive web application that will serve as a tracking system for issues related to jQuery's GitHub repositories:
jQuery + AJAX + HTML5 + CSS3 + Bootstrap
For graphic design:
Adobe Photoshop
For the back end, I would like to use the stack below:
PHP + MySQL + Github API, Webhooks, Bicho Tool (https://github.com/MetricsGrimoire/Bicho) or any other tool that does data fetching work for repositories on Github.
I am completely ready for all of the website design and development part, except for one thing where I am getting stuck: the way to grab issue-related data from GitHub for the jQuery repositories.
As @scottgonzalez said above, there are some problems with using webhooks, so I searched further and found Bicho, which is one of the tools in the MetricsGrimoire toolset (also mentioned by @scottgonzalez above).
@agcolom, @scottgonzalez and other mentors please tell -
I think under your guidance and team work, we can surely turn this project idea into reality.
Here are some of my past projects to build your trust in me: http://gauravparmar.byethost7.com/myplace http://youthfest.in http://www.globalharmonyfundraising.org/ http://gauravparmar.byethost7.com/amaroo/ http://www.priceshaved.com http://gauravparmar.byethost7.com/project3_ui/
Regards Gaurav Parmar
> your views about Bicho Tool.

Well, Bicho is written in Python. We aim for node and PHP, with a very strong preference toward node.
> is it okay if I use above technology stack?

We'd prefer node over PHP. I don't think we'll use webhooks.
For Bootstrap, I assume you're referring only to the CSS portion?
> if we use API calls then how often shall we have to fetch the data? What should be the time interval to do this process?

That's not an important decision up front. We'll figure that out during development.
> do you know about any other tool that can aid in this process?

https://www.npmjs.com/package/github-request https://github.com/scottgonzalez/github-export (WIP)
> will this tracking system have all the features that are currently there in huboard (https://huboard.com)?

Extremely different. This is in no way a bug tracker/scrum board/kanban board. It's a read-only analysis on issues and PRs.
@scottgonzalez: Since https://github.com/scottgonzalez/github-export is still in progress, do you think https://www.npmjs.com/package/github-request will satisfy all our needs on that matter, or will we have to tweak it to suit our needs?
@arschmitz : I also found you as a mentor for this project on the ideas list page of jQuery. What would you suggest for grabbing "issues" related data from Github so that it can be used in our web application which will make use of it to serve as issue tracking system?
I would also like views on this matter from @agcolom.
@scottgonzalez : Yes, we can work on Node.js as well instead of PHP. Actually it will be better. Great to see that you worked on https://www.npmjs.com/package/github-request
@agcolom requested that I open a single issue with all requests.