spgroup / groundhog

A framework for crawling GitHub projects and raw data and to extract metrics from them
http://spgroup.github.io/groundhog
GNU General Public License v2.0

Improve project metadata #14

Closed rodrigoalvesvieira closed 11 years ago

rodrigoalvesvieira commented 11 years ago

Currently, the Project class in Groundhog, which represents a project, lacks many metadata fields that interest us in our research. For example: the license; whether the project enables downloads, has issues, or has a wiki; whether it is a fork; the number of forks it has; its size in kilobytes; and many more.

We want all this data in order to enrich our research and make the tool more precise and robust.

This issue is for improving the representation of projects in Groundhog by grabbing more useful metadata from these projects.
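As a rough illustration of the fields listed above, here is a minimal sketch of a metadata holder. The class name `ProjectMetadata` and all field and method names are hypothetical; they are not taken from the Groundhog source, and the real Project class may be organized differently.

```java
// Hypothetical sketch: names are illustrative, not Groundhog's actual API.
class ProjectMetadata {
    private final String license;        // may be null when the forge does not report one
    private final boolean hasDownloads;
    private final boolean hasIssues;
    private final boolean hasWiki;
    private final boolean fork;          // true when the project is a fork of another one
    private final int forksCount;
    private final long sizeInKilobytes;

    ProjectMetadata(String license, boolean hasDownloads, boolean hasIssues,
                    boolean hasWiki, boolean fork, int forksCount, long sizeInKilobytes) {
        this.license = license;
        this.hasDownloads = hasDownloads;
        this.hasIssues = hasIssues;
        this.hasWiki = hasWiki;
        this.fork = fork;
        this.forksCount = forksCount;
        this.sizeInKilobytes = sizeInKilobytes;
    }

    String getLicense() { return license; }
    boolean hasDownloads() { return hasDownloads; }
    boolean hasIssues() { return hasIssues; }
    boolean hasWiki() { return hasWiki; }
    boolean isFork() { return fork; }
    int getForksCount() { return forksCount; }
    long getSizeInKilobytes() { return sizeInKilobytes; }

    public static void main(String[] args) {
        // Example values only; they are not real data for any project.
        ProjectMetadata meta = new ProjectMetadata("GPL-2.0", true, true, true, false, 1, 248L);
        System.out.println(meta.getLicense() + ", forks=" + meta.getForksCount());
    }
}
```

Keeping the fields immutable is just one possible design choice; setters on the existing Project class would work as well.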

rodrigoalvesvieira commented 11 years ago

More parameters to be considered:

rodrigoalvesvieira commented 11 years ago

Some inspiration from Boa

boa-types

gustavopinto commented 11 years ago

Other parameters:

rodrigoalvesvieira commented 11 years ago

Sample GitHub API resource: https://api.github.com/legacy/repos/search/groundhog
Sample SourceForge resource: http://sourceforge.net/directory/os:mac/freshness:recently-updated/?q=twitter

rodrigoalvesvieira commented 11 years ago

I'm working on this. I've deleted the previous ft-project-metadata branch because it was too far behind the current master, and merging it would have caused a lot of trouble. So I created another branch and redid the work.

Later today I'll push it.

rodrigoalvesvieira commented 11 years ago

These are the project properties that can be retrieved from GitHub:

{
    "type": "repo",
    "username": "spgroup",
    "name": "groundhog",
    "owner": "spgroup",
    "homepage": null,
    "description": "Epona/Groundhog",
    "language": "Java",
    "watchers": 7,
    "followers": 7,
    "forks": 1,
    "size": 248,
    "open_issues": 9,
    "score": 14.250591,
    "has_downloads": true,
    "has_issues": true,
    "has_wiki": true,
    "fork": false,
    "private": false,
    "url": "https://github.com/spgroup/groundhog",
    "created": "2013-04-12T15:12:28Z",
    "created_at": "2013-04-12T15:12:28Z",
    "pushed_at": "2013-05-01T17:33:08Z",
    "pushed": "2013-05-01T17:33:08Z"
}

Later I'll look into the equivalent properties for Google Code and SourceForge.

rodrigoalvesvieira commented 11 years ago

Hi everyone,

I'd like some help with the branch I'm currently working on, ft-project-metadata. As you may know, I'm not an experienced Java developer, and I'm having trouble parsing dates. The problem is:

I need to store the last push date of the project. So in the Project class I have:

public void setLastPushedAt(String lastPushedAtParam) throws ParseException, java.text.ParseException {     
    SimpleDateFormat format = new SimpleDateFormat("YYYY-MM-dd hh:mm:ss"); // first example
    Date lastPushDate = format.parse(lastPushedAtParam);

    this.lastPushedAt = lastPushDate;
}

and then in the GitHub search fetching program I have the following method call (summarized for clarity; please read the full code if you need more):


String lastPushedAt = result.getString("pushed_at");

try {
    forgeProject.setLastPushedAt(lastPushedAt);
} catch (java.text.ParseException e) {
    e.printStackTrace();
} catch (ParseException e) {
    e.printStackTrace();
}

// finally trying to display the stored lastPushedAt value:

System.out.println(forgeProject.getLastPushedAt()); // prints null

So it returns null, and I don't know why. I'm completely lost on this, so I'd appreciate any help. I tried to describe the issue precisely and objectively. If you need more information in order to help, just ask and I'll answer right away.

gustavopinto commented 11 years ago

Hey @rodrigoalvesvieira, I downloaded your branch and saw that the value of the lastPushedAt variable is something like 2013-02-16T19:01:41Z. If you look carefully, you'll see that the date format is unusual: it contains the characters T and Z, which say nothing about the date itself. According to the SimpleDateFormat documentation, Z has a special meaning and T has no meaning at all. So I just removed T and Z from the value, and the code works fine :-)
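For reference, a value like 2013-02-16T19:01:41Z follows the ISO 8601 convention: T separates the date from the time and Z marks UTC. The null in the question is most likely explained by the pattern expecting a space where the input has a T, so `parse` throws, the catch block only prints the stack trace, and the field is never assigned. An alternative to stripping the characters is to quote them as literals in the pattern. The sketch below assumes the input always ends in a literal Z; the class and method names are hypothetical and not part of Groundhog.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper: not part of the Groundhog code base.
class PushedAtParser {
    // 'T' and 'Z' are quoted so they are matched as literal characters.
    // Lowercase yyyy is the calendar year and HH is the 24-hour clock.
    private static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    static Date parsePushedAt(String value) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        // The trailing Z means UTC, so the parser is pinned to that zone.
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        try {
            return format.parse(value);
        } catch (ParseException e) {
            // Fail loudly instead of leaving the caller with a null date.
            throw new IllegalArgumentException("Unexpected date format: " + value, e);
        }
    }

    public static void main(String[] args) {
        Date pushed = parsePushedAt("2013-05-01T17:33:08Z");
        System.out.println(pushed.getTime()); // milliseconds since the epoch
    }
}
```

A new SimpleDateFormat is created on each call because instances are not thread-safe. On Java 8 or later, `java.time.Instant.parse` accepts this format directly.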

But there are some other points:

We can discuss all these points tomorrow.