openintegrity / openintegrity-metrics

Discussing, designing and building the next steps for the open integrity index.

Develop data model for open repository of software metrics #1

Open jmatsushita opened 9 years ago

jmatsushita commented 9 years ago

We want to aggregate existing metrics and develop new metrics for software projects. Which data structure do we need? Do we go for an API-first design (apiary.io)? A data schema (JSON Schema? LD?)? Data Cube? In some ways this is a data catalogue (so maybe CKAN and the DCAT ontology are relevant), but in other ways it's more detailed and granular (with a focus on the measurement level).

Here's an ongoing effort to list potential metrics.

Here are a number of questions that relate to this:

jmatsushita commented 9 years ago

Submitted a CSV version of the list e28553874c7b78fd98a66945422846b274de7069

From #5, #3 maybe

{
  "metric_id": "project/package/dependencies",
  "metric_provider": "https://libraries.io",
  "metric_hosting": "cached", // other types could be "remote" to indicate that the data is not hosted by OII or "stream" to indicate that OII can serve/proxy a realtime stream. "hosted" would mean that OII is also hosting historical data (then the date range should also be there).
  "metric_type": "calculated", // provenance needs to be probably more detailed than that.
  "metric_source_start_range": "2014-05-02T10:10:00",
  "metric_source_end_range": "2015-08-20T19:45:00",
  "metric_source_api_endpoint": "https://libraries.io/api/{project_id}/dependencies",
  "metric_source_api_parameter_project_id": "project_id",
  "metric_source_api_response_path_value": "response.data.value"
}
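
To make the descriptor above more concrete, here is a rough Python sketch of how a client might consume it: fill the endpoint template from the declared parameters, then walk the dotted response path to pull the value out. The helper names (`build_url`, `extract_value`), the sample payload, and reading `metric_source_api_response_path_value` as a dot-separated key path are all assumptions for illustration. The endpoint is the one from the descriptor and has not been checked against the real libraries.io API.

```python
from typing import Any

# Subset of the descriptor above (comments stripped so it is a plain dict).
descriptor = {
    "metric_id": "project/package/dependencies",
    "metric_source_api_endpoint": "https://libraries.io/api/{project_id}/dependencies",
    "metric_source_api_parameter_project_id": "project_id",
    "metric_source_api_response_path_value": "response.data.value",
}

# Assumption: every key with this prefix declares one template parameter.
PARAM_PREFIX = "metric_source_api_parameter_"


def build_url(descriptor: dict, **params: str) -> str:
    """Fill the endpoint template with the parameters the descriptor declares."""
    declared = {
        value for key, value in descriptor.items() if key.startswith(PARAM_PREFIX)
    }
    missing = declared - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return descriptor["metric_source_api_endpoint"].format(**params)


def extract_value(descriptor: dict, payload: dict) -> Any:
    """Walk the dotted response path through an already-parsed payload."""
    node: Any = payload
    for key in descriptor["metric_source_api_response_path_value"].split("."):
        node = node[key]
    return node


# Hypothetical payload shaped to match the response path; no network call is made.
sample_payload = {"response": {"data": {"value": 42}}}
url = build_url(descriptor, project_id="express")
value = extract_value(descriptor, sample_payload)
```

Keeping URL construction and value extraction driven by the descriptor means a new provider would only need a new descriptor, not new client code. Whether that holds for providers with less regular APIs is an open question.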

I think the interesting bit is going to be provenance. For dependencies, for instance, that means specifying where the raw data is (like the package manager's files) and which bit of code processes it.
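
One possible way to capture that provenance, sketched in Python: a small record naming the raw source and the code that processed it, attached to the metric descriptor. The field names (`raw_source`, `processor`, `processor_version`), the `metric_provenance` key, and the example values are hypothetical and not part of any agreed schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Provenance:
    raw_source: str         # where the raw data lives, e.g. a package manifest
    processor: str          # which bit of code turned the raw data into the metric
    processor_version: str  # pin the code version so the result can be reproduced


def with_provenance(descriptor: dict, provenance: Provenance) -> dict:
    """Return a copy of the descriptor with a nested provenance record."""
    return {**descriptor, "metric_provenance": asdict(provenance)}


# Hypothetical values, for illustration only.
base = {"metric_id": "project/package/dependencies", "metric_type": "calculated"}
record = with_provenance(
    base,
    Provenance(
        raw_source="package.json",
        processor="example-dependency-parser",
        processor_version="0.1.0",
    ),
)
```

A nested record like this would let `metric_type` stay a coarse label while the detail lives in `metric_provenance`.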