Open biancini opened 5 years ago
Let's have the crawler query the ES and output the results in a JSON file. Such a file will be public in a directory served by nginx. See https://github.com/italia/developers.italia.it/issues/406 as a reference to such a public dir.
@biancini we could close this issue since the point was achieved by https://github.com/italia/developers.italia.it/issues/406 What do you think?
@sebbalex I believe the solution that is in use right now is not an API so I would leave this open for future improvements. I still believe it could be nice to have an actual and proper API for this.
Moving the issue to developers-italia-api
This should be doable now with something like:
import json
from collections import defaultdict
import requests
import yaml
API_BASE_URL = "https://api.developers.italia.it/v1"
def get_paginated(resource: str):
items = []
page = True
page_after = ""
while page:
res = requests.get(f"{API_BASE_URL}/{resource}?all=true&{page_after}")
res.raise_for_status()
body = res.json()
items += body["data"]
page_after = body["links"]["next"]
if page_after:
# Remove the '?'
page_after = page_after[1:]
page = bool(page_after)
return items
software = get_paginated("software")
publishers = get_paginated("publishers")
by_date = defaultdict(
lambda: {
"num_software_pa": 0,
"num_software_thirdparty": 0,
"num_administrations": 0,
}
)
for s in software:
date = s["createdAt"][:10]
try:
publiccode = yaml.safe_load(s["publiccodeYml"])
if publiccode.get("it", {}).get("riuso", {}).get("codiceIPA"):
by_date[date]["num_software_pa"] += 1
else:
by_date[date]["num_software_thirdparty"] += 1
except:
pass
administrations = set()
for publisher in publishers:
if publisher.get("alternativeId"):
administrations.add(publisher["id"])
date = publisher["createdAt"][:10]
by_date[date]["num_administrations"] = len(administrations)
print(json.dumps([{date: counts} for date, counts in by_date.items()], indent=4))
but I wouldn't turn into an endpoint into the API, as the data is easily available without hardcoding the metrics.
To integrate the work ongoing on metric creation for Developers /Italia, it would be great to have an API (talking JSON) that shows the following data:
If the crawler has the data, it would be great to have this JSON API also proposing the evolution of these numbers over time (since the beginning of Developers /Italia). The output could be of this form: