research-software-directory / RSD-as-a-service

This repo contains the new RSD-as-a-service implementation
https://research.software

Option to export RSD data #883

Open Peter9192 opened 1 year ago

Peter9192 commented 1 year ago

As a project partner, I'd like to have an export button on the RSD that produces a CSV file, so I can easily collect all project outputs for the final report.

RolfHut commented 1 year ago

As an annoying, entrepreneurial project partner, I asked ChatGPT to write a Python parser that turns the API output for a given project into formatted text. Maybe useful as a starting point:

import requests

# Make the API request
url = "https://research-software-directory.org/api/v1/project?slug=eq.ewatercycle-ii&select=*,impact_for_project(mention(*)),output_for_project(mention(*))"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error body
data = response.json()

# Extract the output mentions
output_mentions = data[0]['output_for_project']

# Extract and sort the authors' information
mentions_with_authors = []
for mention in output_mentions:
    mention_info = mention['mention']
    authors = mention_info['authors']
    if authors:
        mention_info['authors'] = authors.split(", ")
        mentions_with_authors.append(mention_info)

sorted_mentions = sorted(mentions_with_authors, key=lambda m: m['authors'])

# Format the output mentions with authors as a nicely formatted text
output_text = ""
for mention in sorted_mentions:
    title = mention['title']
    url = mention['url']
    mention_type = mention['mention_type']
    version = mention['version']
    authors = ", ".join(mention['authors'])
    output_text += f"Title: {title}\n"
    output_text += f"Authors: {authors}\n"
    output_text += f"URL: {url}\n"
    output_text += f"Mention Type: {mention_type}\n"
    if version:
        output_text += f"Version: {version}\n"
    output_text += "\n"

# Print or use the formatted text
print(output_text)

(I did ask it to sort by author last name; it clearly doesn't know what a 'last name' is.)
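
For what it's worth, the sort in the script compares the author lists as-is, so it effectively orders by the first author's first name. A minimal sketch of a last-name sort key is below. It assumes names are written "First Last" and simply takes the last word of the first author, so particles such as "van de" are ignored; `last_name_key` and the sample data are hypothetical and not part of the RSD API.

```python
def last_name_key(authors):
    """Sort key: last word of the first author's name, lowercased.

    Assumes names are written "First Last". Particles such as "van de"
    are ignored, so "Nick van de Giesen" sorts under "giesen".
    """
    first_author = authors[0].strip() if authors else ""
    return first_author.split()[-1].lower() if first_author else ""


# Hypothetical sample in the shape the script above builds
# (each mention has an 'authors' list of names).
mentions = [
    {"title": "B", "authors": ["Rolf Hut", "Niels Drost"]},
    {"title": "A", "authors": ["Nick van de Giesen"]},
    {"title": "C", "authors": ["Bouwe Andela"]},
]

sorted_mentions = sorted(mentions, key=lambda m: last_name_key(m["authors"]))
print([m["title"] for m in sorted_mentions])  # prints ['C', 'A', 'B']
```

In the script this would replace the `key=lambda m: m['authors']` argument. Names stored as "Last, First" would need a different rule.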

RolfHut commented 1 year ago

Slightly more serious: as a project partner, I want the output of such a parser to be ready for use in (end-of-project) reports, so it should comply with the standards for these reports. For academic papers, this would look something like:

Hut, R., Drost, N., van de Giesen, N., van Werkhoven, B., Abdollahi, B., Aerts, J., Albers, T., Alidoost, F., Andela, B., Camphuijsen, J., Dzigan, Y., van Haren, R., Hutton, E., Kalverla, P., van Meersbergen, M., van den Oord, G., Pelupessy, I., Smeets, S., Verhoeven, S., de Vos, M., and Weel, B.: The eWaterCycle platform for open and FAIR hydrological collaboration, Geosci. Model Dev., 15, 5371–5390, https://doi.org/10.5194/gmd-15-5371-2022, 2022.

It might actually be a good idea to export to BibTeX, which most people can then format in any output style they want using TeX software.

ewan-escience commented 1 year ago

@RolfHut I don't know how fast you want this (paging @dmijatovic), but it might take a while before we get to this.

In the meantime, you can easily get BibTeX for every output that has a DOI in the following way: send an HTTP GET request to https://doi.org/<your-doi> with an extra header with key Accept and value application/x-bibtex. For example, making such a request to https://doi.org/10.5194/gmd-15-5371-2022 yields:

@article{Hut_2022,
    doi = {10.5194/gmd-15-5371-2022},
    url = {https://doi.org/10.5194%2Fgmd-15-5371-2022},
    year = 2022,
    month = {jul},
    publisher = {Copernicus {GmbH}},
    volume = {15},
    number = {13},
    pages = {5371--5390},
    author = {Rolf Hut and Niels Drost and Nick van de Giesen and Ben van Werkhoven and Banafsheh Abdollahi and Jerom Aerts and Thomas Albers and Fakhereh Alidoost and Bouwe Andela and Jaro Camphuijsen and Yifat Dzigan and Ronald van Haren and Eric Hutton and Peter Kalverla and Maarten van Meersbergen and Gijs van den Oord and Inti Pelupessy and Stef Smeets and Stefan Verhoeven and Martine de Vos and Berend Weel},
    title = {The {eWaterCycle} platform for open and {FAIR} hydrological collaboration},
    journal = {Geoscientific Model Development}
}
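
The request described above can be sketched in Python using only the standard library. The function names `build_bibtex_request` and `fetch_bibtex` are made up for illustration, and the exact BibTeX that comes back depends on the DOI registration agency, so treat this as a starting point and not a finished tool.

```python
import urllib.parse
import urllib.request


def build_bibtex_request(doi):
    """Build a GET request to doi.org that asks for BibTeX via the Accept header."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Send the request (redirects are followed) and return the response body as text."""
    with urllib.request.urlopen(build_bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (needs network access):
#   print(fetch_bibtex("10.5194/gmd-15-5371-2022"))
```

Looping `fetch_bibtex` over the DOIs of a project's outputs and concatenating the results would give a .bib file for everything that has a DOI.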

For output without a DOI, you'd indeed have to do some work yourself or ask AI ;)
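
As a rough idea of what that work could look like, here is a minimal sketch that builds a `@misc` entry from the mention fields used in the script earlier in this thread (`title`, `authors`, `url`, `version`). It assumes `authors` is a single comma-separated string, does no escaping of BibTeX special characters, and uses a naive citation key; `mention_to_bibtex` and the sample mention are hypothetical.

```python
import re


def mention_to_bibtex(mention):
    """Turn one mention dict into a simple @misc BibTeX entry.

    Assumes 'authors' is a comma-separated string of names.
    Special characters in field values are not escaped.
    """
    raw_authors = mention.get("authors") or ""
    authors = [name.strip() for name in raw_authors.split(",") if name.strip()]

    # Naive citation key: last word of the first author's name.
    key_source = authors[0].split()[-1] if authors else "mention"
    key = re.sub(r"\W+", "", key_source) or "mention"

    version = mention.get("version")
    fields = {
        "author": " and ".join(authors),
        "title": mention.get("title") or "",
        "url": mention.get("url") or "",
        "note": f"Version {version}" if version else "",
    }
    # Skip empty fields so the entry stays valid.
    lines = [f"    {name} = {{{value}}}" for name, value in fields.items() if value]
    return "@misc{" + key + ",\n" + ",\n".join(lines) + "\n}"


# Hypothetical mention without a DOI
sample = {
    "title": "Example dataset",
    "authors": "Jane Doe, John Smith",
    "url": "https://example.org/data",
    "version": None,
}
entry = mention_to_bibtex(sample)
print(entry)
```

Citation keys built this way can collide when one first author has several outputs, so a real export would need to make them unique.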

jmaassen commented 1 year ago

It may be interesting to provide a script like this as part of a toolbox that can be used to harvest information from the RSD? I've been using a similar approach to gather statistics about the software in the RSD.
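
One piece of such a toolbox could be the CSV export this issue asks for. A minimal sketch, assuming mentions are dicts with the fields the parser earlier in this thread reads (`title`, `authors`, `url`, `mention_type`, `version`); `mentions_to_csv` and the sample row are hypothetical.

```python
import csv
import io

# Columns taken from the fields used by the parser earlier in this thread.
FIELDS = ["title", "authors", "url", "mention_type", "version"]


def mentions_to_csv(mentions):
    """Serialize a list of mention dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for mention in mentions:
        # Missing or null fields become empty cells.
        writer.writerow({field: mention.get(field) or "" for field in FIELDS})
    return buffer.getvalue()


# Hypothetical sample row
sample_mentions = [
    {
        "title": "Example dataset",
        "authors": "Jane Doe, John Smith",
        "url": "https://example.org/data",
        "mention_type": "dataset",
        "version": None,
    }
]
csv_text = mentions_to_csv(sample_mentions)
print(csv_text)
```

The `csv` module quotes the comma-separated author string, so it stays in one column when the file is opened in a spreadsheet.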