research-software-directory / RSD-as-a-service

This repo contains the new RSD-as-a-service implementation
https://research.software

1055 - show reference papers and scraped citations #1061

Closed (dmijatovic closed this pull request 9 months ago)

dmijatovic commented 9 months ago

Show reference papers and scraped citations

Closes #1055

Changes proposed in this pull request:

How to test:

Reference papers on software page

Reference papers on edit software page

Mentions on edit software page

Mention count on the software page (consolidated count of mentions and citations)

Mention count on the software card (cards are ordered by mention count)
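
A consolidated count like the one in the test steps above (mentions and scraped citations merged into one number, with software cards ordered by it) could be sketched roughly as follows. This is a minimal TypeScript sketch under our own assumptions, not the RSD implementation: the `MentionLike` and `SoftwareCard` types, their fields, and the rule that items sharing a DOI (compared case-insensitively, resolver prefix stripped) are counted once while DOI-less items are always counted are all hypothetical.

```typescript
// Hypothetical types: names and fields are illustrative, not taken from the RSD code base.
interface MentionLike {
  doi: string | null;
  title: string;
}

interface SoftwareCard {
  slug: string;
  mentionCount: number;
}

// DOIs are case-insensitive, so strip a doi.org resolver prefix and lowercase before comparing.
function normaliseDoi(doi: string): string {
  return doi
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//, "");
}

// Assumed rule: merge manual mentions and scraped citations, count items that share
// a DOI only once, and always count items without a DOI (they cannot be matched).
function consolidatedMentionCount(mentions: MentionLike[], citations: MentionLike[]): number {
  const seenDois = new Set<string>();
  let count = 0;
  for (const item of [...mentions, ...citations]) {
    if (item.doi === null || item.doi.trim() === "") {
      count += 1;
      continue;
    }
    const key = normaliseDoi(item.doi);
    if (!seenDois.has(key)) {
      seenDois.add(key);
      count += 1;
    }
  }
  return count;
}

// Order cards by descending mention count without mutating the input array.
function sortByMentionCount(cards: SoftwareCard[]): SoftwareCard[] {
  return [...cards].sort((a, b) => b.mentionCount - a.mentionCount);
}
```

Because DOI-less items cannot be matched under this rule, different versions of the same paper without a shared DOI would still be counted separately.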

PR Checklist:

sonarcloud[bot] commented 9 months ago

[rsd-database] Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no information
Duplication: 0.0%

sonarcloud[bot] commented 9 months ago

[rsd-frontend] Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 3 (rating A)

Coverage: 43.8%
Duplication: 0.0%

jmaassen commented 9 months ago

I tried several tests, and it seems to work well. Some comments and thoughts:

I do see some issues with the harvested information. There seem to be quite a few (older) publications that do not have a DOI yet. For example:

[Screenshot 2023-12-01 13:35:18: Edit software page, Research Software Directory]

The theses at the bottom of this list don't have a DOI (that is expected), but I would expect the journal papers at the top to have one. As far as I know, many older papers were retroactively assigned a DOI. Also, the top one does not seem to have a link associated with it, while the second one does have a link, but it's broken.

Interestingly, the web page of the reference paper (cpe08) seems to refer to a different Marvin paper (10.1016/j.websem.2009.09.002), which our scraper also finds. They don't seem to find the DOI-less one, though.

Sometimes I also see multiple versions of the same paper:

[Screenshot 2023-12-01 13:40:40: Edit software page, Research Software Directory]

And some entries have problems with international (non-ASCII) characters:

[Screenshot 2023-12-01 10:54:59: Edit software page, Research Software Directory]

This is the expected output:

[Screenshot 2023-12-01 13:56:26: Clasificación multi-etiqueta utilizando computación distribuida]

Link to the paper

Overall, I think this is a great feature, but we may run into some data duplication and quality issues ;-)