opencitations / api

Software for creating REST API

Incorrect order of authors in OC Meta API results #13

Closed eliarizzetto closed 1 month ago

eliarizzetto commented 3 months ago

In the results of the metadata operation of OC Meta API, the order of the entities exposed in the author field does not match the correct order of the authors specified in the triplestore.

For example, the call https://opencitations.net/meta/api/v1/metadata/omid:br/0680773548 returns the authors of br/0680773548 in the following order: Bilgin, Hülya [orcid:0000-0001-6639-5533 omid:ra/0622032021]; Bozkurt, Merlin [omid:ra/06802276621]; Korfali, Gülsen [omid:ra/06802276623]; Yilmazlar, Selçuk [omid:ra/06802276622].

The authors of this resource are stored in the triplestore in a different order (specified by the oco:hasNext property): Bilgin, Bozkurt, Yilmazlar, Korfali (the positions of the last two authors are inverted in the API's result).

Examples like the one of br/0680773548 can be reproduced following the procedure below.


  1. First, we use the SPARQL endpoint to retrieve 20 sample BRs with several authors, so that the order of the authors in the triplestore and in the API results can be meaningfully compared.

    PREFIX pro: <http://purl.org/spar/pro/>
    
    SELECT ?br
    WHERE {
      {
        SELECT ?br
        WHERE {
          ?br pro:isDocumentContextFor ?authrole.
          ?authrole pro:withRole pro:author.
        }
        LIMIT 1000  # Adjust this limit based on your triplestore's performance
      }
      ?br pro:isDocumentContextFor ?authrole.
      ?authrole pro:withRole pro:author.
    }
    GROUP BY ?br
    HAVING (COUNT(?authrole) > 5)
    LIMIT 20
  2. We get the following result:

    br
    1 https://w3id.org/oc/meta/br/0680773548
    2 https://w3id.org/oc/meta/br/0680773565
    3 https://w3id.org/oc/meta/br/0680773578
    4 https://w3id.org/oc/meta/br/06230222748
    5 https://w3id.org/oc/meta/br/06230222763
    6 https://w3id.org/oc/meta/br/06230222796
    7 https://w3id.org/oc/meta/br/06230222798
    8 https://w3id.org/oc/meta/br/0680773030
    9 https://w3id.org/oc/meta/br/0680773514
    10 https://w3id.org/oc/meta/br/0680773573
    11 https://w3id.org/oc/meta/br/06230222751
    12 https://w3id.org/oc/meta/br/06230222768
    13 https://w3id.org/oc/meta/br/0680772983
    14 https://w3id.org/oc/meta/br/0680773000
    15 https://w3id.org/oc/meta/br/0680773017
    16 https://w3id.org/oc/meta/br/0680773584
    17 https://w3id.org/oc/meta/br/06230222747
    18 https://w3id.org/oc/meta/br/06230222756
    19 https://w3id.org/oc/meta/br/06230222766
    20 https://w3id.org/oc/meta/br/06230222799
  3. Then we pick any of the BRs in the result (in this instance, the first one, br/0680773548) and retrieve via SPARQL endpoint the details about its authors: the OMID of the agent role; the OMID of the responsible agent; the surname of the agent; and the object of the oco:hasNext property, which determines the order of the authors, or rather of their roles.

    PREFIX pro: <http://purl.org/spar/pro/>
    PREFIX oco: <https://w3id.org/oc/ontology/>
    PREFIX meta: <https://w3id.org/oc/meta/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    
    SELECT ?author_role ?ra ?surname ?next
    WHERE {
      <https://w3id.org/oc/meta/br/0680773548> pro:isDocumentContextFor ?author_role.
      ?author_role pro:withRole pro:author;
        pro:isHeldBy ?ra.
      ?ra foaf:familyName ?surname.
      OPTIONAL { ?author_role oco:hasNext ?next. }
    }
  4. We obtain the following result:

    | # | author_role | ra | surname | next |
    |---|---|---|---|---|
    | 1 | https://w3id.org/oc/meta/ar/06803250813 | https://w3id.org/oc/meta/ra/0622032021 | Bilgin | https://w3id.org/oc/meta/ar/06803250814 |
    | 2 | https://w3id.org/oc/meta/ar/06803250814 | https://w3id.org/oc/meta/ra/06802276621 | Bozkurt | https://w3id.org/oc/meta/ar/06803250815 |
    | 3 | https://w3id.org/oc/meta/ar/06803250815 | https://w3id.org/oc/meta/ra/06802276622 | Yilmazlar | https://w3id.org/oc/meta/ar/06803250816 |
    | 4 | https://w3id.org/oc/meta/ar/06803250816 | https://w3id.org/oc/meta/ra/06802276623 | Korfali | |
  5. We query the Meta REST API for the same BR as step 3 (br/0680773548): https://opencitations.net/meta/api/v1/metadata/omid:br/0680773548, getting this result:

    [
    {
       "volume":"18",
       "author":"Bilgin, Hülya [orcid:0000-0001-6639-5533 omid:ra/0622032021]; Bozkurt, Merlin [omid:ra/06802276621]; Korfali, Gülsen [omid:ra/06802276623]; Yilmazlar, Selçuk [omid:ra/06802276622]",
       "publisher":"Elsevier Bv [crossref:78 omid:ra/0610116009]",
       "editor":"",
       "id":"doi:10.1016/j.jclinane.2005.12.014 openalex:W2127410217 pmid:16731339 omid:br/0680773548",
       "venue":"Journal Of Clinical Anesthesia [issn:0952-8180 openalex:S155967237 omid:br/0621013884]",
       "page":"243-244",
       "pub_date":"2006-05",
       "type":"journal article",
       "issue":"3",
       "title":"Sudden Asystole Without Any Alerting Signs During Cerebellopontine Angle Surgery"
    }
    ]
  6. Comparing the result of the SPARQL endpoint with that of the API shows that the order of the authors differs: the positions of the last two authors are inverted. Korfali (ra/06802276623) should come last, since its role is not linked to any other role by the oco:hasNext property, and should be preceded by Yilmazlar (ra/06802276622), since a triple states that ar/06803250815 oco:hasNext ar/06803250816.
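
The comparison in step 6 can also be checked mechanically. The following is a minimal Python sketch, not part of the API code: it extracts the OMIDs from the author string returned in step 5 and reports the positions where they differ from the order implied by the oco:hasNext chain in step 4. The function names are hypothetical, and the data is hard-coded from the results above.

```python
import re

# "author" field copied from the API response in step 5.
api_author_field = (
    "Bilgin, Hülya [orcid:0000-0001-6639-5533 omid:ra/0622032021]; "
    "Bozkurt, Merlin [omid:ra/06802276621]; "
    "Korfali, Gülsen [omid:ra/06802276623]; "
    "Yilmazlar, Selçuk [omid:ra/06802276622]"
)

# Order implied by the oco:hasNext chain in step 4:
# Bilgin, Bozkurt, Yilmazlar, Korfali.
triplestore_order = [
    "ra/0622032021",
    "ra/06802276621",
    "ra/06802276622",
    "ra/06802276623",
]


def omids_in_api_order(author_field: str) -> list[str]:
    """Return the responsible-agent OMIDs in the order the API lists them."""
    return re.findall(r"omid:(ra/\d+)", author_field)


def mismatched_positions(api_order: list[str], expected: list[str]) -> list[int]:
    """Return the 0-based positions at which the two orders disagree."""
    return [i for i, (a, e) in enumerate(zip(api_order, expected)) if a != e]


api_order = omids_in_api_order(api_author_field)
print(mismatched_positions(api_order, triplestore_order))  # prints [2, 3]
```

Positions 2 and 3 (0-based) are the third and fourth authors, which matches the swap of Yilmazlar and Korfali described in step 6.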

arcangelo7 commented 1 month ago

I have resolved the problem with the ordering of roles in the OC Meta API results. The fix ensures that authors, as well as editors and publishers, are now correctly ordered according to the oco:hasNext property in the triplestore.

Previously, I was capturing the order by sorting the roles in descending order based on the number of oco:hasNext edges and then aggregating the results. This approach worked correctly with Blazegraph. However, after migrating to Virtuoso, this method stopped working as expected, and I couldn't determine the exact cause of the discrepancy.

To address this, I've shifted the ordering logic from the SPARQL query to Python. The solution now involves modifying the SPARQL query to capture the full chain of oco:hasNext relationships and updating the Python code to process this information correctly. This change allows us to accurately reconstruct the intended author order, regardless of the underlying triplestore implementation.
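
As an illustration of this kind of Python-side ordering, here is a minimal sketch. It is not the code from the commit: the function name `order_roles` and the input shape (a mapping from each agent role to its oco:hasNext successor, or `None` for the last role) are assumptions made for the example. The sample data is taken from the SPARQL result reported earlier in this issue.

```python
from typing import Optional


def order_roles(next_by_role: dict[str, Optional[str]]) -> list[str]:
    """Order agent roles by following oco:hasNext links from the chain's head.

    Hypothetical helper: next_by_role maps each role to the role that
    follows it, or to None when the role has no oco:hasNext triple.
    """
    # The head of a chain is a role that no other role points to.
    successors = {nxt for nxt in next_by_role.values() if nxt is not None}
    heads = [role for role in next_by_role if role not in successors]

    ordered: list[str] = []
    seen: set[str] = set()
    for head in heads:
        current: Optional[str] = head
        # Stop at the end of the chain, at an unknown role, or on a cycle.
        while current in next_by_role and current not in seen:
            seen.add(current)
            ordered.append(current)
            current = next_by_role[current]

    # Roles caught in a headless cycle are appended in input order.
    ordered.extend(role for role in next_by_role if role not in seen)
    return ordered


# Rows for br/0680773548, deliberately given out of order.
next_by_role = {
    "ar/06803250816": None,
    "ar/06803250814": "ar/06803250815",
    "ar/06803250813": "ar/06803250814",
    "ar/06803250815": "ar/06803250816",
}
surname_by_role = {
    "ar/06803250813": "Bilgin",
    "ar/06803250814": "Bozkurt",
    "ar/06803250815": "Yilmazlar",
    "ar/06803250816": "Korfali",
}

print([surname_by_role[r] for r in order_roles(next_by_role)])
# prints ['Bilgin', 'Bozkurt', 'Yilmazlar', 'Korfali']
```

Starting from the head, that is, the role no other role points to, keeps the result independent of the order in which the triplestore returns the rows.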

The fix has been implemented and is now live. You can find the details of the implementation in this commit: https://github.com/opencitations/api/commit/57ce162597a96e396c70e8cd604e1f50b9161a66