internetarchive / iari

Import workflows for the Wikipedia Citations Database
GNU General Public License v3.0

As a programmer, I want the /article endpoint to include template information in each reference element of the references array so that the status of each reference can be properly assessed #886

Closed. mojomonger closed this issue 1 year ago.

mojomonger commented 1 year ago

Currently reference details, including template parameters, are acquired on an "as needed" basis. This does not allow filtering of the references by template information "up front", that is, upon first receiving the data returned by the /article endpoint.

As things stand, a solution involves looping through all references, fetching the template information FOR EACH ONE, and appending it to EACH reference element. This has a couple of drawbacks:

  1. It requires LOTS of HTTP calls over the network.

  2. It adds complexity to fetching reference data that is needed quickly. Someone new to the IARI environment could be puzzled about why they have to re-fetch details for the references. It is much easier if all data is delivered "up front" in one package, especially with frameworks like React, which re-render whenever new data is fetched.
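To make the cost of the current pattern concrete, here is a minimal Python sketch that counts requests under both approaches. FakeIari, get_article, and get_reference are hypothetical stand-ins, not IARI's real API, and no network calls are made.

```python
from dataclasses import dataclass


@dataclass
class FakeIari:
    """Hypothetical in-memory stand-in for IARI that counts requests (no network)."""

    references: list[dict]
    calls: int = 0

    def get_article(self, include_templates: bool) -> list[dict]:
        # One request returns the whole references array.
        self.calls += 1
        if include_templates:
            return [dict(ref) for ref in self.references]
        # Lighter payload: only ids, so details must be fetched separately.
        return [{"id": ref["id"]} for ref in self.references]

    def get_reference(self, ref_id: str) -> dict:
        # One request per reference.
        self.calls += 1
        return next(ref for ref in self.references if ref["id"] == ref_id)


def load_per_reference(api: FakeIari) -> list[dict]:
    """Current approach: fetch the article, then re-fetch every reference."""
    refs = api.get_article(include_templates=False)
    return [api.get_reference(ref["id"]) for ref in refs]


def load_up_front(api: FakeIari) -> list[dict]:
    """Proposed approach: one response already carries the template data."""
    return api.get_article(include_templates=True)


if __name__ == "__main__":
    sample = [{"id": str(i), "template_names": ["cite web"]} for i in range(50)]
    per_ref, up_front = FakeIari(sample), FakeIari(sample)
    load_per_reference(per_ref)
    load_up_front(up_front)
    print(per_ref.calls, up_front.calls)  # 51 1
```

With n references, the per-reference pattern costs n + 1 requests versus a single one, which is the first drawback listed above.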

I propose the following for the references* property of the /article endpoint:

references: [ 
{
    id: ". . .",
    wikitext: ". . .",
    type: "footnote",
    subtype: "content",  **
    template_names: ["...", ],
    templates: [ { <template parameters here> }, ],
    flds: [. . .],
    urls: [. . .],
    titles: [. . .],  ***
    section: "...",
    name: "..."
},
...
]

* Yes, I propose changing the property name back to "references".
** I propose we also change this field name from "footnote_subtype" to just "subtype".
*** I'm not sure we need the "titles" array; we could extract titles from the templates array. Willing to discuss.
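With template_names present in every element, filtering can happen as soon as the /article response arrives. The Python sketch below models the proposed element as a TypedDict using the field names from the proposal (an assumption, since the schema was still under discussion) and filters by template name. Comparing names trimmed and lower-cased is a simplifying assumption that does not exactly mirror MediaWiki's own template-name normalization.

```python
from typing import TypedDict


class Reference(TypedDict):
    """Proposed /article reference element (field names from the proposal above)."""

    id: str
    wikitext: str
    type: str
    subtype: str
    template_names: list[str]
    templates: list[dict]
    flds: list[str]
    urls: list[str]
    titles: list[str]
    section: str
    name: str


def filter_by_template(references: list[Reference], template_name: str) -> list[Reference]:
    """Return references that use template_name, with no extra HTTP calls.

    Names are compared trimmed and lower-cased: a simplifying assumption,
    not MediaWiki's exact template-name normalization.
    """
    wanted = template_name.strip().lower()
    return [
        ref
        for ref in references
        if any(name.strip().lower() == wanted for name in ref["template_names"])
    ]
```

Because the filter only reads fields already in the payload, a React component could apply it directly to the data from its first fetch.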

We could also consider adding a parameter "dehydrated=true" that returns a lighter payload, but that can be decided later.

I would like to add the template information to the /article endpoint and see how big the payloads are. If they are reasonable, great.

Also, if we are concerned about people overloading the servers with requests, we could always add an authcode, similar to what IABot does with testdeadlink.

Please respond and comment!

dpriskorn commented 1 year ago

Thank you for the very clear issue. What about keeping the current behavior, with dehydration as the default, and adding a new parameter dehydrated=false that gives you all the details about each reference?

This removes the need for the references endpoint, which could be deprecated at the same time. @mojomonger WDYT? If you want to go ahead with deprecating the references endpoint, I suggest we create a story for that too. I would also prefer separate stories for renaming each field; that makes for a better release history and makes it easier to keep track of what changed over time.

dpriskorn commented 1 year ago

Done. Please open new issues as detailed above if you want key names changed, output removed, etc.

mojomonger commented 1 year ago

The output from the /article endpoint, https://archive.org/services/context/iari/v2/statistics/article, does not appear to include the template parameters as described above.

For example:

https://archive.org/services/context/iari/v2/statistics/article?url=https://en.wikipedia.org/wiki/Easter_Island&regex=bibliography|further%20reading|works%20cited|sources|external%20links&refresh=true
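The example request can also be built programmatically so that the section regex and the Wikipedia URL are percent-encoded consistently. This minimal Python sketch uses only the standard library; article_url is a hypothetical helper, and it assumes the server decodes percent-encoded query values in the usual way.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the comment above.
ARTICLE_ENDPOINT = "https://archive.org/services/context/iari/v2/statistics/article"


def article_url(page_url: str, section_regex: str, refresh: bool = True) -> str:
    """Build an /article request URL (hypothetical helper).

    quote_via=quote encodes spaces as %20 (as in the example URL) and also
    percent-encodes '|', ':' and '/' inside the query values.
    """
    params = {
        "url": page_url,
        "regex": section_regex,
        "refresh": "true" if refresh else "false",
    }
    return f"{ARTICLE_ENDPOINT}?{urlencode(params, quote_via=quote)}"


print(article_url(
    "https://en.wikipedia.org/wiki/Easter_Island",
    "bibliography|further reading|works cited|sources|external links",
))
```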

dpriskorn commented 1 year ago

Could you share a screenshot also so I can see exactly what you mean?

mojomonger commented 1 year ago

Dennis, here is how the template information is provided by the /reference endpoint:

[Screenshot: template information as returned by the /reference endpoint]

The thing to do here is to add the same information to EACH reference in the references array of the /article endpoint.

references: [ 
{
    id: ". . .",
    wikitext: ". . .",
    type: "footnote",
    subtype: "content",  **
    template_names: ["...", ],
    templates: [ array of template parameter objects here ],
    flds: [. . .],
    urls: [. . .],
    titles: [. . .],  ***
    section: "...",
    name: "..."
}, 
...
]
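On the open question of whether the titles array is needed: if each template parameter object is a flat mapping of parameter names to values (an assumption; the thread does not show the exact shape), titles could be derived on the client as sketched below. titles_from_templates is a hypothetical helper, not part of IARI.

```python
def titles_from_templates(templates: list[dict]) -> list[str]:
    """Collect distinct, non-empty 'title' parameters in order of appearance.

    Assumes each template is a flat dict of parameter name -> value
    (hypothetical shape; the thread does not show the exact structure).
    """
    seen: set[str] = set()
    titles: list[str] = []
    for params in templates:
        title = str(params.get("title", "")).strip()
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles


example = [{"title": "Rapa Nui"}, {"url": "https://example.org"}, {"title": "Moai"}]
print(titles_from_templates(example))  # ['Rapa Nui', 'Moai']
```

If the real template objects nest parameters under another key, only the params.get lookup would need to change.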

Make sense?

mojomonger commented 1 year ago

We can refine it and strip out what we don't need later.

dpriskorn commented 1 year ago

Oh, you probably didn't read the updated README. You have to pass dehydrate=false to get the full output.
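A small client-side sketch for the dehydrate=false flag: with_full_details sets the flag on an existing /article URL, and references_missing_templates reports references that still lack a templates field. Both helpers are hypothetical. The parameter name follows the comment above (earlier in the thread it appears as dehydrated), and the "references" key follows the proposal in this issue, so check the README for the names actually in use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_full_details(url: str) -> str:
    """Return url with dehydrate=false set, keeping other query parameters.

    Parameter name taken from the comment above; hypothetical helper.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dehydrate"  # drop any existing value so the flag appears once
    ]
    query.append(("dehydrate", "false"))
    return urlunsplit(parts._replace(query=urlencode(query)))


def references_missing_templates(article: dict) -> list[str]:
    """Ids of references lacking a 'templates' field.

    Assumes the response lists references under a "references" key,
    as proposed in this issue.
    """
    return [
        str(ref.get("id", "?"))
        for ref in article.get("references", [])
        if "templates" not in ref
    ]
```

An empty result from references_missing_templates is a quick sanity check that the full (non-dehydrated) output was actually returned.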

mojomonger commented 1 year ago

Got it, thanks.