internetarchive / iari

Import workflows for the Wikipedia Citations Database
GNU General Public License v3.0

As a data consumer, I want the errors from the IABot testdeadlink endpoint included in the /check-url JSON data #876

Closed: mojomonger closed this 1 year ago

mojomonger commented 1 year ago

Something like this comes back from https://iabot-api.archive.org/testdeadlink.php:

{
    "results": {
        "https:\/\/orb.binghamton.edu\/cgi\/viewcontent.cgi?article=1041": 400,
        "errors": {
            "https:\/\/orb.binghamton.edu\/cgi\/viewcontent.cgi?article=1041": "RESPONSE CODE: 400"
        }
    },
    "servetime": 1.2438
} 

The content of the "errors" field should be added to the /check-url JSON output, along with the testdeadlink_status_code property.
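Reading those two values out of the testdeadlink response takes only a few lines of Python. This is a minimal sketch, assuming (from the sample above) that every key under "results" other than "errors" is a tested URL with an integer status code, and that "errors" maps URLs to their error text. `parse_testdeadlink` is a hypothetical name. Note that the `\/` sequences in the response are ordinary JSON escapes for `/`, so a JSON parser returns plain URLs.

```python
import json
from typing import Dict, Optional, Tuple

# The response body quoted above. A raw string keeps the "\/" escapes intact;
# "\/" is a legal JSON escape for "/", so json.loads turns it back into "/".
SAMPLE_BODY = r'''
{
    "results": {
        "https:\/\/orb.binghamton.edu\/cgi\/viewcontent.cgi?article=1041": 400,
        "errors": {
            "https:\/\/orb.binghamton.edu\/cgi\/viewcontent.cgi?article=1041": "RESPONSE CODE: 400"
        }
    },
    "servetime": 1.2438
}
'''


def parse_testdeadlink(body: str) -> Dict[str, Tuple[int, Optional[str]]]:
    """Map each tested URL to (status_code, error_text or None).

    Assumption: every key under "results" except "errors" is a tested URL
    whose value is an integer status code.
    """
    results = json.loads(body).get("results", {})
    errors = results.get("errors", {})
    parsed = {}
    for url, code in results.items():
        if url == "errors":  # the error map shares the dict with the URL keys
            continue
        parsed[url] = (code, errors.get(url))
    return parsed


if __name__ == "__main__":
    for url, (code, error_text) in parse_testdeadlink(SAMPLE_BODY).items():
        print(url, code, error_text)
    # prints: https://orb.binghamton.edu/cgi/viewcontent.cgi?article=1041 400 RESPONSE CODE: 400
```

The pair returned per URL lines up with the proposed testdeadlink_status_code and error text properties.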

dpriskorn commented 1 year ago

Wouldn't it be best to do this in the check-urls endpoint instead?

mojomonger commented 1 year ago

We don't have the check-urls endpoint yet, do we?

Also, even if we did, we would still want the error_text added to the /check-url JSON, because that is what IARE currently uses.

mojomonger commented 1 year ago

We want the IABot error code included in the /check-url JSON.

For instance, if the IABot curl request returns this:

curl -XPOST https://iabot-api.archive.org/testdeadlink.php \
-d $'urls=https://www.researchgate.net/publication/334000200' \
-d "authcode=579331d2dc3f96739b7c622ed248a7d3" \
-d "returncodes=1"
{
    "results": {
        "https:\/\/www.researchgate.net\/publication\/334000200": 403,
        "errors": {
            "https:\/\/www.researchgate.net\/publication\/334000200": "RESPONSE CODE: 403"
        }
    },
    "servetime": 0.1031
}
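The same POST can be sent from Python with only the standard library. This is a sketch: `build_testdeadlink_request` and `call_testdeadlink` are hypothetical helper names, and reading the auth code from a hypothetical `IABOT_AUTHCODE` environment variable (instead of hard-coding it as in the curl line) is a suggestion, not how IARI is known to do it. `urlencode` percent-encodes the URL value, whereas `curl -d` sends it as typed; a standard form decoder on the server recovers the same URL either way.

```python
import json
import os
import urllib.parse
import urllib.request

TESTDEADLINK_URL = "https://iabot-api.archive.org/testdeadlink.php"


def build_testdeadlink_request(url: str, authcode: str) -> urllib.request.Request:
    """Build the same form POST as the curl example (no network access)."""
    form = {"urls": url, "authcode": authcode, "returncodes": "1"}
    data = urllib.parse.urlencode(form).encode("ascii")
    # Supplying data (like curl -d) sends an
    # application/x-www-form-urlencoded POST body.
    return urllib.request.Request(TESTDEADLINK_URL, data=data, method="POST")


def call_testdeadlink(url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    # IABOT_AUTHCODE is a hypothetical environment variable name.
    authcode = os.environ["IABOT_AUTHCODE"]
    req = build_testdeadlink_request(url, authcode)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to check the form fields without contacting iabot-api.archive.org.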

We want the /check-url JSON to look like this:

{
first_level_domain: "researchgate.net",
fld_is_ip: false,
url: "https://www.researchgate.net/publication/334000200",
scheme: "https",
. . .
status_code: 403,
testdeadlink_status_code: 403,
testdeadlink_error_text: "RESPONSE CODE: 403",
. . .
timestamp: 1687363563,
isodate: "2023-06-21T16:06:03.581035",
id: "f4ec1d8c"
}
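One way to produce that shape, sketched in Python: copy the two IABot values into the dict already built for /check-url. `add_testdeadlink_fields` is a hypothetical name. Two assumptions are not settled in this issue: that the `url` field matches the key IABot uses in its (already JSON-decoded) response, and that both new fields should be null (`None`) when IABot reports nothing for the URL.

```python
from typing import Any, Dict


def add_testdeadlink_fields(
    check_url_data: Dict[str, Any],
    testdeadlink_response: Dict[str, Any],
) -> Dict[str, Any]:
    """Return a copy of the /check-url data with the two proposed IABot fields.

    Assumption: check_url_data["url"] is exactly the key IABot used in its
    decoded response.
    """
    url = check_url_data["url"]
    results = testdeadlink_response.get("results", {})
    status = results.get(url)
    merged = dict(check_url_data)  # leave the caller's dict unchanged
    merged["testdeadlink_status_code"] = status if isinstance(status, int) else None
    # Assumption: a URL with no entry under "errors" gets None as its error text.
    merged["testdeadlink_error_text"] = results.get("errors", {}).get(url)
    return merged


if __name__ == "__main__":
    iabot = {
        "results": {
            "https://www.researchgate.net/publication/334000200": 403,
            "errors": {
                "https://www.researchgate.net/publication/334000200": "RESPONSE CODE: 403"
            },
        },
        "servetime": 0.1031,
    }
    base = {"url": "https://www.researchgate.net/publication/334000200", "status_code": 403}
    print(add_testdeadlink_fields(base, iabot))
```

Returning a copy keeps the existing status_code untouched, so IARI's own check and IABot's verdict can be compared side by side in the output.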