harvard-lil / perma

Indelible links
408 stars 72 forks source link

Add error handling for individual captures #3531

Closed tinykite closed 3 weeks ago

tinykite commented 1 month ago

What this does

This update renders formatted errors for attempts to capture a perma link that fail, for multiple reasons.

Most notably, it adds a CaptureError component, as well as additional utility functions.

Implementation requirements

I scoped out these error handling requirements for single link capture failures:

Generic Error Handling

Special Error Handling

Intentional changes

Jquery -> async/await

The biggest change in implementation is the shift from Jquery error handling to more modern async/ await. In legacy jQuery (JQXHR) error handling, it's my understanding you deal with error handling for asynchronous operations using callbacks. The JQXHR object is used to handle the response and error callbacks. Specifically:

With async/await, error handling is typically a lot simpler to reason about, requires less manual checks, and is done using try/catch blocks. For example:

In implementation:

Renamed variables

I did rename some variables for readability. One of the things I renamed for clarity and readability is offer_upload. This has been renamed to showUploadLink.

Revised error message

For an error that is thrown with a 401, there was a requirement to set the error message to <a href='/login'>You appear to be logged out. Please click here to log back in</a>. I rephrased this slightly, per common accessibility recommendations to avoid using language that says "click here". The updated phrasing includes a link with link text that reads login.

API error handling separated from UI updates

There is a component, CaptureError that handles hiding or showing specific help text and/or links based on the specific error associated with a capture attempt. This is kept completely separate from the async/await and try/catch logic associated with API handling. Previously, functions colocated attempting to handle API-related errors and handle the logic of what UI to display as a result of those errors.

Screenshots

These are all the many errors that should be possible.

1. When a user is out of perma links:

image

2. When a user inputs an invalid url:

Screenshot 2024-06-04 at 2 12 44 PM

3. When user hasn't chosen a destination folder:

image

4. When a request to start a capture fails with a response code of 400 and the API-provided error response message includes "Error 0":

Screenshot 2024-06-04 at 2 20 53 PM

5. When a request to start a capture fails with a response code of 401:

image

6. When a request to start a capture fails with a proper API response, with a response code that is not 400 or 401. In this case, the error message includes the numerical status of the response:

image

7. When a request to start a capture succeeds. But at some point after Scoop attempts to begin capture, the API response about the status of the capture indicates that it has failed.

image

Note: I tested this by adding poll_data['status'] = 'failure' to line 314 of celery_tasks.py, so that I could see perma url submission work, and then see capture polling initially begin to work, and then quickly fail.

8. When there is an unspecified error that happens either during the url submission process or the url capture process. (For example, a request times out, and there isn't a proper API response):

image

Note: I tested this out by commenting-out the API request logic within the try block in handleCaptureStatus and including throw new Error() instead, to test-out what would happen if there was an error but no API response.

Outstanding Questions

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 69.23%. Comparing base (913eeba) to head (4845957). Report is 1 commits behind head on develop.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #3531 +/- ## ======================================== Coverage 69.23% 69.23% ======================================== Files 48 48 Lines 6816 6816 ======================================== Hits 4719 4719 Misses 2097 2097 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

tinykite commented 4 weeks ago

is this an accurate restatement of this question?

"if the initial POST worked, and now we are polling/making lots of GET requests to see how things are going... and we randomly get something other than a 200 back, what should we do?"

Thanks for asking me to clarify!

I am specifically wondering if you think it's important for any function in Perma's frontend that handles GET requests to return a status code along with an error message, in the event there is an error. This would match how we handle the current POST request error handling (we display different errors depending on if the status is a 400, 401, or 500, or etc) but would just require a bit more work so I figured I'd ask — and make sure to implement that consistently everywhere if so.

For example, in our current useFetch composable, the function will return loading, error, errorMessage and data reactive properties for any individual fetch request. I can add a statusCode property if that would be desirable.

I can also update the handleCaptureStatus function to return one as well, for example in a format like this:

        return {
            step_count: job.step_count,
            status: job.status,
            statusCode: response.status \\ We aren't currently returning this
            error: job.status === 'failed' ? job.message : ''"
        }

The alternative implementation in this PR right now is that for errors related to fetching the status of an active job we either:

Either approach works for me, I am not strongly opinionated.

rebeccacremona commented 4 weeks ago

I am specifically wondering if you think it's important for any function in Perma's frontend that handles GET requests to return a status code along with an error message, in the event there is an error.

Thanks for clarifying!

I think it might depend on the situation.

In this instance, if we're polling for an in-progress capture's status and we get anything other than a 200 back... I think we should probably ignore it, and keep on polling. If that were to happen, if wouldn't communicate any information about the capture's status, it could only mean something is wrong at the operational level, like, the whole API is suddenly down, or... I guess.... you somehow logged out and expired your session while the polling was in progress? Or, more likely, a transient network blip (of 502/504) and your next request will be fine? I'm not sure, in this case, what makes the most sense. But, it definitely wouldn't be a sign of a capture failure.

I could imagine other GETs being different. Like, if you get a 403 or a 404 when you try to open a folder (which shouldn't ordinarily happen, but in various circumstances, sometimes has), the current frontend just... breaks. I could imagine handling that situation differently, perhaps with an error message 😂

So, I'm not sure there is a generic answer.

rebeccacremona commented 4 weeks ago

It's often necessary to check if a jqXHR object exists before accessing properties (with async/await, developers can work directly with the response object.)

This is just informational: I don't think that generally the case.

I think our old JS checks for the presence of the jqXHR inside linkFailed because linkFailed is written such that it can be called in two different contexts, one, as the callback response for the POST to create a link(in which case a jqXHR will be passed to it, since that's how the callbacks work), two, called directly by us in our code with no args during polling when the API reports that the job failed.

I don't think that changes anything, but just in case that helps your understanding 🙂.

(Note that I make no claims about the readability of said code 😉)

rebeccacremona commented 4 weeks ago

Ha!!

I'm not sure, in this case, what makes the most sense. But, it definitely wouldn't be a sign of a capture failure.

The current code seriously just says "TODO" here https://github.com/harvard-lil/perma/blob/729ff801a3b82035b4d90ba93a4e3a3b587e5af5/perma_web/static/js/create-link.module.js#L102

tinykite commented 4 weeks ago

Ha!!

The current code seriously just says "TODO" here

I'm not opposed to a "max attempts" approach. For example, if we are just polling the status of a job that hasn't returned an actual failure yet (eg, the status of the job itself, rather than the status of the request is officially a failure), we ignore non-200 errors that happen in the midst of regular polling up until a point. If we hit the max attempts, we stop polling. And show either a backend message if it exists or show a "We're sorry, we've encountered an error processing your request".

But additionally, if you have opinions about something simpler (for example, we just ignore the failed fetch unless we can confirm the job has, in fact, failed as you suggested) that works too.

rebeccacremona commented 3 weeks ago

I'm not opposed to a "max attempts" approach. But additionally, if you have opinions about something simpler (for example, we just ignore the failed fetch unless we can confirm the job has, in fact, failed as you suggested) that works too.

Into it; that sounds like the right design. Maybe the "temporarily unavailable" copy, if max is exceeded?

But totally, given the status quo is to ignore, and has been that way for years, I certainly think continuing to ignore for the time being is a-ok!

rebeccacremona commented 3 weeks ago

Part 2 will follow which will be: I'll go buzz through the existing code and provide a second set of eyes, to see if there are any error conditions that aren't handled here.

Alrighty! I did this, and, I think you caught almost everything 🎉 !

The only thing I could find handled by the current code that is not handled here, is, when the response is not JSON. That of course never happens if our API properly sends a response as intended, but we could get a non-JSON error response under a variety of circumstances: 502 HTML from Cloudflare, 504 from nginx, non-200 status code sans body...

I think this code should probably be prepared for that situation: if there's no JSON, just use the status code.

Does that sound reasonable?

Can send a link to where the current code handles these situations, if that would be helpful.

Cheers,

tinykite commented 3 weeks ago

The only thing I could find handled by the current code that is not handled here, is, when the response is not JSON. That of course never happens if our API properly sends a response as intended, but we could get a non-JSON error response under a variety of circumstances: 502 HTML from Cloudflare, 504 from nginx, non-200 status code sans body...

I think this code should probably be prepared for that situation: if there's no JSON, just use the status code.

Does that sound reasonable?

That definitely sounds reasonable! I originally assumed it would be simpler to not try to differentiate between:

But I don't think the actual implementation adds much complexity. So in fa3d89a I wrote a new utility function, getErrorResponse and added an additional conditional check to the handleCaptureError function to take care of this.

Edit: I'm not sure I might the right assumption that this would be specific for 400/401/500 and other non-200 status code responses. If this is something it feels pertinent to implement for any response returned within handleArchiveRequest (not just those with status codes affiliated with an error), let me know, and I'll make sure to refactor for that as well.

tinykite commented 3 weeks ago

But totally, given the status quo is to ignore, and has been that way for years, I certainly think continuing to ignore for the time being is a-ok!

That makes sense — for the time being, this is also how I implemented it in the WIP draft I have for batch error handling (for fetches to get the details of batch jobs), so let's ignore it for now and we can revisit it as an additional smaller task in the nearish future!

rebeccacremona commented 3 weeks ago

If this is something it feels pertinent to implement for any response returned within handleArchiveRequest (not just those with status codes affiliated with an error), let me know, and I'll make sure to refactor for that as well.

Nope, I think that's right! 200 or 201 should definitely always come with JSON from our API.