webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
https://browsertrix.com
GNU Affero General Public License v3.0

[Feature]: Be more verbose if QA functionality fails to load #1728

Open rien333 opened 5 months ago

rien333 commented 5 months ago

What change would you like to see?

When I open a page for review from the QA tab and the QA UI fails to load (which can happen, see below), I would like to be shown why it failed. That would help me apply the proper fix.

Currently, when connecting a client to a local server that runs btrix (without using https), I first see:

image

So far, so good. However, when subsequently clicking the first page for QA, I'm greeted with a blank screen:

image

I think it would be nicer if I were shown an error message similar to this one:

image

Possible cause behind failure

I think the QA functionality fails to display because the client is neither the machine that hosts browsertrix (so it does not reach it via localhost) nor being served the browsertrix frontend over https (as in the error above). My reason for this hunch is that QA works just fine on the machine ("server") that actually hosts browsertrix.

On a side note: is https absolutely necessary for the QA functionality to work properly? Unfortunately, https seems like a bit of overkill for our use case (a medium-sized local archiving institution that hosts browsertrix internally as a web service).

Context

Running version 1.10 (beta.1). Tested with clients using Firefox, Chrome, and GNOME Web (WebKit).

Logs

I can collect more logs, but here are the two errors the Firefox browser console shows me upon opening a page for QA:

GET http://172.20.97.64:30870/orgs/rar/items/crawl/manual-20240423133251-3beb085c-564/review/screenshots?qaRunId=&itemPageId=931115ee-53a1-4800-996f-241498e8ce7c [HTTP/1.1 404 Not Found 0ms]
Uncaught TypeError: navigator.serviceWorker is undefined
    Le org.c3deaa532cdab368.js:7655
    u lit-html.js:6
    g lit-html.js:6
    _$AI lit-html.js:6
    p lit-html.js:6
    g lit-html.js:6
    _$AI lit-html.js:6
    j lit-html.js:6
    update lit-element.js:6
    update index.esm.js:36
    performUpdate reactive-element.js:6
    scheduleUpdate reactive-element.js:6
    _$EP reactive-element.js:6
    requestUpdate reactive-element.js:6
    set reactive-element.js:6
    updateOrg org.c3deaa532cdab368.js:9795
    l tslib.es6.js:118
    promise callback*a tslib.es6.js:120
    n tslib.es6.js:121
    n tslib.es6.js:117
    updateOrg org.c3deaa532cdab368.js:9795
    willUpdate org.c3deaa532cdab368.js:9795
    n tslib.es6.js:121
    n tslib.es6.js:117
    willUpdate org.c3deaa532cdab368.js:9795
    performUpdate reactive-element.js:6
    scheduleUpdate reactive-element.js:6
    _$EP reactive-element.js:6
    requestUpdate reactive-element.js:6
    set reactive-element.js:6
    O lit-html.js:6
    _$AI lit-html.js:6
    p lit-html.js:6
    g lit-html.js:6
    _$AI lit-html.js:6
    p lit-html.js:6
    g lit-html.js:6
    _$AI lit-html.js:6
    j lit-html.js:6
    update lit-element.js:6
    performUpdate reactive-element.js:6
    scheduleUpdate reactive-element.js:6
    _$EP reactive-element.js:6

Thanks for this feature btw! Crawl QA is something I know a lot of local archiving institutions are after, but have so far failed to satisfactorily implement.

tw4l commented 5 months ago

Hi @rien333, thanks for this report! QA is very much an in-development/beta feature but it's great to see this interest in it :)

Perhaps @ikreymer can share a more detailed answer, but essentially https is necessary for ReplayWeb.page because of its use of service workers, and many of the QA UI features depend on ReplayWeb.page.

Absolutely agree it would be better to see a more descriptive error here!
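
As a rough illustration of the kind of check that could back a more descriptive error, here is a minimal TypeScript sketch. It is not Browsertrix's actual code: `replayUnavailableReason`, `NavigatorLike`, and the message texts are hypothetical names chosen for this example. It only assumes what the stack trace above shows, namely that `navigator.serviceWorker` is undefined on an insecure origin, plus the standard `window.isSecureContext` flag.

```typescript
// Minimal structural type so the check can also run outside a browser.
interface NavigatorLike {
  serviceWorker?: unknown;
}

// Returns null when replay can be attempted, otherwise a human-readable reason.
function replayUnavailableReason(
  nav: NavigatorLike,
  isSecureContext: boolean
): string | null {
  // Service workers are exposed: nothing to report.
  if (nav.serviceWorker !== undefined && nav.serviceWorker !== null) {
    return null;
  }
  // Most likely cause: the page was loaded over plain http from a non-local host.
  if (!isSecureContext) {
    return "QA review needs service workers, which browsers only enable on https or localhost.";
  }
  // Secure context, but the browser still does not expose service workers.
  return "QA review needs service workers, but this browser does not expose them.";
}
```

In a browser this could be called as `replayUnavailableReason(navigator, window.isSecureContext)`, and the returned string rendered in place of the blank page.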

ikreymer commented 5 months ago

@rien333 yes, we can improve the error message. Unfortunately, there's not much we can do at the moment about requiring https: we rely on service workers for replay, and QA also uses replay, so without https this feature (and many others!) doesn't work. Service workers (and many other modern browser features) require a 'secure context', which is defined to be either localhost or an https origin. This has been a controversial decision (see, for example, https://github.com/w3c/webappsec-secure-contexts/issues/60).
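
To make the 'localhost or https' rule concrete, the sketch below classifies a page URL the way that rule describes. It is only an approximation for illustration: the real decision is made by the browser and exposed as `window.isSecureContext`, and the helper name `isLikelySecureOrigin` is hypothetical. Cases such as `file:` URLs and nested frames are deliberately ignored.

```typescript
// Approximates the secure-context rule for a page URL: https, or a loopback host.
function isLikelySecureOrigin(pageUrl: string): boolean {
  const url = new URL(pageUrl);
  if (url.protocol === "https:") {
    return true;
  }
  const host = url.hostname;
  return (
    /(^|\.)localhost$/.test(host) ||                   // localhost and *.localhost
    host === "[::1]" ||                                // IPv6 loopback
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(host)      // IPv4 loopback range
  );
}

console.log(isLikelySecureOrigin("http://172.20.97.64:30870/")); // false
console.log(isLikelySecureOrigin("http://localhost:30870/"));    // true
console.log(isLikelySecureOrigin("https://browsertrix.com"));    // true
```

The first call mirrors the setup reported in this issue: a LAN address over plain http is not treated as secure, so service workers are unavailable there.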

rien333 commented 5 months ago

Noted, and thanks for the detailed explanation!