Closed by akavel 6 years ago
I have a feeling that this problem is happening in the ServiceWorker implementation, where it is failing to reroute requests properly. This could be due to either wrong/incomplete rerouting logic or the ServiceWorker not being registered in the first place. Are you using Firefox in private browsing mode? If so, that disables ServiceWorker. It was also disabled in FF 45 and FF 52 ESR versions.
I can replicate the issue in Chrome. A theory is that the SW is not parsing out the request URL correctly (hence null) and is then forwarding the request for /memento/null/{URI-R} back to the replay system, which fails.
I modified the SW slightly to display more info:
request = reroute(event.request, referrerDatetime) // Only embedded resources
//console.log('REROUTING request for ' + event.request.url + ' to ' + request.url)
console.log('REROUTING request for ' + event.request.url + ' to ' + request.url + ' REF ' + event.request.referrer + ' ***')
//console.log('REROUTING request for ' + event.request.url + ' to ' + request.url + ' for ' + JSON.stringify(event.request, null, 4))
let m = event.request.referrer.match(/\/([0-9]{14})\//)
console.log(m)
if (m !== null) {
  m = m[1]
}
console.log(m)
and at some point I'm starting to get log lines like below:
REROUTING request for http://serv.peterme.net/img/link.svg to http://localhost:5000/memento/null/http://serv.peterme.net/img/link.svg REF http://serv.peterme.net/styles.css
null
null
REROUTING request for http://serv.peterme.net/cmunbx.woff to http://localhost:5000/memento/null/http://serv.peterme.net/cmunbx.woff REF http://serv.peterme.net/cmun-serif.css
null
null
though in the same console earlier logs were ok:
REROUTING request for http://serv.peterme.net/styles.css to http://localhost:5000/memento/20171207224241/http://serv.peterme.net/styles.css REF http://localhost:5000/20171207224241/serv.peterme.net/cross-platform-guis-and-nim-macros.html
Array [ "/20171207224241/", "20171207224241" ]
20171207224241
or
REROUTING request for http://serv.peterme.net/styles.css to http://localhost:5000/memento/20171207224241/http://serv.peterme.net/styles.css REF http://localhost:5000/20171207224241/serv.peterme.net/cross-platform-guis-and-nim-macros.html
Array [ "/20171207224241/", "20171207224241" ]
20171207224241
I have no idea what to debug further at this point.
The event argument in self.addEventListener('fetch', function (event) { within serviceWorker.js normally contains a URI-M with an embedded datetime. For example, the request for http://localhost:5000/20171207224241/serv.peterme.net/cross-platform-guis-and-nim-macros.html is in the referrer attribute of the event passed in.
For some URI-Ms, like the ones @akavel listed, the referrer property of the event passed in is null.
@machawk1 Actually, the referrer seems not to be null, only without the datestamp (so referrerDatetime is null). See the examples I posted just above (see the string after REF).
I was looking at how referrerDatetime is extracted, and I can see there is no fallback for situations where the referrer property is missing or does not match the RegEx pattern.
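To make the two referrer cases from the logs concrete, here is a minimal sketch of the extraction step with the missing and non-matching cases handled explicitly. The helper name extractReferrerDatetime is hypothetical and not part of ipwb's serviceWorker.js; only the regular expression is taken from the snippet posted above.

```javascript
// Hypothetical helper (not from ipwb). Returns the 14-digit datetime
// embedded in a URI-M referrer, or null when there is none to extract.
function extractReferrerDatetime (referrer) {
  if (typeof referrer !== 'string' || referrer === '') {
    return null // no referrer at all
  }
  const match = referrer.match(/\/([0-9]{14})\//)
  return match === null ? null : match[1]
}

// Referrer is a URI-M: the datetime is recoverable.
console.log(extractReferrerDatetime(
  'http://localhost:5000/20171207224241/serv.peterme.net/cross-platform-guis-and-nim-macros.html'
)) // prints 20171207224241

// Referrer is a live-web stylesheet: nothing to extract.
console.log(extractReferrerDatetime('http://serv.peterme.net/styles.css')) // prints null
```

A null result here is what ends up as the literal "null" path segment in /memento/null/..., so the caller still needs some policy for that case.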
ServiceWorker referrer is a URI-M
ServiceWorker referrer is a live web URI
The latter does not have a datetime to scrape out with the regex.
Is it possible that the referrer is stored in the header, being propagated to the replay system, then being used as the basis for replay?
I think I know the reason, but the solution is more involved. I have hinted at this issue and a potential solution in the SW paper we published last year in JCDL. The context of cascaded requests is set based on their parent resource, which can be fixed by issuing a fabricated client-side redirect to the resolved URI-M so that all the successive requests are in the right context.
The current SW implementation is very rudimentary and does not account for many situations.
This problem will usually occur in requests that do not originate from the main HTML file, but from a secondary source, such as an image or font file being requested from within a CSS file that is included in the HTML page.
A quick and dirty solution for now would be to store the last known referrerDatetime in the localStorage and use that when referrerDatetime is null.
But the localStorage based solution (or a global variable based solution for that matter) may cease to work as expected when multiple composite mementos are requested in a non-sequential manner.
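A rough sketch of the global-variable variant of this workaround, and of how it goes wrong when two composite mementos load at the same time. Everything here is an assumption for illustration: the names resolveDatetime and rerouteUrl, the replay base URL (copied from the log lines above), and the second memento at 20180101000000, which is made up. A variable is used rather than localStorage because Web Storage is not exposed inside the ServiceWorker global scope.

```javascript
// Hypothetical sketch, not ipwb's actual code.
const REPLAY_BASE = 'http://localhost:5000/memento/'
let lastKnownDatetime = null // shared by every page this worker controls

function resolveDatetime (referrer) {
  const match = (referrer || '').match(/\/([0-9]{14})\//)
  if (match !== null) {
    lastKnownDatetime = match[1] // remember the most recent URI-M context
    return match[1]
  }
  return lastKnownDatetime // fallback: may belong to a different memento
}

function rerouteUrl (url, referrer) {
  return REPLAY_BASE + resolveDatetime(referrer) + '/' + url
}

const pageA = 'http://localhost:5000/20171207224241/serv.peterme.net/cross-platform-guis-and-nim-macros.html'
const pageB = 'http://localhost:5000/20180101000000/example.com/' // made-up second memento
const liveCssRef = 'http://serv.peterme.net/styles.css' // live-web referrer, no datetime

// Sequential case: the stylesheet is requested first, then its image.
rerouteUrl('http://serv.peterme.net/styles.css', pageA)
const inOrder = rerouteUrl('http://serv.peterme.net/img/link.svg', liveCssRef)

// Interleaved case: a second root memento starts loading before page A is done.
rerouteUrl('http://example.com/style.css', pageB)
const late = rerouteUrl('http://serv.peterme.net/img/link.svg', liveCssRef)

console.log(inOrder) // rerouted with 20171207224241, as intended
console.log(late) // rerouted with 20180101000000, the wrong datetime for page A
```

The second call for link.svg is the failure mode described above: the remembered datetime has been overwritten by the other memento.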
We could check if the referrer is a URI-M and if not (in a case like you described, @ibnesayeed), redirect to the /memento/*/URI-R endpoint.
> redirect to the /memento/*/URI-R endpoint.
This will not help, because this will return a list of mementos (if more than one capture is available), without a clue of which one to pick.
If we had the TimeGate endpoint (#105) functioning and we passed the Accept-Datetime of the root memento to the endpoint, we could use that as the basis for date resolution (and thus, URI-M).
> If we had the TimeGate endpoint (#105) functioning and we passed the Accept-Datetime of the root memento to the endpoint, we could use that as the basis for date resolution (and thus, URI-M).
If you have the datetime of the root memento, then you don't really need any other endpoint. The problem here is that you don't have access to the datetime of the root memento by the time you make secondary level requests. As I said, a quick and dirty solution would be to store the root memento's datetime in the localStorage or in a global variable and update it when another root memento is requested, or use it when the referrer does not have that info. The problem with this approach is that when a new root memento is requested before all resources of the previous composite memento are loaded, you will end up overwriting the datetime.
Does the possibility exist for intercepting the requests from embedded resources using the service worker?
Based on using a very similar method utilizing localStorage in the past, I think it would not be good to go that route.
For a more robust approach to maintain the same-origin boundary context, read the Methodology section of the Client-side Reconstruction of Composite Mementos Using ServiceWorker paper.
> Does the possibility exist for intercepting the requests from embedded resources using the service worker?
Yes, that's what I am referring to in the paper and mentioned here a few times in comments already. The idea is to temporarily cache the response of the final URI-M and return a fabricated redirect response to the client. Then in the successive request, return the response from the cache which will have the right origin context.
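As a rough illustration of the idea described here (hold the resolved response, answer with a fabricated redirect, then serve the held response on the follow-up request), the sketch below models the flow with plain objects. It is not the Reconstructive implementation: handleRequest, pendingResponses, and the fetchFn parameter are hypothetical names, the plain objects stand in for Response instances, and it assumes the replay server resolves a request to its final URI-M (visible as response.url).

```javascript
// Hypothetical sketch of the cache-then-redirect flow, not Reconstructive's code.
const pendingResponses = new Map() // URI-M -> response held for one follow-up request

async function handleRequest (url, fetchFn) {
  // Second pass: the client followed our redirect, so serve the held response.
  if (pendingResponses.has(url)) {
    const held = pendingResponses.get(url)
    pendingResponses.delete(url)
    return held
  }
  const response = await fetchFn(url)
  // First pass: the request was resolved to a different URI-M.
  // Hold the real response and tell the client to ask for the URI-M instead.
  if (response.url !== url) {
    pendingResponses.set(response.url, response)
    return { status: 302, url: url, headers: { Location: response.url }, body: '' }
  }
  return response // already at its final URL, nothing to fix up
}
```

In an actual worker the fabricated redirect would presumably be built with Response.redirect() and returned through event.respondWith(). Once the client lands on the URI-M, requests cascading from that resource carry a referrer that includes the datetime.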
No code in https://github.com/oduwsdl/reconstructive still. :|
The localStorage solution is not robust. Could you put together an example per the Methodology section that does what we need in actuality and not in theory?
Perhaps I should populate the Reconstructive repo so that it takes care of this issue. Putting an example around this approach is almost half the work of writing the whole Reconstructive logic. In the interim, you might just want to use a global variable (or localStorage) to mitigate this immediate issue. This approach is far from being good, but will do the trick until I push something more thoughtful in the other repo. I will try to spare some cycles for that tomorrow or over the weekend.
I think the reconstructive repo is in a usable state now. I did not test it in complex situations yet, but those can be discovered and fixed later as we encounter them. Some documentation is certainly needed though. Since ipwb is not Docker-friendly yet, I don't have the environment set up to test it.
@akavel and @machawk1 you might want to test this again with the latest release. Hopefully it is fixed now after the merge of PR #339. If not, please report new findings.
Looks good to me now, thanks! 😄
On first look, there are still some requests in the Firefox console reported as external (non-localhost), but when I take a look at the response details, they show "InterPlanetary Wayback Replay/...", so this makes me feel good and safe now :)
Thanks a lot!!! :) :) :)
@akavel Thank you for circling back to this and confirming the fix. 😄
With the attached .warc.gz and the attached .cdxj (zipped), when opening http://localhost:5000/20171207224241/serv.peterme.net/cross-platform-guis-and-nim-macros.html, not all resources are loaded from localhost. Some of them are still pulled from the Web, though they seem to be present both in the .warc and in the .cdxj. From the Firefox console, those seem to be:
which seems to roughly match the "memento/null/..." ones in the log below:
I believe this may be coming from serviceWorker.js, though I'm not 100% sure.