Fetch all attachments using xhr/fetch in the Connector and save directly to Zotero and Server

adomasven commented 6 months ago

It seems that when the Connector was initially written, XHR had no support for Blob and ArrayBuffer types. Technically, fetching attachments in the Connector is not very efficient, but we are facing increasingly complicated bot protection, and that is unlikely to get better in the future.

dstillman commented 6 months ago

(Inspired by https://forums.zotero.org/discussion/114431/pdf-will-not-save-to-zotero, among others)

dstillman commented 5 months ago

https://forums.zotero.org/discussion/comment/465245/#Comment_465245

adomasven commented 3 months ago

https://forums.zotero.org/discussion/116557/zotero-ezproxy-issue

adomasven commented 1 month ago

While this works well for regular downloads, for some sites (ScienceDirect!) that use a JS redirect it will not work without custom handling:

Add a hidden iframe to the page with sandbox settings that disallow file downloads (so the user doesn't get prompted to save a file)
Monitor for attempts to navigate to a resource of the expected MIME type (either the iframe sandbox aborts the navigation after receiving headers with Content-Disposition, or the navigation succeeds, which means we needlessly load the file in the iframe)
Then refetch the same URL using XHR
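The first two steps above could be sketched roughly as follows. This is only an illustration under assumptions, not the Connector's actual code: the helper names, the `FrameLike` type, and the exact set of sandbox tokens are hypothetical. The one load-bearing detail is that the `allow-downloads` sandbox token is left out. How the navigation headers get observed is not covered here.

```typescript
// Hypothetical helper names; none of this is taken from the Connector source.

// Sandbox tokens for the hidden iframe. "allow-downloads" is deliberately
// absent, which is what stops the browser from offering to save the file.
// Which other tokens a given site needs is an assumption.
const SANDBOX_TOKENS: string[] = ["allow-scripts", "allow-same-origin"];

// Minimal structural type so the sketch does not depend on DOM typings.
// A real HTMLIFrameElement satisfies it.
interface FrameLike {
  setAttribute(name: string, value: string): void;
}

// Step 1: configure an iframe that can run the page's JS redirect
// but cannot trigger a download. The sandbox is set before the src
// so that it applies to the very first navigation.
function configureHiddenFrame(frame: FrameLike, url: string): void {
  frame.setAttribute("sandbox", SANDBOX_TOKENS.join(" "));
  frame.setAttribute("style", "display: none");
  frame.setAttribute("src", url);
}

// Step 2: decide whether a navigation's Content-Type header value
// names the attachment type we are waiting for (parameters ignored).
function isExpectedMimeType(contentType: string | undefined, expected: string): boolean {
  if (!contentType) return false;
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return mime === expected.toLowerCase();
}
```

In a page context this would presumably be called as `configureHiddenFrame(document.createElement("iframe"), url)` before appending the element to the document. Once `isExpectedMimeType` matches, the final URL can be refetched with XHR as described in the third step.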

Unfortunately, at least for ScienceDirect on Safari, this doesn't work without some additional custom handholding, because we do not have full cookie access there. We need to make the fetch request for the PDF from the ScienceDirect page (not the background page), since their CSP currently allows it, but this can change at any point.

The biggest drawback here is that we're not equipped to display a captcha window in the browser if anything goes wrong. Even if we opened a new captcha tab, we wouldn't be able to do much with it, since we cannot grab the loaded file directly from the browser without using XHR. Opening such a tab might also trigger a save file prompt.

Also, if some of this breaks, we will be hostage to the browser extension approval processes and cannot update as fast as we can with the client.

The other option is to continue using Zotero BrowserDownload for pages where we need a JS redirect, but that defeats one of the more exciting parts of this change.

adomasven commented 1 month ago

> Unfortunately at least for ScienceDirect on Safari this doesn't work without some additional custom handholding, because we do not have full cookie access there. We need to make the fetch request for the PDF from the sciencedirect page (not background) as it's allowed per their CSP policy, but this can change at any point.

I guess the default policy for all attachment XHRs should be to attempt to fetch them using the content XHR and only fall back to the background XHR if that fails (most likely due to CORS). That might then fail too on Safari, but at least we won't need to add an exception.
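A minimal sketch of that default policy, assuming two interchangeable fetch functions are available. `fetchAttachment`, `Fetcher`, and the `via` field are hypothetical names for illustration, not existing Connector APIs:

```typescript
// Hypothetical: any function that resolves with the attachment bytes
// or rejects on failure (network error, CORS/CSP block, and so on).
type Fetcher = (url: string) => Promise<ArrayBuffer>;

interface AttachmentResult {
  data: ArrayBuffer;
  via: "content" | "background";
}

// Try the content-page XHR first, since it sends the page's cookies.
// Fall back to the background XHR only if the content attempt fails.
// If the background attempt also fails, its error propagates to the caller.
async function fetchAttachment(
  url: string,
  contentFetch: Fetcher,
  backgroundFetch: Fetcher
): Promise<AttachmentResult> {
  try {
    return { data: await contentFetch(url), via: "content" };
  } catch {
    // Most likely blocked by CORS or the page's CSP; retry from the background.
    return { data: await backgroundFetch(url), via: "background" };
  }
}
```

With this shape no per-site exception list is needed: a site like ScienceDirect succeeds on the first attempt, while sites that block content requests cost one failed request before the fallback.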

This would also mean that Safari might work better than it does now, since cookies would be sent more often. On the other hand, for multiple saves from Google Scholar and similar websites, we'd be needlessly sending content XHRs that would all generally fail due to CSP.

zotero / zotero-connectors

Fetch all attachments using xhr/fetch in the Connector and save directly to Zotero and Server #474