Morred opened this issue 1 year ago
This is the error message that bubbles up:
2023-06-16T12:09:02.777904+00:00 app[web.1]: {
2023-06-16T12:09:02.777905+00:00 app[web.1]: message: 'This file is too large to be exported.',
2023-06-16T12:09:02.777905+00:00 app[web.1]: stack: 'Error: This file is too large to be exported.\n' +
2023-06-16T12:09:02.777906+00:00 app[web.1]: ' at Gaxios._request (/app/node_modules/gaxios/build/src/gaxios.js:129:23)\n' +
2023-06-16T12:09:02.777907+00:00 app[web.1]: ' at runMicrotasks (<anonymous>)\n' +
2023-06-16T12:09:02.777907+00:00 app[web.1]: ' at processTicksAndRejections (node:internal/process/task_queues:96:5)\n' +
2023-06-16T12:09:02.777907+00:00 app[web.1]: ' at async JWT.requestAsync (/app/node_modules/google-auth-library/build/src/auth/oauth2client.js:343:18)\n' +
2023-06-16T12:09:02.777908+00:00 app[web.1]: ' at async fetchHTMLForId (/app/server/docs.js:69:18)\n' +
2023-06-16T12:09:02.777908+00:00 app[web.1]: ' at async Promise.all (index 0)\n' +
2023-06-16T12:09:02.777908+00:00 app[web.1]: ' at async fetch (/app/server/docs.js:94:24)\n' +
2023-06-16T12:09:02.777908+00:00 app[web.1]: ' at async exports.fetchDoc (/app/server/docs.js:41:20)\n' +
2023-06-16T12:09:02.777909+00:00 app[web.1]: ' at async handleCategory (/app/server/routes/categories.js:68:47)',
2023-06-16T12:09:02.777909+00:00 app[web.1]: response: {
2023-06-16T12:09:02.777909+00:00 app[web.1]: config: {
2023-06-16T12:09:02.777912+00:00 app[web.1]: url: 'https://www.googleapis.com/drive/v3/files/1BqyfQAGbelprPuN8kOXUKpsPfg4qsLoorXAPwdM4Slc/export?mimeType=text%2Fhtml',
2023-06-16T12:09:02.777913+00:00 app[web.1]: method: 'GET',
2023-06-16T12:09:02.777913+00:00 app[web.1]: paramsSerializer: [Function (anonymous)],
2023-06-16T12:09:02.777913+00:00 app[web.1]: headers: {
2023-06-16T12:09:02.777913+00:00 app[web.1]: 'x-goog-api-client': 'gdcl/3.2.2 gl-node/16.20.0 auth/6.1.6',
2023-06-16T12:09:02.777913+00:00 app[web.1]: 'Accept-Encoding': 'gzip',
2023-06-16T12:09:02.777913+00:00 app[web.1]: 'User-Agent': 'google-api-nodejs-client/3.2.2 (gzip)'
Hey @Morred, thanks for the issue. We've recently been seeing the same issue on a few of our documents as well, and have been looking into workarounds. If you have a working proof of concept fix you can share or are able to make a PR, that would be much appreciated!
Will do once I have something that works!
Very interesting, we are seeing this exact issue as well. We're attempting fixes by splitting documents in half, resizing images, etc., but a cleaner fix would be desirable!
It seems like Google has fixed things on their end since yesterday or so, and all the pages that weren't loading before for us are now loading again. Can anyone else here confirm that it's the same for them?
That said, who knows when it will break again and for how long 😬 So I'm going to share what I've looked into, what has worked and what hasn't so far.
The best option I've found so far (with significant caveats described later) is to use the file's export link as a fallback if calling the Google Drive #export endpoint fails with 403 - File too large to export. I'll copy out the most relevant parts below, but can provide a full PR if desired.
Put this https://github.com/nytimes/library/blob/main/server/docs.js#L56 into a try/catch block and fall back to exporting the data via the export link:
try {
  const {data} = await drive.files.export({
    fileId: id,
    // text/html exports are not supported for slideshows
    mimeType: resourceType === 'presentation' ? 'text/plain' : 'text/html'
  })
  return data
} catch (e) {
  // Guard the access: e.response is undefined if the request never got a response
  const errorResponse = e.response && e.response.data && e.response.data.error
  // If the Google Drive API returns 403 with this message, fall back to using the export link directly
  if (errorResponse && errorResponse.code === 403 && errorResponse.message === 'This file is too large to be exported.') {
    console.log('falling back to using the export link...')
    return fetchManually(resourceType, exportLinks)
  }
  throw e
}
Here's the function that does the manual exporting:
async function fetchManually (resourceType, exportLinks) {
  const accessToken = await getAccessToken()
  // Match the mime type logic used for drive.files.export above
  const mimeType = resourceType === 'presentation' ? 'text/plain' : 'text/html'
  const exportLink = exportLinks[mimeType]
  try {
    const response = await axios({
      url: exportLink,
      method: 'GET',
      // axios expects a response type ('text', 'json', 'stream', ...) here, not a mime type
      responseType: 'text',
      headers: {Authorization: `Bearer ${accessToken}`}
    })
    return response.data
  } catch (err) {
    console.error('Error downloading file:', err)
    // rethrow so the caller doesn't silently receive undefined
    throw err
  }
}
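Note that getAccessToken isn't shown above. For completeness, here's a minimal sketch of what it could look like using google-auth-library — the helper name and the scopes are assumptions for illustration, not the actual wiring in this repo:

const {GoogleAuth} = require('google-auth-library')

// Hypothetical helper: assumes application default credentials are configured
async function getAccessToken () {
  const auth = new GoogleAuth({
    scopes: ['https://www.googleapis.com/auth/drive.readonly']
  })
  const client = await auth.getClient()
  const {token} = await client.getAccessToken()
  return token
}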
This works locally, but there are two quite significant downsides:
It's probably possible to improve performance here, for example by cutting out the call to #export completely and only using the export link (whether that's desirable is another question), or by finding a reasonable way to stream and chunk-process the response, as sketched below. That would become pretty involved, though, and would probably require quite a few changes compared to how things are done now.
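To illustrate the streaming idea, here's a rough, untested sketch assuming the same axios and accessToken setup as fetchManually above; the function name is illustrative:

const axios = require('axios')

// Hypothetical: stream the export link response in chunks instead of
// buffering the whole document in memory at once
async function streamExport (exportLink, accessToken) {
  const response = await axios({
    url: exportLink,
    method: 'GET',
    responseType: 'stream',
    headers: {Authorization: `Bearer ${accessToken}`}
  })
  return new Promise((resolve, reject) => {
    const chunks = []
    // each chunk could be processed incrementally here instead of collected
    response.data.on('data', (chunk) => chunks.push(chunk))
    response.data.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')))
    response.data.on('error', reject)
  })
}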
One small thing we could do right away in the meantime is to add some information to the Readme, specifically a note about this export size limit and the fact that it can apparently change without notice.
Problem Description
Not exactly a feature request, but this was the template that fit best, since this isn't really a bug in this app itself.
TL;DR: This issue is related to the behavior of the files.export method of the Google Drive API, whose export size limit is apparently subject to change without notice. This has been causing failures to load documents that previously worked perfectly fine and haven't changed in the meantime.

Details: We've recently started having issues where we get 500 responses when loading certain pages/documents, caused by files.export (https://developers.google.com/drive/api/reference/rest/v3/files/export) returning 403 because the exported content is supposedly too large. According to the documentation, exported content is limited to 10MB; however, our affected pages/documents did not change in size, nor are they larger than 10MB. When digging deeper into this, an old issue on the Google issue tracker came up (https://issuetracker.google.com/issues/36761333) that describes a similar problem. One comment states the following:
Seeing as we didn't make any changes to our documents, it looks like the limit was in fact changed without warning, which leads to some documents being unable to load.
Feature
One way to address the problem mentioned in the thread on the Google issue tracker (see the quote above in the Problem Description section) is using the file's exportLinks instead of the #export method. This is of course less convenient, but it doesn't seem to have a size limit. It could practically work something like this:
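A rough sketch, assuming the googleapis Node client; the function and variable names here are illustrative, not from the library codebase:

const axios = require('axios')

// Hypothetical: fetch the file's exportLinks from the Drive API metadata,
// then download the rendered HTML from that link directly
async function exportViaLink (drive, fileId, accessToken) {
  const {data: file} = await drive.files.get({
    fileId: fileId,
    fields: 'exportLinks'
  })
  const exportLink = file.exportLinks['text/html']
  // the export link still needs authorization, but doesn't seem to have a size limit
  const response = await axios.get(exportLink, {
    headers: {Authorization: `Bearer ${accessToken}`}
  })
  return response.data
}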
I have a semi-complete demo PR on our fork of this repo, which I'd be happy to share once it's done. If it works well, we'll most likely add this or a similar change to our fork, but we thought it would be good to coordinate this upstream as well.
Another option, if these changes don't sound desirable, would be to leave things as they are, but at least mention this size limit, and the fact that it might arbitrarily change, as a known limitation in the Readme, in case others run into the same problems.
Additional Information
I'd be happy to get some feedback on this, and I'm open to alternative options or approaches.