obsidianmd / obsidian-importer

Obsidian Importer lets you import notes from other apps and file formats into your Obsidian vault.
https://help.obsidian.md/import
MIT License

[OneNote] - HTTP 429 rate limiting during import #225

Open kquinsland opened 3 months ago

kquinsland commented 3 months ago

I am trying to import a particular OneNote notebook and am unable to do so w/o hitting rate-limit errors.

In the user interface, the rate-limit errors are manifesting themselves like so:

Failed: "Exported image 20240329083022-5.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "Exported image 20240329083024-2.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "Exported image 20240329083024-3.png" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "Exported image 20240329083024-2.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "Exported image 20240329083024-3.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "2019-2-21" because TypeError: Cannot read properties of undefined (reading 'split')
Failed: "Exported image 20240329083025-0.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "2019-2-22" because TypeError: Cannot read properties of undefined (reading 'split')
Failed: "2019-2-23" because TypeError: Cannot read properties of undefined (reading 'split')
Failed: "2019-2-24" because TypeError: Cannot read properties of undefined (reading 'split')
Failed: "2019-2-25" because TypeError: Cannot read properties of undefined (reading 'split')
Failed: "2019-2-26" because TypeError: Cannot read properties of undefined (reading 'split')
Failed: "Exported image 20240329083024-5.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "2019-2-28" because TypeError: Cannot read properties of undefined (reading 'split')
Failed: "Exported image 20240329083027-0.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "Exported image 20240329083027-1.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...
Failed: "Exported image 20240329083027-2.jpeg" because TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer...

But in the dev-tools console, the underlying 429 shows up:

(Not sure why I can't copy stack trace)

[Screenshot 2024-03-29 083749: dev-tools console showing the underlying 429 response]


Other details:

kepano commented 3 months ago

@p3rid0t do you have an idea of what might be going on? Potentially related to #120

p3rid0t commented 3 months ago

This is weird... 🤔

The function detected the error (noticeable by the message 'An error has occurred while fetching [...]'), but the if statement right after it, which checks for error code 20166, was not executed. I'll try to follow up soon.

kquinsland commented 3 months ago

> The function detected the error (noticeable by the message 'An error has occurred while fetching [...]'), but the if statement right after it, which checks for error code 20166, was not executed. I'll try to follow up soon.

Take a closer look at the object in the console. I hot-patched it:

                    s.error.code === "20166") {
                        let l = +!o.headers.get("Retry-After") * 1e3 || 15e3;
                        if (console.log(`Rate limit exceeded, waiting for: ${l} ms`),
                        r < a1)
                            return await new Promise(c=>setTimeout(c, l)),
                            this.fetchResource(e, i, r + 1);
                        throw new Error("Exceeded maximum retry attempts")
                    }

I'm not a JS dev, nor do I have much time to set up an env, so I'm just patching the minified file in .obsidian/plugins... and reloading Obsidian.

After the patch, console looks promising:

(why the hell can't I right-click → copy in the Obsidian dev tools console!?... I can do that in the FF/Chrome dev tools console :( )

[Screenshot: console output after the patch, showing the rate-limit retry message]

kquinsland commented 3 months ago

Hmm. I was a bit too optimistic after the change and managed to get myself put back in the cool-down bucket.

I can't even list notebooks now (this is the same behavior I was getting last night when trying to import everything...)

[Screenshot: notebook listing request failing with a 429]

And here's the kicker:

[Screenshot: 429 response headers, with no Retry-After present]

Note that there is no Retry-After header sent by Microsoft?

And if I'm reading this right:

let l = +!o.headers.get("Retry-After") * 1e3 || 15e3;

A missing Retry-After header will mean a 1000 ms wait, because you're negating whatever .get() returns when the header isn't present (null), and +!null is 1, so the whole expression comes out to 1 * 1e3 = 1000?
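
For what it's worth, here's that expression evaluated in isolation in the dev-tools console (just the expression with a plain Headers object, not the plugin's actual code path):

// Illustration only: the minified expression evaluated by itself.
const h = new Headers();                               // Retry-After absent
console.log(+!h.get("Retry-After") * 1e3 || 15e3);     // 1000 -> get() returns null, !null is true, +true is 1

h.set("Retry-After", "30");
console.log(+!h.get("Retry-After") * 1e3 || 15e3);     // 15000 -> !"30" is false, +false is 0, so || falls back

So the actual header value never gets used either way, which seems backwards.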

The docs suggest that the Retry-After header won't be present in at least some cases:

> The resources listed earlier do not return a Retry-After header on 429 Too Many Requests responses.

So the fix here is to bump up the number of max retry attempts and to exponentially wait longer when retry-after is not present?

kquinsland commented 3 months ago

I figured out how to get the un-minified JS code into the plugin dir so I could start hacking on it, and came up with this:

if (s.error.code === "20166") {
    let retryAfter = o.headers.has("Retry-After") ? +o.headers.get("Retry-After") * 1000 : null;

    if (!retryAfter) {
        console.warn("Rate limit exceeded. Retry-After header not present.");
    }

    // Exponential backoff
    let exponentialBackoff = Math.pow(2, r) * 1000; // Exponential backoff in milliseconds

    let waitTime = retryAfter || exponentialBackoff;

    console.log(`Rate limit exceeded, waiting for: ${waitTime} ms`);

    if (r < a1) {
        await new Promise(resolve => setTimeout(resolve, waitTime));
        return await this.fetchResource(e, i, r + 1);
    } else {
        throw new Error("Exceeded maximum retry attempts");
    }
}
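
For reference (assuming r starts at 0 on the first call), the fallback waits come out like this:

// Illustration only: fallback wait per retry attempt when Retry-After is absent.
for (let r = 0; r < 5; r++) {
    console.log(`attempt ${r}: wait ${Math.pow(2, r) * 1000} ms`);  // 1000, 2000, 4000, 8000, 16000 ms
}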

but it looks like I need to spend more time cooling down as I can't even finish logging in / enumerating stuff to fetch:

[Screenshot: sign-in / enumeration request failing with a 429]

kquinsland commented 3 months ago

After cooling down a bit, I was able to complete the login (side note: is there an open issue to improve that functionality so auth can persist?) and found that I never got a Retry-After header... which makes sense in light of the docs linked above.

Given that the header in question likely won't be present for the majority of requests / cases, I'm at a bit of a loss as to how to "fix" this:

I've not looked at the API too closely, but it seems like the correct way would be to fetch a full section/sub-section at a time instead of making one request per page in a given section/sub-section. Assuming there's a reliable delimiter for each "page" in that single response, it should be easy-ish to split the payload on that delimiter and feed it into the current parse/extract/save code?

Alternatively, persist auth details so it's (slightly) less of a PITA to launch the import flow and select a sub-section at a time?

It's not clear to me whether the plugin checks if a file already exists on disk before making the request for a given page. E.g. let's say I select sub-sections "08" and "09" from the "2005" section of a particular notebook and start hitting rate limits at 2005/09/20.md. I'll power down the computer for a while to let the request buckets empty before trying again. When I'm finally able to start again, won't I end up burning some of those precious requests on files that already exist?
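
Something like the sketch below is what I have in mind. Just a rough idea; shouldFetchPage and targetPath are made-up names, but getAbstractFileByPath is the existing Obsidian vault API for looking up a file by path:

// Rough sketch: skip the Graph request when the target note already exists in the vault.
// `vault` is Obsidian's Vault instance; `targetPath` is the vault-relative path the page
// would be written to (e.g. "2005/09/20.md"). Both names are made up for illustration.
function shouldFetchPage(vault, targetPath) {
    // getAbstractFileByPath returns null when nothing exists at that path
    if (vault.getAbstractFileByPath(targetPath)) {
        console.log(`Skipping ${targetPath}: already imported`);
        return false; // don't spend a rate-limited request on it
    }
    return true;
}

That wouldn't catch pages that changed upstream, but it would at least stop re-downloading things that already imported cleanly.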

RazgovovEvgen22 commented 3 days ago

> It's not clear to me whether the plugin checks if a file already exists on disk before making the request for a given page.

I haven't looked at the code, but when I resynchronize, duplicate files are created with an additional number appended to the name.