rmbackup - 408 request timeout when downloading pdf

julienma commented 2 months ago

I set up rmbackup to create .rmdoc and download .pdf files. When doing a backup (total of about 290 notebooks), all of them are available as .rmdoc, but only ~120 of them have been downloaded as pdf.

The log shows:

ERROR: unable to download http://10.11.99.1/download/[uuid]/placeholder
408 request timeout
Cleaning up

Tablet is on 3.11.2.5, plugged in via USB, SSH key installed, web interface enabled and accessible via http://10.11.99.1 (even though it sometimes shows an error "Unable to list documents: The string did not match the expected pattern", but refreshing the page works).

kg4zow commented 2 months ago

How much time passes between Writing /Users/......pdf and the red ERROR: line? Is the tablet immediately returning the 408, or is it just sitting there for 30+ seconds?

Is this a case where the script works okay until a certain point, and then after that every PDF transfer fails? Or is it only failing for certain files, like maybe large files, but working correctly for smaller files?

How big are the files it's having trouble downloading? What size does the web interface show for them?

Also, it looks like the built-in web interface is having problems. That "string did not match the expected pattern" message is ... really strange. It might be worth contacting reMarkable about that part of it. I'm going to guess their response will be to upgrade to the latest version of the tablet software - and to be honest, that would be my first suggestion as well. (Although I deliberately stopped upgrading my tablets at version 3.10, because I didn't want to lose the ability to downgrade the software later on, and because I haven't had time to figure out how the new upgrade mechanism works yet.)

julienma commented 1 month ago

I tried again with the same files, but after having updated to 3.14. Here's what happened:

a bunch of files passed successfully (rmdoc + pdf)
then the script got stuck for several minutes (3-5 min?) on a single file (original file is a PDF of ~15MB), and failed with 500 read timeout. This file happens to be archived in the cloud (it's listed in the directory, with a small cloud icon, but I think it's not physically on the device, cf. https://remarkable.com/blog/remarkable-software-version-310)
then every file after fails with 408 request timeout

What's interesting is that when I try to download this same file as PDF from the web interface, it fails silently (no error in the UI or in the console). Then if I reload the web interface, I get an error "408 request timeout".

(Although I deliberately stopped upgrading my tablets at version 3.10, because I didn't want to lose the ability to downgrade the software later on, and because I haven't had time to figure out how the new upgrade mechanism works yet.)

I did just that, following these directions: https://github.com/Jayy001/codexctl/issues/95#issuecomment-2305529048. Now I have a partition with 3.14, and the other one with 3.3 / toltec. Seems to work fine.

julienma commented 1 month ago

I sync'd the file that was "cloud-only" to the device, via the tablet. I ran the script again, and it worked perfectly for that file. The script then failed with another file which is also cloud-only.

Seems like an issue of the web interface with cloud-only files.

kg4zow commented 1 month ago

It sounds like the problem is that, because the documents are "archived in the cloud", their content isn't actually IN your tablet.

Other guesses, based on this (and please tell me if I'm right about these) ...

The built-in web interface doesn't offer any "download" options for these documents.
The web interface might offer options to download a "cloud only" document to the tablet, or to flag a document as "archive in the cloud", but I wouldn't count on it.

The script works by using rsync (which uses SSH) to copy whatever files are in the tablet, to the computer. This means it can only "see" what's actually IN the tablet.

If a document is "archived in the cloud", I'm guessing it means that some metadata is on the tablet, but not the files which contain its contents. The rsync process only copies what is there, so in addition to not being able to download PDFs, I suspect you will find that any .rmdoc or .rmn files the script may have written, wouldn't actually "work" if you were to upload them back to a tablet.

I'd like to figure out how to recognize when a document is "cloud-only", so I can make the script (1) not create .rmdoc or .rmn files (which, again, I suspect wouldn't work if you restored them), (2) not try to download a PDF (which, as you've seen, isn't going to work), and (3) "fail gracefully", which in this case means, show a more informative error message.

Part of the problem with this is, I have no way to duplicate what you're seeing. I've never used the reMarkable cloud, mostly for privacy reasons. However, because the script writes .rmdoc and .rmn files before trying to download PDF files, the .rmdoc files that the script downloaded before failing to download the PDF can tell me what the tablet actually contains for a cloud-only document. I suspect this will be just one or two metadata files but no content, but I'd like to verify this before updating the script.

Is there a way you can send me some of these files, so I can figure out what they contain (and therefore what the tablet contains) for these "cloud-only" documents? I'll be able to use this information, not only to fix this script, but possibly to also fix rmweb (a Golang program which downloads PDF and .rmdoc files using only the web interface, which means it should provide a way to locally back up documents from rMPP tablets that aren't in "developer mode").

My preference would be to use Keybase - if you're also using Keybase, just save the files in /keybase/private/yourname,jms1/ and I'll be able to see them there. If that's not an option, you can create a dummy Github repo, commit/push the files there, and reply here with a link to that repo.

Thanks again for telling me about this, the truth is that because I don't use the cloud service, it has never been a factor in writing my programs - and that needs to change.

julienma commented 1 month ago

Good news: I sync'd on my device all the docs that were cloud-only. The backup script then worked flawlessly.

To try and answer your questions, I created a new folder "Test", and 2 notebooks. I then archived one of them from the tablet's GUI.

The built-in web interface doesn't offer any "download" options for these documents.

It does offer the download options—however it fails downloading, cf. below. There's no indication in the web interface that this is an archived / cloud-only doc.

Here's the output of the request when entering the folder (GET http://10.11.99.1/documents/2ac7b2fb-c403-4ebc-8685-2be645a3ce15). No indication that one of them is archived.

[
    {
        "Bookmarked": false,
        "CurrentPage": 0,
        "ID": "b17fa039-082c-4274-9338-fd998018d4c8",
        "ModifiedClient": "2024-09-12T12:59:49.618Z",
        "Parent": "2ac7b2fb-c403-4ebc-8685-2be645a3ce15",
        "Type": "DocumentType",
        "VissibleName": "Cloud only test",
        "fileType": "notebook"
    },
    {
        "Bookmarked": false,
        "CurrentPage": 0,
        "ID": "4d976cc0-6119-4e3d-94e0-425c9e42c6c5",
        "ModifiedClient": "2024-09-12T12:59:11.174Z",
        "Parent": "2ac7b2fb-c403-4ebc-8685-2be645a3ce15",
        "Type": "DocumentType",
        "VissibleName": "Local test",
        "fileType": "notebook"
    }
]
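To double-check that nothing in this listing distinguishes the archived notebook, here is a minimal Python sketch that compares the keys of the two entries. The JSON is copied from the response above; the helper name `differing_keys` is made up for illustration and is not part of rmbackup or of the tablet's API.

```python
import json

# Copy of the folder listing shown above. "VissibleName" is spelled
# exactly as it appears in the tablet's response.
LISTING = """
[
  {"Bookmarked": false, "CurrentPage": 0,
   "ID": "b17fa039-082c-4274-9338-fd998018d4c8",
   "ModifiedClient": "2024-09-12T12:59:49.618Z",
   "Parent": "2ac7b2fb-c403-4ebc-8685-2be645a3ce15",
   "Type": "DocumentType", "VissibleName": "Cloud only test",
   "fileType": "notebook"},
  {"Bookmarked": false, "CurrentPage": 0,
   "ID": "4d976cc0-6119-4e3d-94e0-425c9e42c6c5",
   "ModifiedClient": "2024-09-12T12:59:11.174Z",
   "Parent": "2ac7b2fb-c403-4ebc-8685-2be645a3ce15",
   "Type": "DocumentType", "VissibleName": "Local test",
   "fileType": "notebook"}
]
"""


def differing_keys(entries):
    """Return the keys that are not present in every entry (hypothetical helper)."""
    if not entries:
        return set()
    key_sets = [set(e.keys()) for e in entries]
    return set.union(*key_sets) - set.intersection(*key_sets)


entries = json.loads(LISTING)
for entry in entries:
    print(entry["VissibleName"], sorted(entry.keys()))
print("keys that differ between entries:", differing_keys(entries))
```

Both entries carry the same set of keys, so a client that only talks to the web interface has nothing in this response to tell the two notebooks apart.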

The web interface might offer options to download a "cloud only" document to the tablet, or to flag a document as "archive in the cloud", but I wouldn't count on it.

Nope, nothing like that.

Downloading the LOCAL / SYNC'D notebook

Downloading as RMDOC triggers this request:

GET http://10.11.99.1/download/4d976cc0-6119-4e3d-94e0-425c9e42c6c5/rmdoc

Response headers:

HTTP/1.1 200 
Content-Disposition: attachment; filename="Local test.rmdoc"
Content-Length: 8350
Content-Type: application/zip
Transfer-Encoding: chunked

Downloading as PDF triggers this request:

GET http://10.11.99.1/download/4d976cc0-6119-4e3d-94e0-425c9e42c6c5/pdf

Response headers:

HTTP/1.1 200 
Content-Disposition: attachment; filename="Local test.pdf"
Content-Length: 11557
Content-Type: application/pdf
Transfer-Encoding: chunked

Both files downloaded from the web interface are in keybase. I didn't try to sync again with rmbackup, as it's quite long and I think that we have a conclusion—see end of reply.

Downloading the CLOUD / ARCHIVED notebook

Downloading as RMDOC triggers this request:

GET http://10.11.99.1/download/b17fa039-082c-4274-9338-fd998018d4c8/rmdoc
The UI shows this screen for ~30 seconds, then it disappears. No file is downloaded.
The log shows no response. The request fails with NS_BINDING_ABORTED (I'm using Firefox devtools, if it matters).
Subsequent attempts to navigate in the web interface fail with the same error (in this screenshot, we see the 2 successful downloads for the other notebook, then the failure for the cloud notebook, then I tried to open another folder and get back to "My Files", which both failed):
I have to disable and re-enable the USB web interface from tablet's settings, to access the web interface again.

Same behavior for the Download as PDF option.
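Since the request for an archived document hangs instead of returning an error, a client that uses the web interface probably wants its own timeout. The following is only a sketch in Python, assuming the `/download/<uuid>/<kind>` URL pattern seen in the requests above; the function name `download`, its parameters, and the `.part` temp-file suffix are all made up for illustration and are not how rmbackup does it.

```python
import os
import shutil
import urllib.request


def download(uuid, kind, dest_path, base_url="http://10.11.99.1",
             timeout=30.0, opener=urllib.request.urlopen):
    """Fetch /download/<uuid>/<kind> into dest_path.

    Returns None on success, or an error string on failure.
    `opener` is injectable so the function can be exercised without a tablet.
    """
    url = f"{base_url}/download/{uuid}/{kind}"
    tmp_path = f"{dest_path}.part"  # hypothetical temp name, replaced on success
    try:
        # The timeout bounds each blocking socket operation,
        # not the duration of the whole transfer.
        with opener(url, timeout=timeout) as resp, open(tmp_path, "wb") as out:
            shutil.copyfileobj(resp, out)
        os.replace(tmp_path, dest_path)
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return f"unable to download {url}: {exc}"
    return None
```

A call such as `download("b17fa039-082c-4274-9338-fd998018d4c8", "pdf", "Cloud only test.pdf")` would then give up with an error string rather than waiting indefinitely. It would not fix the web interface itself, which in the test above still had to be disabled and re-enabled from the tablet's settings.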

Looking at raw files

Now there's something interesting when connecting via SSH and looking at the raw files in ~/.local/share/remarkable/xochitl/.

This is the LOCAL notebook. Business as usual:

This is the CLOUD notebook. Look at the new file!

Indeed, as soon as I archive another file from the tablet's UI, a new .cloudarchive file appears.

All .cloudarchive files seem to be identical:

{}

so they could probably be used as a flag to know whether a file is sync'd locally or not?

kg4zow commented 1 month ago

The presence of the UUID.cloudarchive file is exactly what I was hoping for - either that, or a new attribute within the UUID.metadata file, but this is even easier. What I'm thinking is, somewhere around here, add a check to see if a UUID.cloudarchive file exists. If so, print a message saying something like "document is archived in the cloud, cannot download" and move on to the next document. That way it won't even try to download a PDF, or save a useless .rmdoc or .rmn file.
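As an illustration of that check, here is a minimal sketch in Python. It is not the script's actual code: the function names are invented, and it assumes only what was observed above, namely that each document has a UUID.metadata file and that archived documents also have a UUID.cloudarchive file in the same directory.

```python
import tempfile
from pathlib import Path


def cloud_archived_uuids(xochitl_dir):
    """Return the UUIDs that have a UUID.cloudarchive marker file."""
    return {p.stem for p in Path(xochitl_dir).glob("*.cloudarchive")}


def plan_backup(xochitl_dir):
    """Return (uuid, action) pairs, one per UUID.metadata file found."""
    archived = cloud_archived_uuids(xochitl_dir)
    plan = []
    for meta in sorted(Path(xochitl_dir).glob("*.metadata")):
        uuid = meta.stem
        plan.append((uuid, "skip" if uuid in archived else "backup"))
    return plan


# Demo on a throwaway directory that mimics the two test notebooks.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "4d976cc0-6119-4e3d-94e0-425c9e42c6c5.metadata").write_text("{}")
    (root / "b17fa039-082c-4274-9338-fd998018d4c8.metadata").write_text("{}")
    (root / "b17fa039-082c-4274-9338-fd998018d4c8.cloudarchive").write_text("{}")
    for uuid, action in plan_backup(root):
        if action == "skip":
            print(f"{uuid}: document is archived in the cloud, cannot download")
        else:
            print(f"{uuid}: would create .rmdoc and download PDF")
```

Checking for the marker file before doing anything else means the slow or hanging PDF request is never sent for an archived document.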

Is it safe to assume that there's also a 4d976cc0-xxx directory with .rm files in it, and that no b17fa039-xxx directory exists?

Also, you said you stored some files in Keybase - where specifically? I don't see any new /keybase/private/jms1,___/ directories - maybe send me a Keybase chat (user jms1) to let me know where to find it?

Finally, I want to thank you for working with me to troubleshoot this. It's rare to see an end-user who's both willing and able to actively help me troubleshoot an issue. (I wish the users at $DAYJOB were like you in this respect.)

julienma commented 1 month ago

Is it safe to assume that there's also a 4d976cc0-xxx directory with .rm files in it, and that no b17fa039-xxx directory exists?

Exact:

Files are actually in /keybase/private/julienma,jms1/. I thought you'd be able to see them?

👍 :)

kg4zow commented 1 month ago

Okay, so the b17fa039-xxx/ directory exists but is empty. Simple enough.

As for Keybase ... once I knew the name of the directory to cd into, I was able to cd into it ... it just didn't show up in a directory listing of /keybase/private/. Not sure how the Keybase client caches those directory entries, and if that's the worst problem I run into all day, I'm doing okay.

In the directory I see Local test.pdf and Local test.rmdoc, but no "cloud test" files.

julienma commented 1 month ago

but no "cloud test" files

That's because there's no file: the download failed for both formats, cf. previous comment.

kg4zow commented 1 month ago

I just pushed a new version of the script. It checks for the UUID.cloudarchive file while examining the files that rsync found, and sets a flag for that UUID. Later it checks that flag and bypasses any creating or downloading of .rmdoc, .rmn, or .pdf files.

This means that if a file was backed up with this script and later "archived to the cloud", the last good local files will still exist in your backup directory.

I did a quick test by manually creating a UUID.cloudarchive file after rsync finished, please let me know how it works with real cloudarchive files.

kg4zow commented 1 month ago

but no "cloud test" files

That's because there's no file: the download failed for both formats, cf. previous comment.

This script doesn't download .rmdoc files, it creates them from the raw files that rsync downloaded from the tablet.

It should have created an .rmdoc file for the document whose PDF download failed, although if the UUID/*.rm files, UUID.pdf, UUID.epub, etc. aren't on your tablet, they wouldn't be included in the .rmdoc file, which means that any .rmdoc files would be "broken" and wouldn't work if uploaded to a tablet.

You may want to check your .rmdoc files to be sure they have usable content. unzip -l xxx.rmdoc will show you the contents of the file.
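If there are many .rmdoc files to check, the same inspection could be scripted. The sketch below uses Python's `zipfile` module and treats an archive as "usable" when it holds at least one member ending in .rm, .pdf, or .epub. That rule is an assumption based on the file types mentioned above, and the member names in the demo are invented for illustration.

```python
import io
import zipfile

# Suffixes assumed to indicate real document content (see note above).
CONTENT_SUFFIXES = (".rm", ".pdf", ".epub")


def rmdoc_summary(source):
    """Return (all member names, content member names) for a .rmdoc archive.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
    content = [n for n in names if n.lower().endswith(CONTENT_SUFFIXES)]
    return names, content


def make_demo_archive(members):
    """Build an in-memory zip with the given member names (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in members:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


# Hypothetical member names, not taken from a real tablet.
full = make_demo_archive(["doc.metadata", "doc.content", "doc/page1.rm"])
empty = make_demo_archive(["doc.metadata", "doc.content"])

for label, archive in (("full", full), ("metadata only", empty)):
    names, content = rmdoc_summary(archive)
    status = "has content" if content else "no content files"
    print(f"{label}: {len(names)} members, {status}")
```

For a real file, `rmdoc_summary("xxx.rmdoc")` returns the same information that `unzip -l` prints, in a form that is easy to loop over.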

(Also, I didn't realize that Github allows the owner of a normal repo to edit other peoples' comments. I'm used to editing my own comments for $DAYJOB, and I chose "Edit" from the pop-up menu rather than "Quote reply" out of muscle memory. Sorry about that, I think your comment above is the way you originally wrote it...)

julienma commented 1 month ago

Hey, thank you for the update!

I cloud-archived a bunch of docs, and ran the new script. It works perfectly :)

A suggestion: I think it's quite important for the user to be aware of which files were not backed up, so that they can choose to sync the files and resync if required. What'd be great is to either output a summary of all files that were skipped, or simply to use another color than blue to highlight them (yellow?).

PS: also, sometimes there's this line in the log, but it doesn't affect the completion of the script:

Wide character in printf at [...]/rmbackup line 145.

kg4zow commented 1 month ago

The alternate colour is a good idea, however I'm already using yellow for debug messages. I went with cyan instead.

The "wide character" message is probably because you have a non-ASCII character in a document's visible name. It took some trickery, but I was able to manually rename a document to have Korean characters in the name by editing the UUID.metadata file and entering the Unicode code points as ...

    "visibleName": "\uC548\uC601 \uD558\uC138\uC694!"

Luckily, the macOS "character picker" makes it easy to find the right code points.
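For anyone who wants to see what that metadata value turns into, here is a short Python sketch (Python is used here only for illustration). It decodes the same JSON escapes and shows that the resulting name contains non-ASCII characters, which is the kind of string that has to be encoded explicitly before being written out as bytes. The variable names are made up.

```python
import json

# Same value as in the UUID.metadata example above (raw string keeps the \u escapes).
raw = r'{"visibleName": "\uC548\uC601 \uD558\uC138\uC694!"}'
name = json.loads(raw)["visibleName"]

# The JSON parser turns each \uXXXX escape into one Unicode character.
print(name.isascii())  # False: the name contains characters beyond ASCII

# List the code points that fall outside ASCII.
print([f"U+{ord(ch):04X}" for ch in name if not ch.isascii()])

# Writing the name to a byte stream requires an explicit encoding such as UTF-8.
encoded = name.encode("utf-8")
print(len(name), "characters ->", len(encoded), "bytes")
```

Perl typically prints its "Wide character" warning when a string like this is written to a handle that has no encoding layer, which would match a document name containing characters such as these.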

I just pushed version 0.0.5 (2024-09-16) to the repo, let me know what you think.

julienma commented 1 month ago

It's great, thank you!

kg4zow / rm2-scripts