rorycl / rm2pdf

Convert reMarkable tablet notebooks and annotated PDFs to layered PDF documents
MIT License

Support for .rm version 6 files #11

Open diegotsutsumi opened 1 year ago

diegotsutsumi commented 1 year ago

Hello @rorycl,

I'm hitting the following error for a couple of files on my reMarkable 2 (most of them work, though): number of rm pages 0 != json pageCount <x> code line here

Some files are new and some are old; I haven't been able to see a pattern yet. Unfortunately I can't upload them here, but I'll try to reproduce the problem with a mock file and upload that if I succeed.

Maybe remarkable changed their file formats? Is the cause known?

rorycl commented 1 year ago

Hi @diegotsutsumi

Thanks for the bug report and sorry to hear of the problem. Yes, if you can make a non-confidential rm file bundle that shows the problem that would be great. (If you do that, please let me know if you are happy for me to add to the test suite.)

In the meantime, could you let me know what <x> is for some of the files that are showing the problem? It would also be helpful if you uploaded anonymised copies of one or more <uuid>.content files related to the problem conversions. These have JSON fields called pageCount and originalPageCount, which rm2pdf depends on. It's possible the format has recently undergone some changes.

Thanks again, Rory

Cx01N commented 1 year ago

I'm actually having the exact same problem. I was not able to export my notes on some documents, and when I try to run this tool I get the same error message.

[image]

rorycl commented 1 year ago

Hi @Cx01N

Thanks for the bug report.

Is there any chance you can share the files associated with the pdf 2ca7a077* ? Either post them here or you can send them to me privately at email. There is clearly an issue with rm2pdf finding the reMarkable lines files (.rm files) associated with the pdf you are trying to process. Perhaps this is due to a recent change with the reMarkable file format.

I'd expect a list of files something like this list from the project testfiles directory for the files associated with fbe9f971-03ba-4c21-a0e8-78dd921f9c4c.pdf:

testfiles
├── fbe9f971-03ba-4c21-a0e8-78dd921f9c4c.content
├── fbe9f971-03ba-4c21-a0e8-78dd921f9c4c.metadata
├── fbe9f971-03ba-4c21-a0e8-78dd921f9c4c.pagedata
├── fbe9f971-03ba-4c21-a0e8-78dd921f9c4c.pdf
├── fbe9f971-03ba-4c21-a0e8-78dd921f9c4c
│   ├── 0b8b6e65-926c-4269-9109-36fca8718c94-metadata.json
│   ├── 0b8b6e65-926c-4269-9109-36fca8718c94.rm
│   ├── e2a69ab6-5c11-42d1-8d2d-9ce6569d9fdf-metadata.json
│   ├── e2a69ab6-5c11-42d1-8d2d-9ce6569d9fdf.rm
│   ├── fa678373-8530-465d-a988-a0b158d957e4-metadata.json
│   └── fa678373-8530-465d-a988-a0b158d957e4.rm

If you can't send me the files, please send me a listing of the files on your filesystem that are associated with the pdf you are processing.

Thanks, Rory

rorycl commented 1 year ago

@diegotsutsumi : please also let me know what platform you are working on. It is possible this is a Windows issue related to my recent implementation of virtual filesystems. Thanks!

rorycl commented 1 year ago

I've verified this problem is due to a change in the .content file structure used by reMarkable in their 3.0x software release series. I'm working on a fix.

diegotsutsumi commented 1 year ago

Thanks for having a look at the problem! I'm working on Ubuntu 20.04 here.

A more detailed log below:

processing page 0 0 inserted false template false
cffb07cb-fc26-4262-9143-aad2dc940c56.pdf rm page 1 pdf page 1
orientation portrait
no rm file for page 1 ...skipping
processing page 1 1 inserted false template false
cffb07cb-fc26-4262-9143-aad2dc940c56.pdf rm page 2 pdf page 2
orientation portrait
no rm file for page 2 ...skipping
processing page 2 2 inserted false template false
cffb07cb-fc26-4262-9143-aad2dc940c56.pdf rm page 3 pdf page 3
orientation portrait
no rm file for page 3 ...skipping
error: number of rm pages 0 != json pageCount 2

Attached is the .content file for this document: page_count_bug.content.txt

rorycl commented 1 year ago

Hi @diegotsutsumi

Thanks very much for the more detailed bug report.

Unfortunately I realise this is probably due to the version 3 reMarkable software release. Can you verify which version of the reMarkable software you are running? Since the file you kindly attached has the cPages attribute, I'm pretty sure you are on version 3 software.
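
As a rough illustration of that check, the Go sketch below reads a .content file and reports its pageCount and whether a cPages attribute is present. It assumes only what is said in this thread: that pageCount is a number and that the presence of cPages suggests version 3 software. The type and function names are hypothetical and this is not rm2pdf's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// contentInfo summarises the two .content fields discussed in this thread.
// (Hypothetical type for illustration.)
type contentInfo struct {
	PageCount int
	HasCPages bool
}

// inspectContent decodes the top-level JSON object and looks for the
// pageCount and cPages keys without assuming anything about the shape
// of the cPages value itself.
func inspectContent(data []byte) (contentInfo, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return contentInfo{}, err
	}
	var info contentInfo
	if pc, ok := raw["pageCount"]; ok {
		if err := json.Unmarshal(pc, &info.PageCount); err != nil {
			return contentInfo{}, fmt.Errorf("pageCount: %w", err)
		}
	}
	_, info.HasCPages = raw["cPages"]
	return info, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: inspect <uuid>.content")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	info, err := inspectContent(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("pageCount=%d cPages present=%t\n", info.PageCount, info.HasCPages)
}
```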

The status of community support is set out at this reddit post Updates regarding reverse engineering ReMarkable version 3/ .rm v6 files.

Meanwhile I've done some work on support for the new v3 .content file format and I'm looking at ddvk's great work on decoding the new .rm file format. This will take a while.

In the meantime I've released a new version v0.1.6 which should stop processing earlier and report the lack of support for remarkable file bundles made with v3 software. Sorry!

diegotsutsumi commented 1 year ago

I'm indeed running version 3 on my reMarkable tablet. I'm glad you found the issue. Let me know if I can do anything further to help.

the lack of support for remarkable file bundles made with v3 software.

Do you know whether editing older files converts them into the v3 bundle format? I guess yes, but you may have more experience and be able to give a better answer.

JackTheEngineer commented 1 year ago

Hi, I would be really interested in this working with v6, but I'm not able to shift my priorities enough to work on it myself. I just wanted to let you know that some work has been done at https://github.com/ddvk/reader on reading the v6 file format. Maybe this would be a good starting point? Architecturally, it might be possible to copy and adapt ddvk's data structures and use them in your code. Maybe you, @rorycl, as the author could figure it out quickly? :) Sorry for my blunt note. I very much respect your time and am thankful for your contributions to the open source community. Best regards

rorycl commented 1 year ago

Hi @JackTheEngineer

Sorry for my slow response. I'm keen to solve the parsing problem, but the difficulty I have is finding a well-formatted rm file parser that I can port to Go, since I don't have binary decoding skills.

The new format is a variety of CRDT or Conflict-free replicated data type and there is a useful note on reddit about the issues to be solved which I noted in the thread above.

I was in contact with ddvk some time ago and his parser isn't complete. It also isn't in the same rather elegant format as the parser I ported from rm2svg, which used Python struct format codes.

The only complete parser of the new rm file format that I'm aware of is Rick Lupton's rmscene. Chemag has successfully combined rmscene with maxio's svg renderer; see here. However, I have difficulty understanding the rmscene code as it is rather esoteric. If you'd like to help make a Go version of rmscene, that would be great!

Cheers, Rory

Gehmasse commented 9 months ago

Hey, I just had the same problem while using your script. Are there any status updates available?

rorycl commented 8 months ago

Well, there is a new Rust v6 parser here, which I've just seen.

@Gehmasse: can you help port Lyr-7D1h's parser?

Gehmasse commented 8 months ago

Well, there is a new Rust v6 parser here, which I've just seen.

@Gehmasse: can you help decode Lyr-7D1h's parser?

Well, what do you actually mean by decoding? Unfortunately I can do neither Rust nor Go, but I could provide some example files if that would help.

rorycl commented 8 months ago

Thanks for your offer of help. However the issue is understanding how to decode the binary CRDT format of version 6 .rm files and only rmscene in Python and remarkable-lines in Rust are available to do that, apart from ddvk's initial work in Go. I'm not able to make much sense of rmscene or the remarkable-lines code, unfortunately, so specific help with decoding is needed!

Gehmasse commented 8 months ago

Sorry, but I can't help you with this either... :/