juruen / rmapi

Go app that allows you to access your reMarkable tablet files through the Cloud API
GNU Affero General Public License v3.0

Pdf rendering of rM handwritten notes #41

Open lobre opened 5 years ago

lobre commented 5 years ago

Hello,

Thanks for starting this golang utility around the rM tablet. The rM official ecosystem is quite closed while this tablet has a great potential!

I know the aim of this project was simply to interact with the rM api to get or put files from the device's cloud.

However, I have seen you have started an implementation of pdf rendering for annotations. I started something similar for rendering regular notes a while ago and I did not go farther due to lack of time.

But as the scope of rmapi is expanding a little, maybe it could be a good idea to render as well regular handwritten notes from the rM format to pdf? As far as I understand, the format is quite the same for notes as for annotated pdf.

Having all the downloading/rendering in golang with a single binary would be rather interesting. If you agree with this, I could try to find some time to contribute and adapt the annotations implementation toward a more general one including both annotations and regular notes.

Did you notice any technical issues / conception problems that would make this task difficult to implement?

Thanks, Loric

juruen commented 5 years ago

Hi Loric!

But as the scope of rmapi is expanding a little, maybe it could be a good idea to render as well regular handwritten notes from the rM format to pdf? As far as I understand, the format is quite the same for notes as for annotated pdf.

Actually, the title of #38 is a bit of a misnomer. What that PR adds is initial support for rendering PDF notes as a separate PDF (without the original PDF document). As you mention, the format is exactly the same as for handwritten notes, and in fact, I actually test this with notebooks and Quick Sheets.

Did you notice any technical issues / conception problems that would make this task difficult to implement?

Not really. We may need to change the PDF library in the future if we want to render annotations on top of the original PDF document.

However, in order to support handwriting notes in a better way, what we need to do next is add better support for the different pens, tilts and pressures.

Right now, all the styles (pen, tilt, pressure) are rendered in the same way. See this image:

[Image: PDF rendering example]

On the left you can see a doc rendered by the original app, and on the right a doc rendered by rmapi. As you can see, all the lines look the same in rmapi's rendering.

What we would need to do is play around with the different styles in pdf/setStyle() which currently is as simple as:

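// setStyle configures how a stroke is drawn. For now every stroke gets the
// same style, whatever its pen, tilt or pressure.
func (p PdfGenerator) setStyle(pdf *gofpdf.Fpdf, stroke Stroke) {
    pdf.SetDrawColor(0, 0, 0)    // black
    pdf.SetLineWidth(0)          // zero width: the thinnest line a PDF viewer can draw
    pdf.SetLineCapStyle("round") // rounded stroke ends
}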

and compare the rendering results. Of course, rm2svg and @ax3l's work will be super useful to get this right quicker.
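As a starting point for those experiments, here is a minimal sketch of how the style decision could be kept separate from the gofpdf calls. The pen names, the base widths and the pressure formula are all assumptions made up for illustration, not values taken from the reMarkable renderer or from the .lines format.

```go
package main

import "fmt"

// PenType is a hypothetical enumeration of brush kinds. The real .lines
// format stores pens as integers whose values are not assumed here.
type PenType int

const (
	Ballpoint PenType = iota
	Pencil
	Brush
)

// lineWidth returns an illustrative line width for a stroke segment.
// pressure is expected in [0, 1]; values outside that range are clamped.
// The constants are placeholders to be tuned by comparing rendered output.
func lineWidth(pen PenType, pressure float64) float64 {
	if pressure < 0 {
		pressure = 0
	} else if pressure > 1 {
		pressure = 1
	}
	switch pen {
	case Pencil:
		return 0.25 + 0.5*pressure // mildly pressure dependent
	case Brush:
		return 0.5 + 2*pressure // strongly pressure dependent
	default:
		return 0.5 // ballpoint and unknown pens: constant width
	}
}

func main() {
	for _, p := range []float64{0, 0.5, 1} {
		fmt.Printf("pressure=%.1f ballpoint=%.2f pencil=%.2f brush=%.2f\n",
			p, lineWidth(Ballpoint, p), lineWidth(Pencil, p), lineWidth(Brush, p))
	}
}
```

The returned value could then be passed to pdf.SetLineWidth inside setStyle, which would keep the tuning of each pen testable without generating a PDF.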

I'm not going to have much time to work on this during the next few weeks, so I'd be super grateful if you wanted to give it a go. I'm also cc'ing @peerdavid who might also be interested.

Thank you!

Edit: @lobre, just saw your https://github.com/lobre/rm repo. Nice work! You've probably spent more time on rendering rM notes than I have!

lobre commented 5 years ago

Indeed, I tried a while ago to rewrite @ax3l's work in pure Go! I started with the parsing of the .lines format and it worked pretty well in terms of behaviour. I did not have time to finish the rendering code though.

I am just not super satisfied with the code itself. It seems too complicated to understand, with too many levels of indentation, and it is not very idiomatic. But I could try to work on it again and integrate a first version here, following what you started with annotations.

I can see that the Remarkable .lines format has changed, judging by the header you defined (rmHeader = "reMarkable .lines file, version=3"). Maybe you know the differences compared to previous versions?

As this format might change again, the parsing part of the code should be straightforward to read, so that it is easy to adapt if necessary.

I still think it is a good idea to have a clear separation between the parsing and the rendering phases. That would give us a clearly defined model representing a rM note, which could then easily be rendered using different libraries and effects in the future (PDF, PNG format, or even integrating rainbow backgrounds and colourful pencils...).
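To make that separation concrete, here is a rough sketch of what the boundary could look like. All type and method names (Point, Stroke, Layer, Note, Renderer) are hypothetical and not taken from rmapi or any existing parser; the toy backend only summarises the note instead of drawing it.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical in-memory model produced by the parsing phase.
type Point struct{ X, Y, Pressure, Tilt float64 }

type Stroke struct {
	Pen    int // pen identifier as stored in the file (values not assumed here)
	Points []Point
}

type Layer struct{ Strokes []Stroke }

type Note struct{ Layers []Layer }

// Renderer is the only thing a backend (PDF, SVG, PNG, ...) has to implement.
// It depends on the model alone, never on the binary format.
type Renderer interface {
	Render(n Note) (string, error)
}

// summaryRenderer is a toy backend that reports what it would draw.
type summaryRenderer struct{}

func (summaryRenderer) Render(n Note) (string, error) {
	var b strings.Builder
	for i, l := range n.Layers {
		points := 0
		for _, s := range l.Strokes {
			points += len(s.Points)
		}
		fmt.Fprintf(&b, "layer %d: %d strokes, %d points\n", i, len(l.Strokes), points)
	}
	return b.String(), nil
}

func main() {
	note := Note{Layers: []Layer{{Strokes: []Stroke{
		{Pen: 1, Points: []Point{{X: 0, Y: 0}, {X: 10, Y: 10}}},
	}}}}

	var r Renderer = summaryRenderer{}
	out, err := r.Render(note)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

With this shape, a change in the .lines format would only touch the code that builds a Note, and a new output format would only add another Renderer.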

I don't know much about gofpdf, and maybe it does not allow writing on top of an existing PDF, but I guess it could still be okay for a first version!

ax3l commented 5 years ago

Cool, great work! There is also my rust prototype, lines-are-rusty ;-)

But admittedly lines-are-beautiful is the real deal, and the one I can develop in more quickly.

Did you look into the new svg exports already, so we can rebuild the original renderer? The PDFs were too broken and I have not invested much time yet. Would love to rebuild the original lines pixel by pixel (started on it a while ago, but no time currently).

I can see that the Remarkable .lines format has changed seeing the header you defined

I should investigate the format changes, will do! https://github.com/ax3l/lines-are-beautiful/issues/15

I still think it is a good idea to have a clear separation between the parsing and the rendering phases.

Yes, that is also the idea I had with the file handler lib. Renderers in my case are "examples" using the file API.

write on top of an existing PDF but I guess it could still be okay for a first version!

I would even prefer this in most cases for annotations that I share with other people :)

lobre commented 5 years ago

Hi @ax3l,

Thanks for your comment, and thanks for the great analysis work on the .lines format and for the prototypes you developed! They were of great help!

There is also my rust prototype, lines-are-rusty ;-)

I am sure it was great fun to have this Remarkable device for tinkering with rust! I had a glance at your rust project, but unfortunately I am less comfortable with that language.

Did you look into the new svg exports

I have had a quick look into this format. Here is what I have noticed so far, starting with the general SVG structure.

<svg width="1404" height="1872" viewBox="0 0 1404 1872" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"  version="1.2" baseProfile="tiny">
    <title>SVG Export Test</title>
    <image x="0" y="0" width="1404" height="1872" xlink:href="data:image/png;base64,{encoded_template_image}"/>
    <g>
        <path fill="#000000" d=""/>
    </g>
    <g>
        <path fill="none" stroke="#000000" stroke-width="4" stroke-linecap="butt" stroke-linejoin="round" d=""/>
    </g>
    ...
</svg>

The first element after the title is an image holding a base64-encoded PNG of the note's background template. This means the template stays static.

Then, there is a combination of graphic containers with each a single path.

As no CSS classes are used, it seems the styling relies only on the browser's default "user agent stylesheet". A path element represents shapes with the same styling instructions. The style is defined through standard SVG presentation attributes on the path, such as fill, stroke, stroke-width, stroke-linecap and stroke-linejoin in the sample above.

As far as I can see, the number of g>path elements changes according to the brush type. Pencil type brushes seem to be all rendered from the same path element while shapes drawn using the normal pen compose multiple path elements. But this has to be confirmed.
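To illustrate the structure described so far, here is a small sketch that writes a document of the same shape: an embedded base64 template followed by one g>path per stroke. The dimensions and attributes are copied from the sample above; the function names are made up for this example, and real exports clearly use more elaborate path data than straight line segments.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strconv"
	"strings"
)

type Point struct{ X, Y float64 }

// dataURI wraps raw PNG bytes the way the export embeds the template.
func dataURI(png []byte) string {
	return "data:image/png;base64," + base64.StdEncoding.EncodeToString(png)
}

// pathData builds an SVG "d" attribute from straight segments only
// (M = move to the first point, L = line to each following point).
func pathData(pts []Point) string {
	var b strings.Builder
	for i, p := range pts {
		if i == 0 {
			b.WriteString("M")
		} else {
			b.WriteString(" L")
		}
		b.WriteString(strconv.FormatFloat(p.X, 'f', -1, 64))
		b.WriteByte(' ')
		b.WriteString(strconv.FormatFloat(p.Y, 'f', -1, 64))
	}
	return b.String()
}

// svgDocument assembles the skeleton: header, template image, one group per stroke.
func svgDocument(template []byte, strokes [][]Point) string {
	var b strings.Builder
	b.WriteString(`<svg width="1404" height="1872" viewBox="0 0 1404 1872" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.2" baseProfile="tiny">` + "\n")
	fmt.Fprintf(&b, "    <image x=\"0\" y=\"0\" width=\"1404\" height=\"1872\" xlink:href=\"%s\"/>\n", dataURI(template))
	for _, s := range strokes {
		b.WriteString("    <g>\n")
		fmt.Fprintf(&b, "        <path fill=\"none\" stroke=\"#000000\" stroke-width=\"4\" stroke-linecap=\"butt\" stroke-linejoin=\"round\" d=\"%s\"/>\n", pathData(s))
		b.WriteString("    </g>\n")
	}
	b.WriteString("</svg>\n")
	return b.String()
}

func main() {
	fakeTemplate := []byte("png") // placeholder bytes, not a real image
	fmt.Print(svgDocument(fakeTemplate, [][]Point{{{0, 0}, {10, 5}}}))
}
```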

I did not go further than this quick analysis so far. I did not try multiple layers but I guess they don't impact the rendering. I also did not try to export an annotated PDF to SVG, but I suppose the format would be the same.

I think the SVG export would need far more analysis before we correctly understand how it is produced. But indeed, that could be more interesting than trying to generate a PDF version that would barely resemble the original note. If we are able to generate the same SVG file, we would have exactly the same rendering, which would be dope!

Maybe we can even think of generating other output formats by converting an initially rendered SVG. I can't really estimate how difficult that would be.

Do you also think SVG could be a "starting point" for the rendering?

I should investigate the format changes, will do!

Thanks a lot. Really impressed by your ability to decode binary formats! I have seen your talk and slides, and they were of great interest.

I will try to integrate a better version of the parser into rmapi when I find some time and see if I can have more info on this SVG format ;)

ax3l commented 5 years ago

Hi @lobre, thank you for the kind words!

The rust project is just me learning rust, not really useful yet, the C++ implementation is the best reference.

Really impressed by your ability to decode binary formats.

Haha thanks, I was surprised as well. That was a lot of fun, never thought that it would lead somewhere as I never did this before and expected there be dragons.

the general SVG format

Just wonderful. Base64-encoded templates are very good, I had to export the PNGs from the binary for my local testing which was less enjoyable and of course had no labels (and contained even some hidden ones that did not make it into pencils ;-) ).

Do you also think SVG could be a "starting point" for the rendering?

Absolutely, the influence of tilt, pressure and tangent direction of two connected points in a line per pencil style would otherwise be very hard to analyze from any other of the formats (PDF is a cumbersome bloat-format and PNG is already combined and pixelated).

What I did so far: take the .lines file writer and generate a "perfectly analytic" file. (My hands are not robotic enough for perfect lines.) Example here: Datenspuren 2018 (The first talk was in English; sorry the second was for a local audience and has at least English slides.) Then modify one by one the line width, tilt, pressure, size and curvature per pencil style and see the changes in width, style and orientation of the template patches in the line of the svg. By that we find the rendering algorithm used per pencil.
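The "perfectly analytic" input can be sketched as follows: one straight stroke per row, with a single parameter (here pressure) changing from row to row while everything else stays fixed. The Sample type, the function names and the coordinates are assumptions for illustration (the coordinates merely fit a 1404x1872 page); writing the samples out as a valid .lines file is not shown.

```go
package main

import "fmt"

// Sample is a hypothetical point of a synthetic stroke.
type Sample struct{ X, Y, Pressure, Tilt float64 }

// straightStroke returns n evenly spaced samples on the segment
// (x0,y0)-(x1,y1) with constant pressure and tilt.
// It returns nil when n < 2, since a line needs two end points.
func straightStroke(n int, x0, y0, x1, y1, pressure, tilt float64) []Sample {
	if n < 2 {
		return nil
	}
	out := make([]Sample, n)
	for i := range out {
		t := float64(i) / float64(n-1) // 0 at the start, 1 at the end
		out[i] = Sample{
			X:        x0 + t*(x1-x0),
			Y:        y0 + t*(y1-y0),
			Pressure: pressure,
			Tilt:     tilt,
		}
	}
	return out
}

// pressureSweep builds one horizontal stroke per row, raising the
// pressure from 1/rows up to 1 while keeping tilt at zero.
func pressureSweep(rows, pointsPerRow int) [][]Sample {
	strokes := make([][]Sample, 0, rows)
	for r := 0; r < rows; r++ {
		pressure := float64(r+1) / float64(rows)
		y := 100 + 50*float64(r)
		strokes = append(strokes, straightStroke(pointsPerRow, 100, y, 1300, y, pressure, 0))
	}
	return strokes
}

func main() {
	for i, s := range pressureSweep(4, 5) {
		fmt.Printf("stroke %d: y=%.0f pressure=%.2f points=%d\n",
			i, s[0].Y, s[0].Pressure, len(s))
	}
}
```

The same pattern would apply to tilt, size or curvature: keep one sweep per parameter so that any change in the exported SVG can be attributed to a single input.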

Note that not all pencil styles (and their algorithms) will depend on e.g. tilt. It depends on whether you emulate a ballpoint pen (no tilt dependence), a paint brush (definitely tilt dependent), or a fountain pen with a fancy flat nib (for calligraphy; they should have added one of those as well <3). We have already documented that from the official blog entries of reMarkable, thanks @matteodelabre for that.

As an orientation, take a look at how the gimp pencil paintbrush tool works when combining its patches (and its options). My guess is that this is probably the same principle.

lobre commented 5 years ago

What I did so far: take the .lines file writer and generate a "perfectly analytic" file. Then modify one by one the line width, tilt, pressure, size and curvature per pencil style and see the changes in width, style and orientation of the template patches in the line of the svg. By that we find the rendering algorithm used per pencil.

This methodology of analysis seems good to me! And the documentation table you link to from your .hpp file will already be of great help. Just a question: when you followed that process, were you already working with the new svg format? Or are you referring to the analysis using the binary .lines format?

Base64-encoded templates are very good.

If I remember correctly, the png background templates were stored externally on the remarkable device and only referenced by name in the note's .zip metadata. Did you use them in some way with your renderers? Or did you omit them? Because indeed, with this new svg format, we also need to retrieve them, encode them into base64 and include them in the svg.

I had to export the PNGs from the binary for my local testing which was less enjoyable and of course had no labels (and contained even some hidden ones that did not make it into pencils ;-) ).

I'm not sure I understand which process you are referring to. I don't recall seeing any png contained in the binary. Did I miss something?

As an orientation, take a look at how the gimp pencil paintbrush.

I will have a look. If you are right, that would be a really convenient way of speeding up the analysis.

And for sure, I'll do my best to decode your German-language video. I didn't realize you had this second one recorded!

ax3l commented 5 years ago

when you followed that process, were you already working with the new svg format?

No, that was shortly before the SVG update was released and I did not go much further since then due to limited time, unfortunately. That was my start to try to analyze the PDF and PNG renderer, but now we have a much better starting point with the SVG since we actually see the placements of each patch/dot along the line depending on .lines values.

the png background templates were stored externally

Didn't know that. It looked to me as if they were just embedded in the main xochitl binary, but I might have missed the external location. Up to now, the example renderers I have built on top of the file API are just my own artistic renderers with no relation to the original methods.

Because effectively, here with this new svg format, we need as well to retrieve them, encode them into base64 ...

What I would do to avoid issues, because you might be touching copyright territory here with the original pngs/templates, is the following: a) just create a small sample svg drawing that contains all pencil templates and share that and read from it. It will naturally contain all the basic png/base64 patches you need. Or b) when you have the original templates just roll your own that are very close in pixel structure and final impression, but created new from scratch. That said: I am not a lawyer.

I'm not sure I understand which process you are referring to. I don't recall seeing any png contained in the binary. Did I miss something?

I was trying to see if I can understand the renderer from the final composed image in PNG and analytical .lines input. Not the track we should follow, it's inefficient and SVGs provide much more information.

And for sure, I'll do my best to decode your German-language video

Sorry again for that, it's the slides that count :) Any other questions in the session were mainly about my impression of the openness of the company, on which I can only speculate. Feel free to ask any further questions.

lobre commented 5 years ago

Okay cool, so to sum up!

1. SVG analysis for understanding the exact rendering process

This new SVG beta export introduced by the latest upgrade from Remarkable can truly be an opportunity to finally implement a renderer that would have the exact same shape as Remarkable's exports. We need to analyse using different examples and understand the impacts of each pencil style on the exported SVG. Thereafter, we should easily be able to implement a realistic renderer.

2. Parsing the .lines format in Go

I think we need a solid encoder/decoder from the binary .lines format to a Go model. It has to be easy to read in order to stay maintainable, because Remarkable may change the format again in the future.
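A decoder that has to survive format changes could begin by checking the version announced in the header before reading anything else. The sketch below only relies on the header text quoted earlier in this thread; the function name is made up, and it assumes the version is written as ASCII digits directly after "version=". Whatever follows (padding, layers, strokes) is deliberately not parsed here.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// headerPrefix is the start of the header as quoted from rmapi's rmHeader.
var headerPrefix = []byte("reMarkable .lines file, version=")

// parseVersion reads the version number announced at the start of data.
// Assumption: the version is one or more ASCII digits right after the prefix.
func parseVersion(data []byte) (int, error) {
	if !bytes.HasPrefix(data, headerPrefix) {
		return 0, errors.New("not a reMarkable .lines file")
	}
	rest := data[len(headerPrefix):]
	end := 0
	for end < len(rest) && rest[end] >= '0' && rest[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, errors.New("header has no version number")
	}
	return strconv.Atoi(string(rest[:end]))
}

func main() {
	v, err := parseVersion([]byte("reMarkable .lines file, version=3          "))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("version:", v)
}
```

Dispatching on the returned number (one decoding function per version) would keep each layout readable on its own and make a future version an addition rather than a rewrite.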

Fortunately, @ax3l has already done an excellent job of documenting the binary format here. He also has an implemented version of the decoder in C++ here.

@ax3l may find some time to get a grasp of version 3 of the .lines format.

3. Encoding a modeled version into a .lines binary

The reverse process, encoding a Go model representing a note into a valid .lines and .zip file, would definitely help as well. We could imagine an "external note cmd/GUI/editor" that generates valid Remarkable notes which can be pushed to the device using rmapi. That could help to generate notes with "perfect line shapes" for instance, and see the impact on the SVG exported by the device.

4. Implementation of the SVG renderer from the Go model

Once we have fully decoded and understood the SVG format from Remarkable, we will be able to create our own SVG renderer from a Go model representing a note.

5. Conversion of SVG into other formats

The final SVG note may then be converted to other formats such as PDF or PNG... (External Go packages/libraries?).

I pretty much have the same thoughts as @juruen about the fact we need a full Go implementation for the cloud api, the parser and the renderer.

Having said that, we need a full-time team of developers to get going :-D.

More seriously, (even if I am also quite busy these days) I will try to find some time to have a look at step 2, as I started this task on a separate repo here.

ax3l commented 5 years ago

Uh, I just remembered: we could also just expose Go interfaces from the C++ API via FFI :-) But it's fun to implement the reader in different languages as well, so don't hold back.

ax3l commented 5 years ago

I just updated lines-are-beautiful (C++) and lines-are-rusty (Rust) for v3 .rm file support (the format in use since the official client was updated to v1.6).

lobre commented 5 years ago

Thanks for the message! I am not far from creating a pull request here as well.

rorycl commented 4 years ago

Hi. As a holiday test project, I've written a converter in Go called rm2pdf for rm files, either overlaying PDFs or as plain notes.

I'm using a reMarkable running v2.0.2.0 software, which creates version 5 .rm files.

It includes an .rm parser based on rm2svg and strokes paths using fpdf. It creates multi-layer PDFs.

You can grab my work at http://campbell-lange.net/media/files/rm2pdf.tgz (sha1sum 714a8e2452b47326c84a3f20af0e6aa544fabec3) or view the sources at http://campbell-lange.net/media/files/rm2pdf/. Apologies for the beginner go code.

Run the tests to get some example output, or read doc.go, or run rm2pdf -h.

If it is useful I'm happy to stick the source on github after cleaning it up a bit. I'll also have to sort out the package paths if I publish it.

Here is a PDF from an example note made on a reMarkable, with the layer colour forced to blue: toolbox.pdf

juruen commented 4 years ago

👋 @rorycl, support to render PDF annotations is something we are missing and I'd definitely like to see.

If it is useful I'm happy to stick the source on github after cleaning it up a bit. I'll also have to sort out the package paths if I publish it.

So, yeah +1000 to do ⬆️

rorycl commented 4 years ago

Hi @juruen. I've put the project at https://github.com/rorycl/rm2pdf.

juruen commented 4 years ago

👋 @rorycl

Thank you so much for uploading the code.

I spent some time playing with it and with the import library and as you know, there are some issues with some PDFs. Actually, I had issues with most of my PDFs :(

That's why I decided to explore the path of using a different PDF library and the results have been great so far. You can see the current PR in https://github.com/juruen/rmapi/pull/85

Thank you!

rorycl commented 4 years ago

Hi @juruen

I'll contact the author of the import library (gofpdi) to see if the import issue can be addressed. Using pdftk to rewrite the pdfs makes them always work, so it may be a simple xref PDF problem.

By the way, did you try the rmparser component of rm2pdf? I updated the docs earlier. See https://godoc.org/github.com/rorycl/rm2pdf.

rm2pdf seems to work fine for handwritten notes in notebooks, using the A4 template, in my testing.

rorycl commented 4 years ago

Meanwhile, @lobre, I believe rm2pdf can be used reliably, following the title of this issue, for rendering handwritten or drawn notes to PDFs.

In the case of creating PDFs from a source PDF with annotations, we have found a problem as noted by @juruen. That is due, I believe, to gofpdf #16 "Failed to initialize parser: Failed to read pdf: Failed to read xref table: Expected xref to start with 'xref' #". I am looking into alternatives.

I haven't had any reports of issues with the rmparser component of rm2pdf. Although the error handling is not good, it has been shown to be reliable in the several hundred pages of notes I have made in the last month.

juruen commented 4 years ago

Just a quick note, I just merged #85 which introduces PDF annotations using UniPDF.

We can now generate annotations overlaid on top of existing PDF documents. There's still a bit of work to do to use different styles and pen sizes, but we can finally render handwritten notes on top of PDF docs 🎉

phpdave11 commented 4 years ago

@rorycl gofpdi v1.0.9 now supports xref streams

f3fora commented 3 years ago

In pull request #138 there's an attempt to improve the rendering of annotations.
Just a few clarifications:

rorycl commented 3 years ago

@f3fora please check out https://github.com/rorycl/rm2pdf

The rmparse component and rmpdf/stroke.go could be turned into a library for abstractly describing strokes which could be written to pdf, svg, png etc using a configuration file as you suggest.

Apart from these needing to be separate modules, there are unfortunately a number of issues with the gofpdi library not reliably importing pdf pages. An abstract library could allow people to use the underlying pdf library they wish to use (open-source or otherwise).