julianhille / MuhammaraJS

Muhammara: a Node module with C/C++ bindings to modify PDFs with JS for Node or Electron (based on, and a replacement for, galkahana/HummusJS)
Other

RTL Support (Arabic, Hebrew...) #330

Open thmclellan opened 9 months ago

thmclellan commented 9 months ago

Thanks for the excellent work in maintaining and improving this library!

I wondered if adding support for right-to-left (RTL) languages such as Arabic and Hebrew is on your roadmap, or is an area where you'd consider a PR.

Related links:

Reading issue 56 above, it sounds like one key consideration was which library to use (and the related licensing). I wondered if you had any preferences and/or a suggested approach. I'm in fact-finding mode, just trying to get an idea of the right approach and the magnitude of the work before experimenting with a fork or PR. Thanks!

julianhille commented 9 months ago

I'd start with the HTML-to-PDF part to test whether it handles RTL, as I have no idea if it takes that into account.

thmclellan commented 9 months ago

Thanks. I don't think hummus-recipe ever added RTL support (https://github.com/chunyenHuang/hummusRecipe/issues/108), but it's worth testing; maybe it gets handled by a lower-level library. I'll give it a try.

thmclellan commented 9 months ago

I experimented with the ModifyExistingPageContent.js test to see if I could add some Hebrew text onto an existing PDF with the writeText() function. I used an arial-unicode font that included Hebrew characters.

The characters came out in reverse order when rendered on the PDF. A copy of the test and a screenshot of the output are below.

As a next step, I'm going to experiment with a BIDI library to see if the text can be reordered before it is passed to writeText(), possibly https://www.npmjs.com/package/bidi-js. I'll keep you posted on how it goes; if you have a preferred approach, let me know. Thanks

const muhammara = require('muhammara');

describe('ModifyExistingPageContent', function () {
  it('should complete without error', function () {
    var pdfWriter = muhammara.createWriterToModify(__dirname + '/TestMaterials/BasicJPGImagesTest.PDF', {
      modifiedFilePath: __dirname + '/output/BasicJPGImagesTestPageModified2.pdf',
    });

    var pageModifier = new muhammara.PDFPageModifier(pdfWriter, 0);
    // const content = `The quick brown fox jumped over the lazy dog`;
    // Hebrew translation of the sentence above, in logical (reading) order
    const content = `השועל החום המהיר קפץ מעל הכלב העצלן`;
    pageModifier
      .startContext()
      .getContext()
      .writeText(content, 75, 805, {
        font: pdfWriter.getFontForFile(__dirname + '/TestMaterials/fonts/arial-unicode.ttf'),
        size: 18,
        colorspace: 'gray',
        color: 0x00,
      });

    pageModifier.endContext().writePage();
    pdfWriter.end();
  });
});

Rendered on the PDF, the characters appear in reverse order:

[Screenshot: the Hebrew sentence rendered with its characters reversed]

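
The BIDI-library idea above amounts to converting logical order into visual order before calling writeText(). As a rough illustration only (this is not bidi-js's API; that package implements the full Unicode Bidirectional Algorithm), the hypothetical toVisualOrder() below handles a single right-to-left paragraph. It assumes Hebrew/Arabic characters are RTL, letters and digits of other scripts are LTR, and a neutral character (space, punctuation) is LTR only when the nearest strong characters on both sides are LTR. Explicit embeddings, bracket mirroring, and combining marks such as niqqud are ignored.

```javascript
// Rough visual reordering for ONE right-to-left paragraph (hypothetical helper,
// a heavy simplification of the Unicode Bidirectional Algorithm).
function classify(ch) {
  if (/\p{N}/u.test(ch)) return 'L'; // digits keep left-to-right order
  const cp = ch.codePointAt(0);
  const rtl =
    (cp >= 0x0590 && cp <= 0x06ff) || // Hebrew and Arabic blocks
    (cp >= 0xfb1d && cp <= 0xfdff) || // Hebrew / Arabic presentation forms
    (cp >= 0xfe70 && cp <= 0xfefc); // Arabic Presentation Forms-B
  if (rtl) return 'R';
  return /\p{L}/u.test(ch) ? 'L' : 'N'; // other letters are LTR, the rest neutral
}

function toVisualOrder(text) {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const types = chars.map(classify);
  // Nearest strong type in one direction; the paragraph edges count as RTL.
  const strongAt = (i, step) => {
    for (let j = i + step; j >= 0 && j < types.length; j += step) {
      if (types[j] !== 'N') return types[j];
    }
    return 'R';
  };
  const resolved = types.map((t, i) =>
    t !== 'N' ? t : strongAt(i, -1) === 'L' && strongAt(i, 1) === 'L' ? 'L' : 'R'
  );
  // Group consecutive characters with the same resolved direction into runs.
  const runs = [];
  chars.forEach((ch, i) => {
    const last = runs[runs.length - 1];
    if (last && last.dir === resolved[i]) last.chars.push(ch);
    else runs.push({ dir: resolved[i], chars: [ch] });
  });
  // In an RTL paragraph the runs are laid out right to left, so reverse their order;
  // characters inside RTL runs are reversed too, LTR runs keep their order.
  return runs
    .reverse()
    .map((run) => (run.dir === 'R' ? run.chars.reverse() : run.chars).join(''))
    .join('');
}

console.log(toVisualOrder('שלום abc def'));
```

Embedded Latin words and numbers stay readable because only the RTL runs are reversed; a plain string reversal would also flip "abc" and "12".
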
julianhille commented 9 months ago

Would you mind trying a FreeType font that has a draw direction? I suspect it would work.

thmclellan commented 9 months ago

Thanks, but I didn't quite understand. Are you suggesting rendering the text with a specialized font that assumes we're writing in LTR mode, after reversing the characters of the string?

It would be nice to handle this at the font/text level. It doesn't seem like the PDF spec has any inherent support for RTL writing. The writeText function (using Tm() and Tj()) doesn't support a draw direction, so I guess the X coordinate would be the end (left-most) position of the text.

The previous discussion (https://github.com/galkahana/HummusJS/issues/56#issuecomment-166118281) looks at using Arabic Presentation Forms B and a mapping script like https://github.com/NaurozAhmad/Arabic-Urdu-Converter-From-and-To-Presentation-Forms-B.
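
To illustrate the Presentation Forms-B idea, here is a toy sketch (not the linked converter's code) that picks each letter's isolated, final, initial, or medial form from its neighbours and then reverses the result for left-to-right drawing. The names FORMS, shapeArabic, and toVisualArabic are hypothetical. It covers only five letters (ALEF, BEH, SEEN, LAM, MEEM), ignores the mandatory LAM-ALEF ligature and diacritics, and assumes the chosen font has glyphs for the U+FE70 to U+FEFC code points.

```javascript
// Toy contextual shaping into Arabic Presentation Forms-B (hypothetical helper, five letters only).
// Forms are listed as [isolated, final, initial, medial]; ALEF only joins to the
// preceding letter, so it has just two forms.
const FORMS = {
  '\u0627': [0xfe8d, 0xfe8e], // ALEF
  '\u0628': [0xfe8f, 0xfe90, 0xfe91, 0xfe92], // BEH
  '\u0633': [0xfeb1, 0xfeb2, 0xfeb3, 0xfeb4], // SEEN
  '\u0644': [0xfedd, 0xfede, 0xfedf, 0xfee0], // LAM
  '\u0645': [0xfee1, 0xfee2, 0xfee3, 0xfee4], // MEEM
};

// A letter can connect to the following letter only if it has all four forms.
const joinsForward = (ch) => FORMS[ch] !== undefined && FORMS[ch].length === 4;

function shapeArabic(text) {
  const chars = Array.from(text);
  return chars
    .map((ch, i) => {
      const forms = FORMS[ch];
      if (!forms) return ch; // spaces and letters outside the table are left as-is
      const joinsPrev = i > 0 && joinsForward(chars[i - 1]);
      const joinsNext = joinsForward(ch) && i + 1 < chars.length && FORMS[chars[i + 1]] !== undefined;
      let form = 0; // isolated
      if (joinsPrev && joinsNext) form = 3; // medial
      else if (joinsPrev) form = 1; // final
      else if (joinsNext) form = 2; // initial
      return String.fromCodePoint(forms[form]);
    })
    .join('');
}

// Shape in logical order first, then reverse for a writer that draws left to right.
const toVisualArabic = (text) => Array.from(shapeArabic(text)).reverse().join('');

console.log(toVisualArabic('\u0633\u0644\u0645'));
```

The order matters: shaping has to see each letter's logical neighbours, so reversing before shaping would pick the wrong forms.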

For Hebrew, I wonder if it would work to just reverse the string's characters and use the X position as the ending point of the text.
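
A minimal sketch of that idea, assuming a single line of pure Hebrew: reverse the code points, then start the text at the right edge minus its width so the line ends where intended. layoutHebrewLine and fakeWidth are hypothetical names, and measureWidth is an assumed callback; with real fonts it might wrap the font's calculateTextDimensions(text, size).width, which HummusJS provided (check that your muhammara version exposes it). Note that this reverses any embedded Latin words or numbers as well.

```javascript
// Naive Hebrew handling (hypothetical helper): reverse the code points so a
// left-to-right writer produces right-to-left reading order, then shift the
// start x so the line ends at a chosen right edge.
function layoutHebrewLine(text, rightEdgeX, measureWidth) {
  // Array.from splits by code point, so astral characters are not torn apart.
  const visual = Array.from(text).reverse().join('');
  return { text: visual, x: rightEdgeX - measureWidth(visual) };
}

// Stand-in width function for the example: 10 units per character.
const fakeWidth = (s) => Array.from(s).length * 10;

const line = layoutHebrewLine('שלום', 500, fakeWidth);
console.log(line.x); // → 460
```

In the test above, line.text and line.x would take the place of content and 75 in the writeText() call.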

julianhille commented 9 months ago

As far as I understood the C code for FreeType fonts, they might have a draw direction. I wondered whether the libraries actually use this direction when drawing. That's why I suggested trying a specific RTL-only FreeType font.

thmclellan commented 9 months ago

Okay, interesting. I looked for RTL-only, FreeType-compatible fonts but didn't find any clear options.

Maybe it's best for now to assume that RTL handling should be done before calling MuhammaraJS writeText(). My main use case is Hebrew, which we can probably handle by reversing the text and using a Unicode font. It seems like rendering Arabic in a PDF takes an extra level of effort.

Feel free to close this issue... thanks again for the help in troubleshooting!

julianhille commented 9 months ago

I'm not really into RTL; are there cases where reversing does not work?

thmclellan commented 9 months ago

I guess in Arabic the reversing method can cause issues where the characters don't join together as they should (Arabic letters connect, like cursive writing). It sounds like the reversal approach is generally okay for Hebrew. There's more detail at https://github.com/galkahana/HummusJS/issues/56#issuecomment-165030284.

I've just been reading through past PR efforts on this in PDF-Writer, and it looks like Gal outlined a special build approach (https://github.com/galkahana/PDF-Writer/pull/65#issuecomment-705644765) for RTL, presumably due to licensing issues.