rmraya / OpenXLIFF

An open source set of Java filters for creating, merging and validating XLIFF 1.2, 2.0 and 2.1 files.
https://www.maxprograms.com/products/openxliff.html
Eclipse Public License 1.0

Segmentation rules are being ignored. #32

Closed MartinAlacam closed 2 months ago

MartinAlacam commented 2 months ago

I don't know exactly when it started, but since at least version 3.17 the segmentation rules have been ignored. Whether I rely on default.srx (by not specifying any) or pass -srx my.srx, the rules are not applied. I updated to 3.20 and the behavior is the same. Version 3.6, which I still had on my disk, works fine.

rmraya commented 2 months ago

Provide a test case that shows the problem.

MartinAlacam commented 2 months ago

With `<tool tool-version="3.6.0 20230413_0951" tool-id="OpenXLIFF" tool-name="OpenXLIFF Filters"/>` the result is a single segment: `<source>Dohoda č. 5 yada yad yada.</source>`

With `<tool tool-version="3.17.0 20240106_0938" tool-id="OpenXLIFF" tool-name="OpenXLIFF Filters"/>` it creates two translation units: `Dohoda č.` and `5 yada yad yada.`

rmraya commented 2 months ago

Provide sample files and the command line to use.

MartinAlacam commented 2 months ago

tests.zip

`~/Applications/OpenXLIFF_36/convert.sh -file $PWD/test.docx -srcLang cs -tgtLang tr`

I ran the same command with 3.17, and also tried the -srx flag with a custom SRX file.

rmraya commented 2 months ago

test.docx is not included in the zip. There is no custom .srx either.

MartinAlacam commented 2 months ago

tests.zip

These tests were done without the custom SRX, but I'm sending it anyway. It is almost the same as default.srx, with just a couple of additions for Turkish.

rmraya commented 2 months ago

There is a difference in how Java 21 and Java 8 handle \b in regular expressions. This SRX rule from default.srx works as expected with Java 8:

        <rule break="no">
          <beforebreak>\b[čČ]\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>

On Java 21 the \b doesn't match the text in your file. It works if \b is replaced with \s.
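The difference can be reproduced outside OpenXLIFF with a few lines of Java. The sketch below is only an illustration: the class name `WordBoundaryDemo` and the helper `found` are hypothetical, and the sample text is taken from the report above. It assumes the cause is the change, introduced around JDK 19 as far as I know, that makes `\b` treat only ASCII letters, digits and underscore as word characters unless `Pattern.UNICODE_CHARACTER_CLASS` is set. Under that assumption `č` is no longer a word character, so there is no boundary between the space and `č`. The first result is expected to differ between Java 8 and Java 21, so no output is shown for it.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Sample text from the report; \u010d is "č", written as an escape so the
    // source compiles the same way regardless of file encoding.
    static final String TEXT = "Dohoda \u010d. 5 yada yad yada.";

    // The <beforebreak> expression from the quoted default.srx rule.
    static final String ORIGINAL = "\\b[\u010d\u010c]\\.";

    // The variant suggested in the thread: \s in place of \b.
    static final String WITH_SPACE = "\\s[\u010d\u010c]\\.";

    // Hypothetical helper: true if the expression is found anywhere in TEXT.
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        System.out.println("Java " + System.getProperty("java.version"));
        // Depends on the runtime: this is the case that changed between versions.
        System.out.println("\\b, default flags:   " + found(ORIGINAL, 0));
        // Unicode-aware word boundary: "č" counts as a word character here.
        System.out.println("\\b, Unicode classes: "
                + found(ORIGINAL, Pattern.UNICODE_CHARACTER_CLASS));
        // Whitespace instead of a word boundary: independent of the \b definition.
        System.out.println("\\s, default flags:   " + found(WITH_SPACE, 0));
    }
}
```

The last two variants should match on both runtimes. Note that `\s` requires a whitespace character before `č`, so unlike `\b` it will not match at the very start of the text. This is not necessarily how default.srx was changed.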

MartinAlacam commented 2 months ago

OK, but these tests were done with the default file not my custom one.

rmraya commented 2 months ago

It does not matter.

MartinAlacam commented 2 months ago

I don't understand. The default.srx is the same: `diff OpenXLIFF_36/srx/default.srx OpenXLIFF_3_17/srx/default.srx` yields nothing. I use the default.srx; 3.6 segments correctly, 3.17 doesn't. Should I use a different version of Java?

rmraya commented 2 months ago

default.srx has been adjusted to work with Java 21.

MartinAlacam commented 2 months ago

Thank you Rodolfo, everything works and I'm back on the current version.