teras / Jubler

Jubler Subtitle Editor
http://www.jubler.org/
GNU General Public License v2.0

Replace with regular expressions on MacOS #39

Open nachocho opened 5 months ago

nachocho commented 5 months ago

Hi, I am not sure if what I am observing is a bug, a limitation of the regular expressions replace feature, or if I am missing something when trying to use the feature.

Two things do not work for me:

1. Trying to set the style of subtitles using a regex. (I think this may be a limitation.)

Take as an example the following subtitle entry (as plain text):

270
00:14:41,520 --> 00:14:44,840
<Soy Jeremy. Jeremy Parks. 
<El tonto que te chocó de atrás. 

Some lines start with "<" for some odd reason, but judging by the context in the video those lines should be in italics. So I want to use a regular expression replacement such as:

Original value: "^<" New value: "<i>"

But the regular expression just inserts the text "<i>" as part of the subtitle instead of changing the style to italics.
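For illustration, here is a minimal Java sketch of what a replacement like this does to the entry text. It assumes the replace feature behaves like `String.replaceAll` on the whole entry, which is not confirmed in this thread; the class and method names are hypothetical. The result is still a plain string that merely contains the characters `<i>`; whether anything is rendered in italics depends on a separate parsing step. Note also that without the `(?m)` flag, `^` matches only at the start of the entry, so only the first line gets the tag.

```java
class ItalicTagDemo {
    // Without (?m), ^ matches only at the very start of the entry,
    // so only the first leading "<" is replaced.
    static String tagFirstLine(String entry) {
        return entry.replaceAll("^<", "<i>");
    }

    // With (?m), ^ also matches after each line break,
    // so every line that starts with "<" is changed.
    static String tagEveryLine(String entry) {
        return entry.replaceAll("(?m)^<", "<i>");
    }

    public static void main(String[] args) {
        String entry = "<Soy Jeremy. Jeremy Parks.\n<El tonto que te chocó de atrás.";
        System.out.println(tagFirstLine(entry));
        System.out.println("---");
        System.out.println(tagEveryLine(entry));
    }
}
```

Either way the output is text with a literal tag in it, not a styled subtitle.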

2. Trying to use the new line character "\n" in the New value field. (This one looks like a bug to me)

Take as an example the following subtitle entry (in plain text):

553
00:28:15,020 --> 00:28:17,300
así que, no podemos ver su cara.
- ¿Y Corinne?

The entry is a dialog between two people, so I want to fix it: if an entry has a hyphen right after a new line character but no hyphen at the start of the text, insert a hyphen at the start of the text, to get this result:

553
00:28:15,020 --> 00:28:17,300
- así que, no podemos ver su cara.
- ¿Y Corinne?

The regular expression values for the replace I am using are:

Original value: "^ ([^-]+)\n- ([^-]+)$" New value: "- $1\n- $2"

Which gives me the following result (also in plain text):

553
00:28:15,020 --> 00:28:17,300
- así que, no podemos ver su cara.n- ¿Y Corinne?

That is, instead of inserting a new line character for "\n", it inserts just an "n" with no backslash.

If I test the same original text, original value and new value on regular expressions 101 (regex101), I get the result I expected from Jubler, which is:

553
00:28:15,020 --> 00:28:17,300
- así que, no podemos ver su cara.
- ¿Y Corinne?

Am I missing something in the values I am using for the replace?

Thanks!

teras commented 5 months ago

> But the regular expression just inserts the text <i> as part of the subtitle instead of changing the style to italics.

True, since you replace text.

Did you try to save it as SRT and load it again? This will force the loader to understand that there's a style change.

> 2. Trying to use the new line character "\n" in the New value field. (This one looks like a bug to me)

By default Java doesn't consider multi-line entries, I think this is the reason for this.

nachocho commented 4 months ago

Thanks for the reply @teras. My responses inline...

> True, since you replace text.
>
> Did you try to save it as SRT and load it again? This will force the loader to understand that there's a style change.

I just tried it and it works: after reloading the file, the italic tags are recognized, as you suspected.

> 2. Trying to use the new line character "\n" in the New value field. (This one looks like a bug to me)
>
> By default Java doesn't consider multi-line entries, I think this is the reason for this.

It still sounds like a bug, because the regular expressions I have tried for replacing text in Jubler work on the whole subtitle entry (all of its lines of text) unless I put the "(?m)" flag in the regular expression to make it apply per line of text.
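A small Java sketch of that anchoring behaviour, using only `java.util.regex` (the class and helper names are hypothetical). Without `(?m)`, `^` anchors to the start of the whole input, and a negated class such as `[^-]` matches line breaks too, so a pattern can span both lines of an entry.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorScopeDemo {
    // Counts how many times the regex matches in the text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Returns the first match, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String entry = "así que, no podemos ver su cara.\n- ¿Y Corinne?";

        // ^ anchors to the start of the entry: the "- " on line 2 is not found.
        System.out.println(countMatches("^- ", entry));
        // With (?m), ^ also matches at the start of line 2.
        System.out.println(countMatches("(?m)^- ", entry));
        // [^-]+ runs across the line break, up to the first hyphen.
        System.out.println(firstMatch("^[^-]+", entry).contains("\n"));
    }
}
```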

I still don't get why the backslash is being stripped from the replacement, so that instead of "\n" in the replaced text I get only "n" without the backslash. As I mentioned before, I tested this on https://regex101.com/ with Java selected as the programming language, and it gives me the expected result there. So Java does consider multi-line entries unless you explicitly use "(?m)" at the start of the regular expression string.
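One possible explanation, sketched in Java. In the replacement string of `Matcher.replaceAll`, a backslash escapes the following character literally, so the two characters `\` and `n` produce a plain `n`, not a line break. The sketch assumes Jubler hands the New value field to `replaceAll` unchanged, which is not confirmed in this thread, and it uses the pattern without a space after `^`. The helper `replaceWithNewlines` is a hypothetical workaround, not Jubler code.

```java
import java.util.regex.Pattern;

class NewlineReplacementDemo {
    // Passes the replacement to Java's regex engine exactly as typed.
    static String rawReplace(String text, String regex, String replacement) {
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    // Hypothetical workaround: turn the two-character sequence \n into a
    // real line break before replacing. Naive: it does not handle an
    // escaped backslash followed by "n".
    static String replaceWithNewlines(String text, String regex, String replacement) {
        String translated = replacement.replace("\\n", "\n");
        return rawReplace(text, regex, translated);
    }

    public static void main(String[] args) {
        String entry = "así que, no podemos ver su cara.\n- ¿Y Corinne?";
        String regex = "^([^-]+)\\n- ([^-]+)$"; // as typed: ^([^-]+)\n- ([^-]+)$
        String newValue = "- $1\\n- $2";        // as typed: - $1\n- $2

        // Prints one line with "cara.n- " in the middle, as reported above.
        System.out.println(rawReplace(entry, regex, newValue));
        // Prints two lines, each starting with "- ".
        System.out.println(replaceWithNewlines(entry, regex, newValue));
    }
}
```

In the pattern, `\n` is a regex escape and does match a line break; only the replacement side treats the backslash as a literal escape. That would account for the match succeeding while the output contains a bare "n".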