jpbro / VbPcre2

PCRE2 Wrapper for VB6

Different Results vs. VBScript? #19

Open jpbro opened 7 years ago

jpbro commented 7 years ago

With Multiline = True and Global = True (Global applies to VBScript only; it is not applicable to my wrapper), consider the following subject:

"File1.zip.exe" & vbCrLf & "File2.com" & vbCrLf & "File 3"

And the following regex:

.*$

VBScript returns 6 matches, but my wrapper returns only 2. Who is right?

jpbro commented 7 years ago

See the changes to the modTests.TestRegex2 method in commit https://github.com/jpbro/VbPcre2/commit/66c88a1db0ebeda970392a9ead25724557499481 for a demonstration.

dragokas commented 7 years ago

I see.

Strange that PCRE2 in fact produces only one significant result, File1.zip.exe, instead of three.

However, I think a regexp like .*$ is incorrect in the first place. It is the same as a regexp like (about)?. That is, you are trying to find an empty string (as one of the valid results). If you enter such a regexp into, e.g., some online Java regexp tester, it will produce an error, meaning that a regexp should not allow an empty string as one of its valid results.

From this point of view, the difference in results between VBS and PCRE2 is only a matter of their internal error-handling mechanisms, which are implemented differently.

So, in practice, .+$ should be used instead of .*$.
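To make the empty-match point concrete, here is a minimal sketch using Python's `re` module (Python 3.7 or later). It is neither VBScript nor PCRE2, so it only illustrates how the two patterns differ in general. As an assumption for readability, the subject is joined with `\n` rather than vbCrLf.

```python
import re

# Same three lines as the subject above, joined with "\n" instead of
# vbCrLf (an assumption made to keep the output readable).
subject = "File1.zip.exe\nFile2.com\nFile 3"

# ".*$" may match zero characters, so every line end also yields an
# empty match right after the line itself.
star_matches = re.findall(r".*$", subject, re.MULTILINE)

# ".+$" requires at least one character, so only the three lines match.
plus_matches = re.findall(r".+$", subject, re.MULTILINE)

print(star_matches)  # ['File1.zip.exe', '', 'File2.com', '', 'File 3', '']
print(plus_matches)  # ['File1.zip.exe', 'File2.com', 'File 3']
```

In this engine the empty matches are returned rather than rejected, which is why `.*$` gives six results and `.+$` gives three.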

In conclusion, I personally believe it is not necessary to touch this behavior. If I were to change anything, I would detect regexp strings that allow an empty result and raise an error instead of returning a result string.

dragokas commented 7 years ago

That said, since VBS already produces the most complete result, I would not object if PCRE2 produced the same result, in keeping with the goal of PCRE2 serving as an analogue of VBS.Regexp, so that at least these 3 lines are shown for .*$

But I don't know how you can track such cases without breaking anything else.

jpbro commented 7 years ago

Yeah it's a bit of a weird one - interesting that some online regexp sites produce an error, but PCRE2 and VBScript produce results (albeit different). Makes it hard to know what the best approach is.

It might be that there is a PCRE2 option flag to handle this situation; I'll have to look at them all more closely (or maybe it's just up to my Global matching loop to work a bit differently to produce the same results as VBScript).

I don't have time to look closer right now, but I will try soon.

dragokas commented 7 years ago

According to my tests, none of the options predefined in your class changes this behavior, except:

skacurt commented 5 years ago

Who is right?

Both results are correct. What is wrong here is your expectation.

Multiline = True in VBScript's RegExp simply means that ^ and $ match at line breaks. In PCRE this is an option that must be set explicitly (as you did in VBScript), namely PCRE2_MULTILINE.
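The effect of a multiline flag can be sketched with Python's `re` module (Python 3.7 or later), where `re.MULTILINE` plays a role comparable to the option described here. This shows only what the flag does to `$` in that engine and does not establish what the wrapper actually passes to PCRE2. The subject uses `\n` instead of vbCrLf as a simplifying assumption.

```python
import re

subject = "File1.zip.exe\nFile2.com\nFile 3"

# Without MULTILINE, "$" matches only at the very end of the subject
# (or just before a trailing newline), so only the last line is found,
# followed by one empty match at the end.
without_flag = re.findall(r".*$", subject)

# With MULTILINE, "$" also matches before every newline.
with_flag = re.findall(r".*$", subject, re.MULTILINE)

print(len(without_flag), without_flag)  # 2 ['File 3', '']
print(len(with_flag))                   # 6
```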

So it seems OK; in your test you just changed the default behavior for VBScript but not for PCRE.

skacurt commented 5 years ago

Oh, I forgot to mention: I've never used your wrapper. If you're sure that the PCRE2_MULTILINE flag is set in your wrapper, then the problem lies in your wrapper or in PCRE. VBScript's RegExp works as it should in this case.