opto / Expression-Search-NG

update of Thunderbird addon ExpressionSearch/Gmail UI for TB 78 and later

Need a debugger/builder tool like regexr.com but using the same engine the addon uses #102

Open TiagoTiago opened 9 months ago

TiagoTiago commented 9 months ago

I'm struggling to figure out why some messages are getting caught even though the regex I'm writing supposedly shouldn't match them, and shows as not matching when I test it on a few online regex testers. I need a way to break down how the add-on is interpreting the expression and how a given message ends up being treated as a match (which parts of the rule apply to which parts of the relevant fields, etc.).

opto commented 9 months ago

Which search field are you using the regex for?

TiagoTiago commented 9 months ago

So far I'm having trouble with both from and subject. I'm trying to filter down to just everything with a non-English alphabet, but some stuff always gets thru. I even tried copy-pasting some of that stuff into sites that let you run regex on it.

Wait, ah, I think I've got it now, or at least some of it. I tried a simpler pattern, not bothering with specifying Unicode ranges and stuff, and making sure I add the ^ and $ markers around it; so far it looks like I'm not getting anything thru that wasn't meant to.
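For anyone landing here with the same symptom, this is a minimal sketch of why the ^ and $ markers change the result. It uses Python's `re` module rather than the engine the add-on runs on, and the `NON_ASCII` pattern and the sample subject are made up for illustration, so treat it as a way to reason about anchoring and not as a description of what the add-on does internally.

```python
import re

# Hypothetical pattern for illustration: a run of characters outside ASCII.
NON_ASCII = r"[^\x00-\x7F]+"


def matches_anywhere(pattern: str, text: str) -> bool:
    """True if the pattern matches any substring of text."""
    return re.search(pattern, text) is not None


def matches_whole(pattern: str, text: str) -> bool:
    """True only if the pattern, wrapped in ^ and $, spans the text end to end."""
    return re.search(f"^(?:{pattern})$", text) is not None


mixed = "Hello Привет"
print(matches_anywhere(NON_ASCII, mixed))  # the Cyrillic word alone is enough
print(matches_whole(NON_ASCII, mixed))     # the ASCII part blocks the match
```

Without anchors a single non-ASCII character anywhere in the field is enough for a hit, which may be why an unanchored rule catches more than expected.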

A debugger feature would still be helpful to narrow down more complicated cases though.

I am noticing the opposite issue now though: I'm seeing some messages where the from name definitely has no English characters, but that filter hides them. Do I need to somehow add some additional regex wizardry to ignore email addresses while filtering on the from field?
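One possible explanation, sketched below in Python's `re` (not the add-on's engine): if the value being tested is the full header, display name plus `<address>`, then the ASCII letters in the address defeat an anchored "no Latin letters" rule. Whether the add-on actually passes the address along with the name is an assumption here, and both patterns and the sample sender are hypothetical.

```python
import re

# Strict rule: no Latin letter anywhere in the value.
STRICT = r"^[^A-Za-z]+$"

# Tolerant rule: same restriction on the display name,
# followed by an optional <address> part that is not inspected.
NAME_ONLY = r"^[^A-Za-z<>]+(?:<[^<>]*>)?$"


def is_match(pattern: str, value: str) -> bool:
    """True if the pattern matches the given header value."""
    return re.search(pattern, value) is not None


sender = "Иван <ivan@example.com>"
print(is_match(STRICT, sender))     # the letters in the address break the rule
print(is_match(NAME_ONLY, sender))  # the address is skipped over
```

The tolerant pattern still rejects a sender such as `John <john@example.com>`, because the name itself has to start with a non-Latin character.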

TiagoTiago commented 9 months ago

Not sure if it's a clue, but for a couple of the messages that are getting matched despite not being supposed to, the GUI shows the from name starting with a U+FFFD character (REPLACEMENT CHARACTER from the Specials Unicode block), the question mark inside a black polygon (which shape depends on the font I guess; on my computer it looks like a horizontally squashed hexagon, on Wikipedia it's a square rotated 45 degrees), which I'm not gonna paste here just in case there's some bug on Github that could get triggered by it. But when looking at the source code of the message, it just looks like "©" (U+00A9).
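A guess at how "©" in the source could turn into U+FFFD on screen, as a small Python sketch: if the header carries the single Latin-1 byte 0xA9 and it gets decoded as UTF-8, that byte is invalid on its own and is replaced. That this is what happens in these messages is only an assumption, and the sketch says nothing about which of the two forms the add-on's regex actually sees.

```python
import re

# The copyright sign as one raw Latin-1 byte, as it might sit in the header.
raw = "\u00a9".encode("latin-1")  # b'\xa9'

# Decoding that lone byte as UTF-8 is invalid, so it becomes U+FFFD.
shown = raw.decode("utf-8", errors="replace")

# Hypothetical "anything outside ASCII" pattern.
NON_ASCII = r"[^\x00-\x7F]"

print(hex(ord(shown)))                              # code point that is displayed
print(re.search(NON_ASCII, shown) is not None)      # replacement char is non-ASCII
print(re.search(NON_ASCII, "\u00a9") is not None)   # so is the original sign
```

Either form counts as non-ASCII, so a broad "non-ASCII" rule would catch such a sender even though nothing in the name looks like a foreign alphabet.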

That's not always the case though. Many messages that are slipping thru the filter seem to have the name displayed with just plain ASCII characters on the GUI.

edit: Oh, forgot this wasn't necessarily about a bug in the regex engine, but a request for better feedback on the inner workings and why it does what it does; sorry this comment is not all that on topic.