joddie / pcre2el

convert between PCRE, Emacs and rx regexp syntax
GNU General Public License v3.0

Case insensitive modifier? #17

Closed: ghost closed this issue 9 years ago

ghost commented 9 years ago

It would be convenient to have one: placing i at the end when you’re not sure about the case of a word, instead of writing [aA] combinations for every letter. I didn’t quite understand: are modifiers not supported at all?

joddie commented 9 years ago

Sorry for the delayed response. Supporting /i would be tricky to do right because Emacs controls case-folding using a dynamically bound variable, case-fold-search. As far as I know, there's no way to embed case-insensitive behavior in the string or regexp object itself, as there is in Perl, JS, etc.

I'm not an expert, but I understand that case-folding search is tricky to do well for all the different alphabets in Unicode. I assume Emacs does a good job with case-fold-search set to t, but I can't think of a good way to embed that flag in a translated regexp (short of advising all the regexp primitives in Elisp, which is not appealing).

For basic purposes, it might be enough to replace every literal character c with [cC] in the translated regexp: that would at least be useful, even if not fully correct. I will try to work on adding this, although I have limited time for the next two months.
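As a rough illustration of the [cC] idea, here is a minimal Python sketch (not pcre2el's Elisp code) that rewrites a run of literal text into a pattern matching it in any case. The helper name fold_literal is hypothetical, and it only handles literal characters, not full regexp syntax such as existing bracket classes or escapes.

```python
import re

def fold_literal(text: str) -> str:
    """Hypothetical helper: turn literal text into a regexp that matches it
    in any case, by replacing each letter c that has a simple case pair with [cC]."""
    parts = []
    for ch in text:
        lower, upper = ch.lower(), ch.upper()
        if lower != upper and len(lower) == 1 and len(upper) == 1:
            # Letter with a one-character case pair: emit a two-element class.
            parts.append("[" + re.escape(lower) + re.escape(upper) + "]")
        else:
            # Digits, punctuation, and letters without a simple 1:1 case
            # pair (e.g. "ß", whose upper case is "SS") are matched as-is.
            parts.append(re.escape(ch))
    return "".join(parts)

print(fold_literal("ros"))  # [rR][oO][sS]
```

The result for "ros" matches ros, rOs and rOS. Characters whose case mapping is not one-to-one, such as ß, are left alone, which is one way this approach falls short of real case folding.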

joddie commented 9 years ago

After a bit of experimentation, this is not as hard to implement as I had supposed. There are some cases which will never work right (notably backreferences), but I hope we can provide enough to be useful for simple cases. I will push a cleaned-up next branch containing this and several other cleanups/enhancements in the next few days.

This also relates to issue #13, since all three modifier flags (x, s, i) need to be toggled on and off when reading a regexp from the minibuffer.

joddie commented 9 years ago

This is now merged into master and should be available in the minibuffer using the C-c i keybinding. Let me know if it works!

ghost commented 9 years ago

Er… I’m not sure if I’m doing it right, but r[o]s now matches ros, rOs and rOS. Is case-insensitive search the default now? A-and C-c i is undefined when I’m in the minibuffer.

joddie commented 9 years ago

What command / key sequence are you using? Also, what's the value of case-fold-search (which enables Emacs's built-in case-folding)? I think it defaults to t, so many searches are case-insensitive by default...

ghost commented 9 years ago

I use isearch-forward. case-fold-search is set to t.

joddie commented 9 years ago

I think what you are seeing is Emacs's default out-of-the-box behavior: all searches are case-insensitive by default, unless you customize case-fold-search to nil. Normally, isearch will also go into case-sensitive mode if you type any uppercase characters, or you can enable it explicitly using isearch-toggle-case-fold (M-s c).
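The isearch behavior described above can be modeled roughly as follows. This is a simplified Python sketch, not Emacs's implementation: the function name search and its case_fold_search parameter are hypothetical, and Python's re.IGNORECASE stands in for Emacs's case folding. It also ignores details such as upper-case letters following a backslash in a regexp (e.g. \W), which Emacs disregards when deciding whether to fold case.

```python
import re

def search(pattern: str, text: str, case_fold_search: bool = True):
    """Hypothetical model of isearch's case handling: fold case only when
    case_fold_search is on and the pattern contains no upper-case letters."""
    # Simplification: any upper-case character counts, even after a backslash.
    fold = case_fold_search and not any(ch.isupper() for ch in pattern)
    flags = re.IGNORECASE if fold else 0
    return re.search(pattern, text, flags)

print(bool(search("r[o]s", "rOS")))                          # True
print(bool(search("rOs", "ros")))                            # False
print(bool(search("r[o]s", "rOS", case_fold_search=False)))  # False
```

Under this model, r[o]s matches rOS when folding is on, which is what the reporter saw with case-fold-search set to t, while typing an upper-case O switches to case-sensitive matching.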

So it may be that the addition of an emulated /i flag isn't as useful as anticipated ;-) However, if you want to use it, you can customize case-fold-search to nil and then use the C-c i binding to toggle it. This binding wasn't enabled in isearch-mode before, but it should be now.

Note that the fake case-folding behavior enabled by the /i flag does not work with backreferences ((foo)\1 matches foofoo and FOOFOO, but not fooFOO), where Emacs's case-fold-search setting does the right thing.
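The backreference problem can be reproduced outside Emacs. In this Python sketch, the pattern ([fF][oO][oO])\1 imitates an emulated /i translation of (foo)\1, while re.IGNORECASE plays the role of case-fold-search; Python's re is used only as an analogy for Emacs's regexp engine.

```python
import re

# Emulated /i: each literal letter in (foo)\1 rewritten as a [cC] class.
emulated = re.compile(r"([fF][oO][oO])\1")
# Real case folding, analogous to Emacs's case-fold-search.
folded = re.compile(r"(foo)\1", re.IGNORECASE)

for text in ["foofoo", "FOOFOO", "fooFOO"]:
    # Under the emulation, \1 must repeat the captured text exactly;
    # under real case folding, \1 is compared case-insensitively.
    print(text, bool(emulated.fullmatch(text)), bool(folded.fullmatch(text)))
# foofoo True True
# FOOFOO True True
# fooFOO False True
```

The class rewrite only affects literal characters, so once the group has captured "foo", the backreference demands exactly "foo" again.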

ghost commented 9 years ago

or you can enable it explicitly using isearch-toggle-case-fold (M-s c).

Oh, I’ve been looking for that option for so long.

So it may be that the addition of an emulated /i flag isn't as useful as anticipated ;-)

Er… I don’t remember exactly what problems I had with case sensitivity, but they definitely were there :-)

However, if you want to use it, you can customize case-fold-search to nil and then use the C-c i binding to toggle it.

Yay, it works, thank you!