protocool / AckMate

TextMate plugin (Cocoa) shell for running 'ack'
MIT License

Searching for words containing Å, Ä or Ö. #10

Closed: c7 closed this issue 14 years ago

c7 commented 14 years ago

It might be an issue with Ack directly, but it doesn’t seem to be possible to search for words containing local characters like Å, Ä and Ö. Also, the selection is wrong when you search for a partial word that contains those characters.

protocool commented 14 years ago

If I try from the command line I get the same issue so I'm assuming it's a problem with ack.

Although, to be fair - I just did a quick google for info on perl and utf-8 encoding ... it looks like a real minefield.

If you (or anyone) manage to get ack from the command-line working for those searches I will happily do the same thing in AckMate.

As for the selection problem - obviously I'm calculating the selection position wrong. I'll see what I can do to fix it.

c7 commented 14 years ago

I suspect the selection position is off when multibyte characters are present. I will try to fix the problem in Ack and report back on my findings.
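To illustrate the suspicion above: if a match position is reported as a byte offset into a UTF-8 line but the editor selects by character, every multibyte character before the match shifts the selection. The Python sketch below is not code from ack or AckMate; `byte_to_char_offset` is a hypothetical helper, and it assumes the line is valid UTF-8.

```python
def byte_to_char_offset(line: bytes, byte_offset: int, encoding: str = "utf-8") -> int:
    """Convert a byte offset within an encoded line to a character offset.

    Hypothetical helper for illustration; assumes `line` decodes cleanly.
    """
    # Decode only the bytes before the match; the length of that prefix
    # in characters is the character offset of the match.
    return len(line[:byte_offset].decode(encoding))


line = "Scenario: S\u00f6kning p\u00e5".encode("utf-8")  # "Scenario: Sökning på"
needle = "kning".encode("utf-8")

byte_pos = line.find(needle)                    # offset counted in bytes
char_pos = byte_to_char_offset(line, byte_pos)  # offset counted in characters

print(byte_pos, char_pos)  # the two differ by one because "ö" is two bytes
```

Without a conversion like this, a selection based on the byte offset would start one character too far to the right for each two-byte character that precedes the match.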

c7 commented 14 years ago

I downloaded the latest ack-standalone (version 1.92) and did a quick (successful) search in my features folder:

$ ack Sökning .
search_and_filter.feature
2:Egenskap: Sökning och filtrering
12:  Scenario: Sökning på

My Perl version is 5.10.0 and I’m running Snow Leopard.

protocool commented 14 years ago

Hey,

I just uploaded version 1.1.2 which includes some improvements to ack around accented characters.

It almost certainly solves the issue with the selection being wrong when accented characters are in the results.

It also probably solves the issue of not being able to search for accented characters, although it hasn't been thoroughly tested and I'm only Unicode-normalizing the search term.
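For readers unfamiliar with why normalization matters here: the same visible word can be stored as different code point sequences, so a byte-for-byte search can miss it. The Python sketch below shows the general idea only. The thread does not say which normalization form AckMate uses, so NFC here is an assumption, and `normalize_term` is a hypothetical name.

```python
import unicodedata

# "Sökning" with "ö" as a single precomposed code point (U+00F6).
composed = "S\u00f6kning"
# The same word with "ö" as "o" followed by a combining diaeresis (U+0308).
decomposed = unicodedata.normalize("NFD", composed)

# The two look identical on screen but compare as different strings.
print(composed == decomposed, len(composed), len(decomposed))


def normalize_term(term: str, form: str = "NFC") -> str:
    """Bring a search term into one normalization form (NFC is an assumption)."""
    return unicodedata.normalize(form, term)


# After normalizing, both spellings of the term are equal.
print(normalize_term(decomposed) == normalize_term(composed))
```

Normalizing only the search term helps when the files already use that form; a file stored in the other form would still not match, which fits the caveat above about limited testing.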

Regards, Trev

c7 commented 14 years ago

Thanks for that, I’ll check it out after the holidays this weekend.

petdance commented 14 years ago

Are these changes that should get back into ack itself?

protocool commented 14 years ago

Andy: as of now, I'd say an emphatic NO :-P

I do think a variation of the change to Basic.pm in http://github.com/protocool/ack/commit/d373b40541757193e2b905b18f11806b84d3aa5c might eventually be a good idea but not as I've implemented it.

The point of that change to Basic.pm is: sysread reports characters read in the current encoding, which may be different from the number of bytes in the file. You'll know this better than me, but does the precise number of characters read by sysread really matter to needs_line_scan?
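The characters-versus-bytes distinction described above is not specific to Perl. As a rough analogy only (this is Python, not the Basic.pm change, and it says nothing about what needs_line_scan requires), a read of length 7 returns different amounts of data depending on whether the handle is a byte stream or a decoding text stream:

```python
import io

# "Sökning på": 10 characters, 12 bytes in UTF-8 ("ö" and "å" take two bytes each).
data = "S\u00f6kning p\u00e5".encode("utf-8")

# Byte-oriented read: the count is in bytes.
raw = io.BytesIO(data)
chunk_bytes = raw.read(7)

# Text-oriented read over the same data: the count is in characters.
text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")
chunk_chars = text.read(7)

print(len(chunk_bytes), chunk_bytes.decode("utf-8"))         # 7 bytes, 6 characters
print(len(chunk_chars), len(chunk_chars.encode("utf-8")))    # 7 characters, 8 bytes
```

Code that compares the returned count against a file size in bytes will therefore disagree with itself whenever multibyte characters are present.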

As for the rest of the changes to ack in that commit (unicode normalization): that's related to how NSTask mangles ARGV and not required by the upstream version of ack.

And as for my "everything will be utf8" assumption: it seemed like a reasonable one for TextMate.

Unfortunately, and with fewer than 500 downloads, I've already received a handful of complaints that this latest version is choking with encoding errors due to stray non-utf8 files.
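One common way to tolerate stray non-UTF-8 files is to try UTF-8 first and fall back to a single-byte encoding that cannot fail. The Python sketch below illustrates that general approach; it is not what ack or AckMate does, and both the `decode_line` name and the choice of Latin-1 as the fallback are assumptions.

```python
def decode_line(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode a line as UTF-8, falling back to another encoding on error.

    Hypothetical helper; Latin-1 is chosen as the fallback because it maps
    every byte to a character, so the second decode never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


utf8_line = "S\u00f6kning".encode("utf-8")      # valid UTF-8
latin1_line = "S\u00f6kning".encode("latin-1")  # byte 0xF6 is not valid UTF-8

print(decode_line(utf8_line) == decode_line(latin1_line))
```

The trade-off is that a file in some other legacy encoding will decode without error but show the wrong characters, so this avoids the crash rather than guaranteeing correct text.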

Trev