protocool / AckMate

TextMate plugin (Cocoa) shell for running 'ack'
MIT License

Malformed UTF-8 error? #13

Closed pbhogan closed 13 years ago

pbhogan commented 14 years ago

Whenever I search for anything I get a few dozen of these after my correct search results. Even an empty search result will have them:

utf8 "\xE2" does not map to Unicode at /Users/pbhogan/Library/Application Support/TextMate/PlugIns/AckMate.tmplugin/Contents/Resources/ackmate_ack at line 1560, <$fh> line 1.

Malformed UTF-8 character (unexpected non-continuation byte 0xe3, immediately after start byte 0xe2) in pattern match (m//) at /Users/pbhogan/Library/Application Support/TextMate/PlugIns/AckMate.tmplugin/Contents/Resources/ackmate_ack at line 1568.
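For context on what the second message means: 0xE2 is the lead byte of a three-byte UTF-8 sequence and must be followed by two continuation bytes in the range 0x80 to 0xBF. 0xE3 is itself a lead byte, so the pair is rejected. A minimal sketch that reproduces the condition outside AckMate, assuming `iconv` is installed; the helper name `is_valid_utf8` is made up for illustration:

```shell
# Hypothetical helper (not part of AckMate): succeeds only if stdin is well-formed UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# 0xE2 0x80 0x99 is a well-formed three-byte sequence (a right single quote).
printf '\342\200\231' | is_valid_utf8 && echo "valid"

# 0xE2 followed by 0xE3: the second byte is not a continuation byte.
printf '\342\343' | is_valid_utf8 || echo "malformed"
```

The bytes are written as octal escapes because `\x` escapes are not supported by every `printf`.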

protocool commented 14 years ago

Hey,

version 1.1.2 of AckMate expects all of your files to be utf8 encoded - it looks like you've got files that have a different encoding.

Your best bet is to install version 1.1.1 for the time being.

pbhogan commented 14 years ago

Thanks... do you happen to know of a way to find those files? My files should be in UTF-8 so I probably have a couple of rogue files somewhere.

jacquescrocker commented 14 years ago

Would be really nice if Ackmate printed out the file name in the error message of the file it choked on so we could convert to UTF-8. Thanks!

http://cl.ly/95c25cb102f302b72597

jjb commented 14 years ago

Same problem here. I tried running find . -exec iconv -t UTF-8 {} \; on my file tree, and no files were changed. I'm not sure if that's comprehensive.
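One likely reason nothing changed: without `-f`, `iconv` assumes the input is in the locale's encoding, and it writes the converted text to standard output, so it never modifies files in place. A rough sketch that instead lists the files that fail UTF-8 validation, assuming `iconv` and a POSIX shell; the function name `list_non_utf8` is invented here:

```shell
# Hypothetical helper: print every regular file under a directory
# (default: current directory) that is not well-formed UTF-8.
list_non_utf8() {
  find "${1:-.}" -type f -exec sh -c '
    for f in "$@"; do
      # iconv exits non-zero when the input contains an invalid sequence.
      iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
    done
  ' sh {} +
}
```

Calling `list_non_utf8 path/to/project` prints one path per line. Binary files such as images will usually show up too, since they are rarely valid UTF-8, so the output is a list of candidates to inspect.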

maxim commented 13 years ago

+1, would be awesome to have this fixed.

ryross commented 13 years ago

Take the character that doesn't map ( \xE2 ) and run a regular expression search in TextMate to find the file that it's in. Reopen the file as UTF-8, fix that character (or characters), and then save. Once you have replaced all the bad characters, AckMate will function correctly.
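A command-line version of that search, sketched under the assumption that your `grep` supports `-r` (GNU and BSD grep both do). Note that 0xE2 also begins many valid UTF-8 characters, such as curly quotes, so this lists candidates, not confirmed offenders. The function name `files_with_byte_e2` is made up:

```shell
# Hypothetical helper: list files under a directory (default: current
# directory) that contain the raw byte 0xE2 anywhere.
files_with_byte_e2() {
  # LC_ALL=C makes grep compare raw bytes instead of decoding characters.
  LC_ALL=C grep -rl -- "$(printf '\342')" "${1:-.}"
}
```

For a different byte, swap in its octal value (0xE2 is `\342`).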

mwillerich commented 13 years ago

+1 for showing which file contains the Malformed UTF-8

davidbgk commented 13 years ago

+1 I'd love to see that issue fixed too.

mttkay commented 13 years ago

Why does a search plugin make assumptions about file encodings? Isn't that a little intrusive? I want to keep files in whatever encoding I desire and still be able to search them.

dvdplm commented 13 years ago

+1 for a fix for this (and I must say I agree with kaeppler: what was the reason behind this change?)

protocool commented 13 years ago

It was added because I received numerous complaints that AckMate could not handle utf8 accented characters in search terms. Even without accented characters in the search term itself, any accented characters in the results tended to cause problems too.

If you don't care about accented characters then use version 1.1.1 like I said in my first response.

If you need support for accented characters, then use version 1.1.2 and accept the limitation that your files must be utf8 for AckMate to function properly.

It would be really nice to have something that did both but given that it works flawlessly for my own needs and the tradeoff between 1.1.1 and 1.1.2 is acceptable, I have no motivation to expend any effort on it.

jjb commented 13 years ago

Thanks for the response protocool, and thanks for AckMate!

Do you, or anyone else on this thread, know of a good command to search a file tree for the problem files?

Or maybe AckMate / AckMate's ack can be modified to inform the user of what the problem file is?

dvdplm commented 13 years ago

@protocool: I totally see your point, and using 1.1.1 is totally fine.

What weirds me out is that I'm pretty sure everyone on my team has their editor set to write UTF-8 files, and all my attempts to find the non-UTF-8 file(s) have failed. It would be amazing if AckMate could help by providing the filename that breaks it. The issue most probably arises somewhere in the interface code between Cocoa and Perl, as the standalone version of ack used from the CLI never fails.

Finally, I'm really grateful for AckMate and use it daily, all day. Big thanks.

jjb commented 13 years ago

Yeah you should really sell it :-) I'd pay $10+ for it easily. peepopen is $12: http://peepcode.com/products/peepopen

golgote commented 13 years ago

It's a bug. All my files are UTF-8 but I still get the malformed UTF-8 errors. Switching back to TextMate's default search in project: maybe slower, but more stable.

jjb commented 13 years ago

Hmm… somewhere, I thought it was in this thread but I guess it was another… there was a recommendation for where to put a print statement in AckMate's ack, to show the offending file… anyone know where that is?

Well I guess it would be around line 1568 or 1560.

jjb commented 13 years ago

ah, here it is: https://github.com/protocool/AckMate/issues/#issue/12/comment/303961

rvega commented 13 years ago

+1 Please fix :)

jseibert commented 13 years ago

Hitting this constantly, and it slows the search results window to a crawl. Please fix

protocool commented 13 years ago

As I've said - if this is an issue, install version 1.1.1 - you can get it here: https://github.com/protocool/AckMate/downloads

If people are encountering the problem and using version 1.1.1 is unacceptable then please do step up to the plate and submit a fix.

jjb commented 13 years ago

protocool -- I think a good solution would be to report the offending file, and then suggest to the user they add a particular line to their .ackrc. I tried hacking around with this to catch the exception but my Perl Fu isn't good enough… If you can give me or others some guidance on this maybe we can put something together… we all love AckMate!

danielvlopes commented 13 years ago

@jjb I fixed the issue with this advice: https://github.com/protocool/AckMate/issues/#issue/12/comment/303961

The hack displays the file that has the problem. Thanks for the project.

pbhogan commented 13 years ago

I'm closing this since there seems to be no other way to unsubscribe from this issue.

jjb commented 13 years ago

Alright folks, I've put together a solution for reporting offending files. Tell me what you think: https://github.com/protocool/AckMate/wiki/Unicode-UTF-8-error-message

humbroll commented 12 years ago

Fix please. This should be reopened. :)

airways commented 12 years ago

For the record, 1.1.1 also had this bug. I just installed it and get the same mess of errors.

airways commented 12 years ago

If anyone wants a quick fix for this issue, open up the reported ackmate_ack script and change the line that the error occurs on (2613 for me) to this:

use Encode;
# Re-encode the buffer as UTF-8 before matching; this is what avoided the warnings for me.
$buffer = encode("UTF-8", $buffer);
return $buffer =~ /$regex/m;

So far this has no negative effects on my search results, and prevents several hundred of these error messages that make scrolling the window impossible (it grinds TextMate to a halt on my quad core i7 with SSD - something very messed up in that list box's rendering if you ask me).

humbroll commented 12 years ago

@airways Great patch. thanks!

dvdplm commented 12 years ago

@airways looking good so far here too! Thanks!

standuprey commented 12 years ago

This fixed it for me.