knu / ruby-unf

A wrapper library to bring Unicode Normalization Form support to Ruby/JRuby
BSD 2-Clause "Simplified" License
75 stars 11 forks source link

Make String monkey-patching optional #4

Open nirvdrum opened 10 years ago

nirvdrum commented 10 years ago

I'm looking to use unf as a replacement for the "unicode" gem in fog, but the String monkey-patching is a non-starter. As a transitive dependency, this isn't something we could push on all our users. If you're not opposed, I'd like to see the String monkey-patching pulled out of the default require path.

knu commented 10 years ago

Could you elaborate on how it can be a problem to have String#to_nfc, etc.?

Unf is used by popular gems like mechanzie (albeit indirectly) and twitter-text, and there are already a bunch of existing code that depend on the String methods, so I need to make sure it is worth making it optional.

nirvdrum commented 10 years ago

It looks like requiring "unf/normalizer" directly will circumvent the problem. I just get worried about bundler auto-requiring stuff.

nirvdrum commented 10 years ago

Monkey-patching core classes can be unexpected and can impact performance. It also can increase memory usage. Given the large number of Strings in a typical app and hithereto, no fog users have called the to_nfc method, it suggests a heavier cost for something not frequently used.

knu commented 10 years ago

Do you have numbers to show that?

If adding a few instance methods to String really impacted performance, then that would be an interpreter problem.

nirvdrum commented 10 years ago

I can send you heap dumps from what ActiveSupport does. It's a pretty drastic hit in aggregate. It's a death by thousand cuts thing. Whether that's an interpreter problem is debatable, given that you're making things bigger, but all ruby interpreters I'm aware of are susceptible to it. But if you want to table that argument, I think it's still a principle of least surprises situation.

Outside of ActiveSupport, I don't expect any gem to start monkey-patching core classes. And requiring the gem could very well override methods the user added himself, leading to additional surprise shrug

nirvdrum commented 10 years ago

Also, I'd take a look at this pretty comprehensive article on method cache invalidation:

https://charlie.bz/blog/things-that-clear-rubys-method-cache

knu commented 10 years ago

That's a nice article. As explained, the methods added in the String class will be properly cached as other methods and do no harm.

nirvdrum commented 10 years ago

Unless I'm misunderstanding something, as soon as you re-open String, you'll invalidate the method cache. Since requiring "unf" can happen any time at runtime and after a bunch of Strings have almost certainly been created, that would adversely impact performance.

knu commented 10 years ago

Yes, that happens just once.

nirvdrum commented 10 years ago

But it happens at runtime . . . Requiring a gem shouldn't cause massive method cache invalidation. It's not as if String is a lightly used class. And there's no guarantee that the gem will be required at app boot (where it still would have a performance impact, since many Strings will have been created already).

Moreover, that penalty shouldn't have to be paid for a method that may very well never be called. If all I want to use is the class methods, requiring "unf" shouldn't be monkey-patching String. For now, requiring "unf/normalizer" instead works around the problem, but there's no guarantees in place that that will always be the case, which causes quite a bit of concern.

nirvdrum commented 10 years ago

FYI, as a sanity check I asked a couple of the JRuby implementors and it appears I've been overstating the performance impact. I'm still not wild about patching a core class, but I wanted to update and frame the issue correctly.

jrochkind commented 10 years ago

I don't believe there are any realistic performance issues, I think you're right about that.

But I'd still prefer it did not monkey patch String. It's just dangerous, if you have some other library that also decides to monkey-patch String in the same way (say ActiveSupport -- or some other gem you have loaded -- were to hypothetically decide to add String#to_nfc too) -- then it can be unpredictable which gem's String#to_nfc you get. Which can matter if one of them has bugs and the other doesn't, or one of them has thread-safety issues and the other doens't, or one of them is much faster performance than the other. It's generally just unpleasant to have which implementation you get depend on gem load order.

Incidentally, i just compared and benchmarked a number of ruby alternatives for unicode normalization, and the unf gem came out far far ahead in performance, and is generally, I think, the one to use -- I would go so far as to say that any other gem that currently has it's own unicode normalization implementation should just use unf as a dependency instead. But the automatic monkey-patching is definitely going to keep some people from wanting to do that. (Here's my write-up: http://bibwild.wordpress.com/2013/11/19/benchmarking-ruby-unicode-normalization-alternatives/)

I would recommend you switch things so require 'unf' does not monkey-patch, and you need to explicitly require 'unf/string' to get the monkey-patches. Yes, it will not be backwards compatible -- but will only take a one line change to existing software that wants to keep the monkey-patching. If you were already at a 1.x release I'd say just release as 2.x advertising the backwards incompat, but you're still releasing as 0.x anyway, so, I dunno. Maybe release a 1.0.0 without automatic monkey patching!

And nice work, thanks for providing this gem!

jrochkind commented 10 years ago

in fact, the more I think about this, the less happy I become.

I would like to release a gem with 'unf' as a dependency, because my gem needs to do normalization. However, this all happens 'under the hood' of my gem -- I don't want someone using my gem to have to get the monkey patches whether they like it or not, they shouldn't even have to know I'm using 'unf' if they don't want, my gem needs to do some under-the-hood normalization, and 'unf' is the best way to do it -- but I really don't want to force the monkey-patching on my gem's users -- and then have to document that for them, let them now my gem is going to be monkey patching String (via unf).

So I'm not sure what to do now.

Make some sense?