wocommunity / wonder

Project Wonder
http://wiki.wocommunity.org/
239 stars 166 forks source link

Why does ERXStringUtilities.distance() return a double? #312

Closed paulhoadley closed 11 years ago

paulhoadley commented 11 years ago

As far as I can see, this method is calculating the Levenshtein distance between its arguments, an algorithm which returns an integer. Eyeballing it, I assume the reason is for convenience of fuzzyMatch(), a method in which distance() is called six times, where it seems to be used among some other double values, though it's not even clear whether all of those should be or need to be double values.

If distance() is to be a serious method in its own right, and it is public, then it should return an integer. At the very least, fuzzyMatch() could then cast the result to a double, though I don't immediately see a good reason for even doing that.

Of course, there will be backwards compatibility implications, but I'm not sure that's a good enough reason to let this stand. The algorithm returns an integer, and so should this method.

darkv commented 11 years ago

Could be a change for Wonder v6 and one point of a migration guide, though I doubt that many are using this method anyway so most of them won't notice.

nullterminated commented 11 years ago

I would object to changing the return type on principal. One does not change return types of established public methods. One deprecates the old method, replaces it with a new method that has the desired behavior, and documents the change in the deprecation javadoc.

http://docs.oracle.com/javase/1.5.0/docs/guide/javadoc/deprecation/deprecation.html

paulhoadley commented 11 years ago

Doing this via deprecation sounds just fine—for one thing, the method should probably be called levenshteinDistance() anyway, so as to distinguish it from, say, hammingDistance(). In that case, we should still talk about a timeline for removal of distance(), though. As we're discussing elsewhere, the contract for backwards compatibility should be made explicit. I don't think we should mark it deprecated and then leave it indefinitely.

(Having said all of that, it's not like I have much invested in this. I've never used the method, I just noticed it last night. That it returns a double just stuck out like a sore thumb, and offended the mathematician in me.)

hprange commented 11 years ago

We could remove every deprecated method on major releases (based on the major.minor.bug_fix version schema).

darkv commented 11 years ago

Calling the new method levenshteinDistance() and deprecating the current one seems reasonably, where is the pull request? ;)

Removing every deprecated method on major releases is perhaps a little too much, but those we catch and think should go away...

paulhoadley commented 11 years ago

Working on it right now... :)

paulhoadley commented 11 years ago

No need to merge the pull request straight away. I am more than happy to leave it open for discussion for now. Ramsey, are you happy with this?

paulhoadley commented 11 years ago

Ha! Johann, you're too fast! Happy to roll it back if Ramsey or anyone else doesn't like it.

darkv commented 11 years ago

Ups sorry, I was too fast ;) But keeping the old method with the exact API preserves compatibility with old code.

paulhoadley commented 11 years ago

Yeah, I believe this makes absolutely no functional change.

ishimoto commented 11 years ago

Yes you are to fast. http://youtu.be/839DO8-0cY8

nullterminated commented 11 years ago

On Nov 7, 2012, at 2:40 PM, Paul Hoadley wrote:

Doing this via deprecation sounds just fine—for one thing, the method should probably be called levenshteinDistance() anyway, so as to distinguish it from, say, hammingDistance(). In that case, we should still talk about a timeline for removal of distance(), though. As we're discussing elsewhere, the contract for backwards compatibility should be made explicit. I don't think we should mark it deprecated and then leave it indefinitely.

(Having said all of that, it's not like I have much invested in this. I've never used the method, I just noticed it last night. That it returns a double just stuck out like a sore thumb, and offended the mathematician in me.)

Sounds good to me

darkv commented 11 years ago

So everyone agrees and as the patch is already committed I close this issue.