square / crossfilter

Fast n-dimensional filtering and grouping of records.
https://square.github.com/crossfilter/
6.22k stars 1.31k forks

Project name "tesseract" is already used by a prominent OCR program #1

Closed zdw closed 12 years ago

zdw commented 12 years ago

See here: http://code.google.com/p/tesseract-ocr/ and also here: http://news.ycombinator.com/item?id=3724592

Kalyse commented 12 years ago

Just rename the project to tesseract.js.

Tesseract is a perfect name for Square.

stevegraham commented 12 years ago

+1 to renaming this project. Also merely appending js to the name suggests this is a js implementation of the Tesseract OCR engine.

mindcrime commented 12 years ago

+1 for renaming. Tesseract (OCR) is pretty well established with that name already.

robraux commented 12 years ago

Agreed. Interesting project, but definitely confused me immediately.

amoffat commented 12 years ago

-1, leave it. Tesseract is a fitting name. We don't give people completely unique names, or insist that a person whose name is already taken change it, so why do it for software? "But there's already an established person with that name!"

stevegraham commented 12 years ago

@amoffat stay tuned for this new js library i'm working on. I'm going to call it "Linux" :P

carbocation commented 12 years ago

This is a meaningful issue. I can already brew search tesseract or aptitude search tesseract to find the FOSS OCR software. It would be great for this project to have a different name, at least publicly.

amoffat commented 12 years ago

@stevegraham that's fine :) To nerd out for a minute, this is kind of how I visualize it:

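def is_fine_name(how_fitting, has_name_collision,
                 collision_potential_audience_size, collision_category_distance,
                 fitting_weight, audience_weight, category_weight, some_threshold):
    how_fine = how_fitting * fitting_weight
    if has_name_collision:
        # a larger audience ratio or category distance means a smaller penalty
        how_fine -= audience_weight / collision_potential_audience_size
        how_fine -= category_weight / collision_category_distance
    return how_fine > some_threshold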

Tesseract has a high how_fitting, because the name is a pretty logical choice for the project and company. collision_potential_audience_size is the ratio of this project's potential audience size to the existing audience size of the collision's project. I think this is pretty high too (the audience of OCR guys is a lot smaller than that of data-slicing guys). The collision_category_distance is pretty large too... OCR is in a pretty different space from data slicing.

Your hypothetical js library...small how_fitting, small collision_potential_audience_size, and probably large collision_category_distance. Probably doesn't meet the threshold, so "Linux" is a bad name :)

davidcelis commented 12 years ago

Naming software is too hard of a problem to worry about collision like this.

terretta commented 12 years ago

Consider another name for the concept, such as cubicprism, which conveniently also conveys what the project helps you do: look at an OLAP cube in a different light.

@amoffat: Wikipedia's http://en.wikipedia.org/wiki/Tesseract_(software) is not a disambiguation page. "Collision potential" = 100%. And I doubt OCR has fewer users than data cubes.

amoffat commented 12 years ago

@terretta I know it's 100%... but I don't see how that changes my argument. I do disagree on the audience size, though.

terretta commented 12 years ago

@amoffat Most anyone I know with an Android or iOS phone has a smattering of document related apps with OCR, such as Evernote or JotNot. Few have a data cube reporting visualizer, unless Google Analytics counts. Same goes for desktop. So I'm referencing the audience size of both devs and end users who may use tools or see project credits in their apps, and saying more use OCR.

In any case, with Google search results showing the 25-year-old project, now funded by Google, as the number two link for the word Tesseract, it seems like the discussion should already be over.

amoffat commented 12 years ago

@terretta You're muddling things. My point is that the potential audience size for data analysis software, which is huge, is larger than people who know about the Tesseract OCR package. People who use OCR in general do not count, because "Tesseract" is not a name collision to them. It's only a name collision to people who know Tesseract OCR. We can agree to disagree.

Arguing about this is stupid though. It's not up to us :) I just disagree with the idea that someone should change a fitting name because the name is already used for something. Maybe the math guys should demand that Tesseract (OCR) be renamed something that isn't already used as a geometric concept, because, shit, that's been used since 1888! :) Double-edged sword.

That said, it's not a big deal either way.

terretta commented 12 years ago

It is a big deal. This kind of move by a well-regarded player in the software community -- failure to self-regulate a namespace collision with a highly regarded, well-respected tool appearing under its own name in a decade of publications -- is what drives the continuing perception of a "need" for trademarks and patents. Your argument that Square should be able to override the name regardless of history simply because this new tool may be more popular is particularly graceless.

Btw, it's not Tesseract OCR. It's just Tesseract. Check out Linux Journal, July 2007, for example:

Recently, I was looking again and found a project called Tesseract. Tesseract is the product of HP research efforts that occurred in the late 1980s and early 1990s. HP and UNLV placed it on SourceForge in 2005, and it is in the process of migrating to Google Code.

audionerd commented 12 years ago

Call it Tesseractor because it acts on tesseracts.

amoffat commented 12 years ago

Quoting @terretta: "It is a big deal. ...failure to self regulate a namespace collision ... is what drives the continuing perception of a 'need' for trademarks and patents."

I agree, it is clear the creator should commit seppuku to restore his honor.

benatkin commented 12 years ago

Of all the suggested names, I prefer tesseract. :trollface:

ironclad-zz commented 12 years ago

+1 How about changing to 8-cell, octachoron or tetracube?

jkuhnert commented 12 years ago

Have to agree with @benatkin, the project name should be changed to tesseract. Kind of has a nice ring to it.

mbostock commented 12 years ago

Renamed to Crossfilter, partly in homage to Chris Weaver's work on multidimensional visualization. It may not have the intrigue of "tesseract", but it does describe the library's function succinctly.

benatkin commented 12 years ago

:+1: