sanyaade-speechtools / delphi-museum-project

Automatically exported from code.google.com/p/delphi-museum-project

Need to work around MySQL stop-word list for keyword search and for checkHook tool. #188

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
See these notes from an email thread (read from the bottom):

The problem is specific to this word and to a few hundred others that the
MySQL full-text engine treats as "stop" words. I fixed it as well as I could
with a moderate amount of effort. Doing more would take much more effort
than I want to spend on this right now.

Here's the list, FYI:
http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html

As you can see, most are adverbs and modals that do not matter. Some have
multiple meanings that cause issues, including (from a quick scan):

can
contain(s)
course
like
little
near
new
not
novel
several
taken
use/used/uses
whole

At some point, perhaps I can explore customizing our MySQL installation (to
use a much reduced stopword list), but I am not sure what I can and cannot
do with the service we pay for.
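
One possible application-side workaround, sketched here in Python for illustration only (it is not the Delphi code): check the search term against the stopword list and, for a stopword, skip the full-text index and fall back to a whole-word REGEXP match. The table and column names (`objects`, `description`) and the function name are hypothetical, the stopword set is just the words named in this thread, and the minimum word length of 4 assumes the server still uses the default `ft_min_word_len`.

```python
# Illustrative subset of MySQL's built-in full-text stopword list: only the
# words named in this thread. The real list has several hundred entries.
STOPWORDS = frozenset({
    "can", "contain", "contains", "course", "like", "little", "near", "new",
    "not", "novel", "several", "still", "taken", "use", "used", "uses", "whole",
})

# Assumption: the server uses the default ft_min_word_len (4), so shorter
# words are not in the full-text index either.
MIN_WORD_LEN = 4


def build_keyword_query(term, table="objects", column="description"):
    """Return (sql, params) for a single-word keyword search.

    The table and column defaults are hypothetical placeholders. Ordinary
    words use the FULLTEXT index; stopwords and short words fall back to a
    slower whole-word REGEXP scan.
    """
    word = term.strip().lower()
    if not word.isalnum():
        # Keep the sketch simple: no regex escaping, so accept plain words only.
        raise ValueError("expected a single alphanumeric word")

    if word in STOPWORDS or len(word) < MIN_WORD_LEN:
        # [[:<:]] and [[:>:]] are the word-boundary markers in the regex
        # dialect used by MySQL 5.x.
        sql = "SELECT id FROM {} WHERE {} REGEXP %s".format(table, column)
        params = ("[[:<:]]" + word + "[[:>:]]",)
    else:
        sql = "SELECT id FROM {} WHERE MATCH({}) AGAINST (%s)".format(table, column)
        params = (word,)
    return sql, params


if __name__ == "__main__":
    print(build_keyword_query("Still"))
    print(build_keyword_query("basket"))
```

The REGEXP branch cannot use the full-text index, so it will be slower on a large table. That may be an acceptable trade if only stopword searches take that path.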

Patrick

--------------------------------------------------------------------------------
From: Michael T. Black [mailto:mtblack@berkeley.edu] 
Sent: Tuesday, July 14, 2009 11:33 AM
To: Patrick Schmitz
Cc: PAHMA-Delphi Developers List
Subject: Re: Feedback on Object: 5-16393

Hi Patrick, 

Good news and bad....  

The good: Check Hook appears to be fixed -- I found a bunch of things when
searching for "Still".  

The bad (and this is something you can leave as-is for weeks/months, as
there's a workaround): 

When I search for "Still" I find lots of occurrences of the string "still":
http://pahma-dev.berkeley.edu/delphi/api/checkHook.php?term=still&limithooks=0&termlimit=1000

When I check the "allow partial matches" option and search for "Still", I
also find lots of occurrences of strings containing "still", but no exact
matches of the string "still":
http://pahma-dev.berkeley.edu/delphi/api/checkHook.php?term=still&limithooks=0&usepartial=1&termlimit=1000

Thanks!

Michael

On Jul 14, 2009, at 11:22 AM, Patrick Schmitz wrote:

Can you verify that this is fixed?

Thanks - Patrick

--------------------------------------------------------------------------------
From: delphi_feedback-bounces@lists.berkeley.edu
[mailto:delphi_feedback-bounces@lists.berkeley.edu] On Behalf Of Patrick
Schmitz
Sent: Tuesday, June 30, 2009 12:29 PM
To: 'Michael T. Black'
Cc: 'Delphi Feedback'
Subject: RE: Feedback on Object: 5-16393

I'll have to revisit the checkTerm code to figure out what's up. I know
that I have not updated the termStats table in a little while - may have to
do that more regularly. 

Patrick

--------------------------------------------------------------------------------
From: Michael T. Black [mailto:mtblack@berkeley.edu] 
Sent: Tuesday, June 30, 2009 12:00 PM
To: Patrick Schmitz
Cc: Delphi Feedback
Subject: Re: Feedback on Object: 5-16393

Addendum -- was going to solve problem (for now) by commenting out "Stills"
(since "Check Hook" seems to indicate we don't have any), but then I
checked Delphi and found that *53* things with images (200 total) are
latching "Still".  

So maybe there's a problem with the "Check Term" tool?  

Michael 
__________________
Dude, man, 

Whoa.  What's trippy, man, is that this is the latch for "Still" (the
alcohol-makin' kind):

<heading id="StillAlcEq" title="Still">
<synonym value="Stills"/>
</heading>

There are no implies statements pointing to this, so what you see is what
you get.

But then, checking what's latching, I find nada:

http://pahma-dev.berkeley.edu/delphi/api/checkHook.php?term=still&limithooks=0
http://pahma-dev.berkeley.edu/delphi/api/checkHook.php?term=stills&limithooks=0

So how is this object (or any object, for that matter) latching "Still"?

Dude, trippy.

Am leaving unresolved for now.

Michael

On Jun 29, 2009, at 9:47 PM, Patrick Schmitz wrote:

Trippy yes, but narcotic? See "still" as hook.

--------------------------------------------------------------------------------
Feedback on server: pahma-dev.berkeley.edu for Object: 693921

Original issue reported on code.google.com by LudicrousResearcher@gmail.com on 14 Jul 2009 at 7:22