juditacs / semeval

MathLing Budapest Team's repo
MIT License

wordnet boost #12

Open recski opened 9 years ago

recski commented 9 years ago

Section 2.2.1 of Han et al. 2013 is not implemented yet!

recski commented 9 years ago

@juditacs any progress on this?

juditacs commented 9 years ago

OK, I implemented most of it and I'm testing it right now. However, it's clear that this is going to be very slow.

The paper is quite vague about some parts and I'm not sure I understood it completely, so we should hold a meeting on Friday to discuss this.

recski commented 9 years ago

OK, great, thanks! If you send a PR after testing, then hopefully I can have a brief look before Friday myself.

(Replying by email to Judit Acs's comment of Wed, Nov 12, 2014, 6:35 PM.)

juditacs commented 9 years ago

OK, I got home and it was still running so I killed it. It processed about half of the input, so this is clearly unacceptable. Profiling suggests that most of the time is spent looking up stuff in wordnet. I will think about ways to reduce this by caching.
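For readers who want to reproduce this kind of measurement, here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. The functions `slow_lookup` and `compute_features` are hypothetical stand-ins, not code from this repo; the real feature extractor and its WordNet calls are not shown in this thread.

```python
import cProfile
import io
import pstats
import time


def slow_lookup(word):
    # Hypothetical stand-in for an expensive WordNet lookup.
    time.sleep(0.001)
    return len(word)


def compute_features(words):
    # Hypothetical feature loop that repeats the same lookups many times.
    return [slow_lookup(w) for w in words]


def profile_report(func, *args, top=10):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


report = profile_report(compute_features, ["dog", "cat", "dog", "cat"] * 5)
print(report)
```

Sorting by cumulative time puts the functions that dominate the run (including everything they call) at the top, which is usually enough to spot a lookup that is worth caching.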

juditacs commented 9 years ago

Somehow LSA similarity is called multiple times for the same word pairs. I cached the similarities and halved the running time, but I'm still not satisfied. A significant portion of the time is spent in nltk's binary lookup, so there is still room for more caching. I'm working on it.

juditacs commented 9 years ago

Caching is done, all features are implemented except the one looking for a headword in a definition (explained in (Collins, 1999)) and comparing it with the other word.

juditacs commented 9 years ago

Could you please take a look at the Collins parser's output and let me know which one is the headword? (This is the output of an example included in the parser)

/home/judit/tools/COLLINS-PARSER/examples/sec23.model1

recski commented 9 years ago

The format won't tell you. You could use a set of simple rules (for example, if a phrase is an NP, then the head is the rightmost noun, and so on), but I guess you'll need to do this for WordNet glosses, so we'd have to hope that they are always parsed as a single phrase. When I needed the same thing ("heads" of definitions) from Longman, I used the Stanford Dependency Parser, which marks exactly one word as "ROOT". That worked nicely, but I'm not sure this whole thing is important enough for us right now.
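To make the rule-based option concrete, here is a minimal sketch, assuming the phrase is already available as a list of `(word, tag)` pairs with Penn Treebank POS tags. Only the NP rule (rightmost noun) comes from the comment above; the function name `find_head` and the VP rule (leftmost verb) are assumptions added for illustration, and no parser is called.

```python
def find_head(tagged_phrase, phrase_label="NP"):
    """Pick a head word with simple rules; return None if no rule applies.

    tagged_phrase: list of (word, Penn Treebank POS tag) pairs.
    """
    if phrase_label == "NP":
        # Rule from the discussion: the head of an NP is its rightmost noun.
        for word, tag in reversed(tagged_phrase):
            if tag.startswith("NN"):
                return word
    elif phrase_label == "VP":
        # Assumed extra rule for illustration: the leftmost verb heads a VP.
        for word, tag in tagged_phrase:
            if tag.startswith("VB"):
                return word
    return None


# Hand-tagged example gloss; the tags are written by hand, not parser output.
gloss = [("a", "DT"), ("large", "JJ"), ("domesticated", "JJ"), ("animal", "NN")]
print(find_head(gloss))
```

Note that the rightmost-noun rule only works on a base NP: in a gloss with a trailing prepositional phrase ("a member of the dog family") it would pick "family", so the phrase would have to be cut at the first post-modifier before applying it.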

(Replying by email to Judit Acs's comment of Mon, Nov 24, 2014, 4:03 PM.)

juditacs commented 9 years ago

Thank you for the info.

Yes, I agree, it is absolutely low priority. I know we have doubts about our evaluation "method" but I haven't seen a single example where this would do any good.