hltfbk / Excitement-Open-Platform

Excitement Open Platform for Recognizing Textual Entailments
http://hltfbk.github.io/Excitement-Open-Platform/

About running the LAP #151

Closed: mars198356 closed this issue 11 years ago

mars198356 commented 11 years ago

I notice that at the moment, in Demo.java (which I assume is the current main entry point to the EOP), the LAP is called before the EDA every time. There is a getLAPClass() method to initialize the required LAP. However, it's already a bit outdated, since it was written for the demo system, where only lexical processors were called. My questions are:

1) Do we need a separate call running the LAP?

2) Where should we specify the dependencies between the LAP and the EOP? In the configuration files?

gilnoh commented 11 years ago

I think the following should be done.

Anyhow, I think this should be done by the Demo code owner. I think FBK wrote and updated the Demo part. Maybe we can assign the issue to Roberto? (Roberto, what's your opinion? And rzanoli is your user account, correct?)

vnastase commented 11 years ago

Hi Rui! This is Vivi :) Regarding your question about the LAP: I will give a couple of details about the implementation, and maybe after that you can explain your question, because I didn't quite understand it. In the current version of the demo, the LAP can be specified on the command line (opennlp/treetagger/textpro), and the corresponding class is generated based on this command-line input and the language read from the configuration file. If no such argument is given on the command line, the system initializes the defaults (TextPro for Italian, OpenNLP for English and German). What I didn't include in the LAP initialization step of the demo is a check of whether the specified combination of LAP and language is valid, so to speak. I can include that. If you think there are dependencies between the LAP used and the chosen EDA, we can add a "compatibility" check as well. I think we have assumed that the LAP and the EDA are independent (apart from the language, which we already account for), so no such dependencies (should) exist, because all LAPs produce the same annotations (for now). OK, so that's it from me. Your turn.
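The selection logic Vivi describes (an optional command-line LAP name, otherwise a per-language default) can be summarized in a minimal Java sketch. It assumes the LAP is identified by a lowercase name and the language by a code such as "EN"; the names LapSelection and chooseLap are hypothetical and not taken from Demo.java. As in the current demo, no LAP/language validity check is performed.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the demo's LAP selection, not the actual Demo.java code.
public class LapSelection {

    // Per-language defaults described in the thread:
    // TextPro for Italian, OpenNLP for English and German.
    private static final Map<String, String> DEFAULT_LAP = Map.of(
            "IT", "textpro",
            "EN", "opennlp",
            "DE", "opennlp");

    /**
     * Returns the LAP name to use. cliLap is the optional command-line value
     * (opennlp/treetagger/textpro) and may be null; language comes from the
     * configuration file. No LAP/language compatibility check is done.
     */
    static String chooseLap(String cliLap, String language) {
        if (cliLap != null && !cliLap.isEmpty()) {
            // An explicit command-line choice always wins.
            return cliLap.toLowerCase(Locale.ROOT);
        }
        String lap = DEFAULT_LAP.get(language.toUpperCase(Locale.ROOT));
        if (lap == null) {
            throw new IllegalArgumentException("No default LAP for language " + language);
        }
        return lap;
    }

    public static void main(String[] args) {
        String cliLap = args.length > 0 ? args[0] : null;
        System.out.println(chooseLap(cliLap, "EN"));
    }
}
```

A validity check of the kind Vivi offers to add would slot in just before the return statements, once the supported LAP/language combinations are known.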

Vivi

rzanoli commented 11 years ago

This issue was closed by mistake.

mars198356 commented 11 years ago

Dear Vivi,

Thanks for the explanation. The situation is actually that OpenNLP and TreeTagger only do tokenization and POS tagging. Some components, such as BagOfDepsScoring and TreeSkeletonScoring, require syntactic dependency parsing. We currently include MaltParserDE and MaltParserEN in the LAP and may replace them with MSTParser in the future (but not for the first release). The issue could of course be hot-fixed by hard-coding the dependencies in the Demo class, but I would suggest going for a more systematic solution, which is to encode this information in the configuration files (as Gil pointed out as well).

Best,

Rui

vnastase commented 11 years ago

Hi Rui! Gil suggested we read the LAP from the configuration file. That can easily be done, but I don't think it solves the problem. Describing all allowed (or disallowed) combinations in some way would make the configuration files messy. Maybe an easier way to address this problem is to check for these (code) dependencies within the classes where they are used. For example, BagOfDepsScoring must iterate over (syntactic) Dependency annotations in the CAS. If it finds no such annotations, it could exit with a message saying that it requires such annotations and that they can be produced, for example, by this or that parser. Then the user will know which parser he should specify in the configuration file, or on the command line, if he didn't know in the beginning.
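A minimal sketch of the check Vivi proposes, under explicit assumptions: a plain list of annotation type names stands in for the UIMA CAS, and DependencyCheck, requireAnnotation and MissingAnnotationException are hypothetical names, not part of the EOP or UIMA APIs. A real component would query the CAS for Dependency annotations instead of a string list.

```java
import java.util.List;

// Hypothetical sketch: a component verifies that the annotations it needs are
// present and otherwise fails with a message naming the kind of LAP required.
// A List<String> of annotation type names stands in for the UIMA CAS here.
public class DependencyCheck {

    // Hypothetical exception type for missing prerequisite annotations.
    static class MissingAnnotationException extends RuntimeException {
        MissingAnnotationException(String message) {
            super(message);
        }
    }

    static void requireAnnotation(String component, String requiredType,
                                  List<String> annotationTypes, String hint) {
        if (!annotationTypes.contains(requiredType)) {
            throw new MissingAnnotationException(component + " requires " + requiredType
                    + " annotations, but none were found. " + hint);
        }
    }

    public static void main(String[] args) {
        // Output of a tokenizer/POS-tagger-only LAP: no Dependency annotations.
        List<String> tokensOnly = List.of("Token", "POS");
        try {
            requireAnnotation("BagOfDepsScoring", "Dependency", tokensOnly,
                    "Configure a LAP with a dependency parser, e.g. MaltParserEN or MaltParserDE.");
        } catch (MissingAnnotationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design keeps the knowledge about prerequisites inside the component that needs them, so the configuration files stay free of compatibility tables.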

Vivi

gilnoh commented 11 years ago

To Vivi: For the moment, don't worry about which combination is "valid" and which is "not valid". That is the responsibility of the "users" and also of the "configuration writers".

Note that we don't perform any "validity" check on configurations. For example, the numerical and string values in the configuration are not checked. If they are not valid, the components and EDAs will simply raise exceptions. What we do provide is simply documentation within the configuration examples: "this should be XX", "this should be YY".

In the same way, our EDA configurations simply document their requirements: "this configuration needs a LAP with parse trees", or "using this component requires annotation YY". Checking for "valid combinations" is too much for us to do for now (without component metadata), since it affects not only EDAs but also components.

So I would recommend that you just go ahead and treat it as a configurable value. This is fair enough, IMO. For example, we don't even check the "EDA" selection in the global configuration: if we give NotExistingEDA, it will simply raise an exception, won't it? So I think this is fair for the first-release demo (or access) code.
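Gil's point that an unknown class name "will simply raise an exception" is exactly what plain reflection gives you. A sketch, assuming the configured value is a full class name with a public no-argument constructor; ConfiguredClassLoader is a hypothetical helper, and the EOP's actual EDA initialization may differ.

```java
// Hypothetical sketch: a configured full class name is instantiated by
// reflection, with no up-front validity check of the value.
public class ConfiguredClassLoader {

    static Object instantiate(String fullClassName) throws ReflectiveOperationException {
        // Class.forName throws ClassNotFoundException for unknown names.
        Class<?> cls = Class.forName(fullClassName);
        // Assumes a public no-argument constructor; otherwise another
        // ReflectiveOperationException subtype is thrown.
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) {
        try {
            instantiate("NotExistingEDA");
        } catch (ReflectiveOperationException e) {
            // The user sees the failure at startup, which is the
            // "configuration writer's responsibility" model described above.
            System.out.println("Configuration error: " + e);
        }
    }
}
```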

vnastase commented 11 years ago

All righty then, Gil. So I should just change the demo to read the LAP from the configuration file and remove it from the command line? If so, what would the tag name be (activatedLAP?), and should I keep a default configuration or rely exclusively on the config file? And a last question: will the LAP be given as a full class name, as activatedEDA is now?

Vivi

gilnoh commented 11 years ago

I would recommend doing it the same way as we do for activatedEDA, and all three of your proposals sound good (relying exclusively on the config file, the full class name, and the tag name). -- Gil
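A sketch of reading the agreed activatedLAP value with the JDK's built-in DOM parser. The XML layout (configuration, section and property elements with name attributes, and a section called PlatformConfiguration) is an assumption about the EOP configuration format, and org.example.SomeEDA and org.example.SomeLAP are placeholder class names. Following the decision above, there is no fallback default: a missing property is an error.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: look up <property name="..."> inside <section name="...">.
public class ActivatedLapReader {

    static String readProperty(String xml, String sectionName, String propertyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList sections = doc.getElementsByTagName("section");
        for (int i = 0; i < sections.getLength(); i++) {
            Element section = (Element) sections.item(i);
            if (!sectionName.equals(section.getAttribute("name"))) {
                continue;
            }
            NodeList props = section.getElementsByTagName("property");
            for (int j = 0; j < props.getLength(); j++) {
                Element prop = (Element) props.item(j);
                if (propertyName.equals(prop.getAttribute("name"))) {
                    // Expected to be a full class name, as for activatedEDA.
                    return prop.getTextContent().trim();
                }
            }
        }
        // No default: the configuration file is the only source of the LAP.
        throw new IllegalStateException("Missing property " + propertyName
                + " in section " + sectionName);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<section name=\"PlatformConfiguration\">"
                + "<property name=\"activatedEDA\">org.example.SomeEDA</property>"
                + "<property name=\"activatedLAP\">org.example.SomeLAP</property>"
                + "</section></configuration>";
        System.out.println(readProperty(xml, "PlatformConfiguration", "activatedLAP"));
    }
}
```

Because the value is just a class name, adding a new parser-based LAP only requires editing the configuration file, which is the property Vivi points out later in the thread.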

vnastase commented 11 years ago

OK Gil. I'll adjust the demo.

Vivi

mars198356 commented 11 years ago

Thanks, Vivi!

Just a note: for MaxEntClassificationEDA, there are basically three LAPs to choose from. OpenNLP and TreeTagger are already there; MaltParser is the third option, either MaltParserDE for German or MaltParserEN for English. Let me know if you have any questions about these modules.

Best,

Rui

vnastase commented 11 years ago

Rui, I actually don't think I need to know about the parsers :) I will just take whatever is in the configuration file under the tag activatedLAP. This is nice, because we can add whatever parsers we want, and only the config file needs to change.

Vivi

mars198356 commented 11 years ago

Dear Vivi,

Thanks for the update!

I tested the demo configurations as well as some other ones. "activatedLAP" works fine. I will close this issue.

Best,

Rui