tarsqi / ttk

Tarsqi Toolkit
Apache License 2.0

Introduce views? #3

Open marcverhagen opened 8 years ago

marcverhagen commented 8 years ago

Is there a case for adding views to Tarsqi? A view would contain some set of tags and would be totally separate from other views. We could have one view for Evita events and another for events taken from a different component.

Not sure if this is worth the trouble. An alternative is to require that each tag has a source attribute that stores which component created the tag.
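To make the source-attribute alternative concrete, here is a minimal sketch. It assumes a flat list of stand-off tags. `Tag`, `TagStore`, `add` and `find` are hypothetical names for illustration, not TTK's actual classes. The point is only that enforcing `source` on every tag lets two components add EVENT tags side by side and still be filtered apart later.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """A hypothetical stand-off tag (not TTK's real Tag class)."""
    name: str       # e.g. "EVENT" or "TLINK"
    begin: int      # character offset where the tag starts
    end: int        # character offset where the tag ends
    attrs: dict = field(default_factory=dict)


class TagStore:
    """Hypothetical flat tag store in which every tag must record its source."""

    def __init__(self):
        self.tags = []

    def add(self, name, begin, end, source, **attrs):
        # Enforce the rule proposed above: no tag without a source component.
        if not source:
            raise ValueError("every tag needs a source attribute")
        attrs["source"] = source
        tag = Tag(name, begin, end, attrs)
        self.tags.append(tag)
        return tag

    def find(self, name, source=None):
        """Return tags with this name, optionally only those from one component."""
        return [t for t in self.tags
                if t.name == name
                and (source is None or t.attrs["source"] == source)]


store = TagStore()
store.add("EVENT", 10, 17, source="EVITA", eid="e1")
store.add("EVENT", 10, 17, source="OTHER", eid="e2")
print([t.attrs["eid"] for t in store.find("EVENT", source="EVITA")])  # ['e1']
```

The trade-off is that every consumer has to remember to filter by `source`. Nothing in the store itself stops a component from reading another component's tags.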

marcverhagen commented 8 years ago

Views are nice, but using them inside of TTK is probably a case of over-engineering. Tarsqi creates documents according to a certain pipeline and that's it; no views are needed for that. We may want to add several components that add EVENTs, just as there are already several components that add TLINKs, but using a source attribute to keep track of which component added a tag would be enough.

Let's focus on flexibility: taking several kinds of input and adjusting the pipeline accordingly. For example, taking YTEX output (in TTK format) could be useful even if we only use the tokenization, tagging and lemmatization from YTEX. We may want to spend some time creating a ytex --source option that loads some tags into the tarsqi_tags.

marcverhagen commented 8 years ago

Here is a potential advantage of having views. Currently, you can run a pipeline with the preprocessor and save the results as a ttk file, and then run a second pipeline with Evita on that file. But say you ran the second pipeline with the preprocessor as well. With views, Evita would select one of the views and nothing bad would happen, except that you would end up with two views containing preprocessor data. Currently, however, you get a document with duplicate sentences and chunks (for some reason tokens are not duplicated), and this results in weird TarsqiTree instances that break Evita.
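The duplication scenario can be sketched as follows. This is a toy model, not TTK code: `Document`, `new_view`, `add_tag`, `select_view` and `preprocess` are all hypothetical. It also assumes, for illustration, that Evita would pick the most recent view produced by the preprocessor. Running the preprocessor twice creates two views, but the view Evita reads still contains exactly one copy of each sentence, whereas a single flat tag list gets the sentence twice.

```python
class Document:
    """Hypothetical document that keeps each component run's tags in a separate view."""

    def __init__(self, text):
        self.text = text
        self.views = {}  # view id -> {"component": str, "tags": [(name, begin, end)]}

    def new_view(self, component):
        # Every run gets a fresh view, so rerunning a component never
        # appends duplicate tags to an existing view.
        view_id = "v%d" % (len(self.views) + 1)
        self.views[view_id] = {"component": component, "tags": []}
        return view_id

    def add_tag(self, view_id, name, begin, end):
        self.views[view_id]["tags"].append((name, begin, end))


def select_view(doc, component):
    """Assumed selection rule: the most recent view created by a component."""
    matches = [v for v in doc.views.values() if v["component"] == component]
    return matches[-1]


def preprocess(doc):
    """Hypothetical stand-in for the preprocessor: tags the whole text as one sentence."""
    view_id = doc.new_view("PREPROCESSOR")
    doc.add_tag(view_id, "s", 0, len(doc.text))
    return view_id


doc = Document("The storm hit the coast.")
preprocess(doc)  # first pipeline, saved as a ttk file
preprocess(doc)  # second pipeline reruns the preprocessor by accident

# With views: two preprocessor views, but the selected one has a single sentence.
print(len(doc.views), len(select_view(doc, "PREPROCESSOR")["tags"]))  # 2 1

# Without views: one flat tag list receives the sentence twice.
flat = []
for _ in range(2):
    flat.append(("s", 0, len(doc.text)))
print(len(flat))  # 2
```

The cost this sketch hides is the one raised in the thread: every component now has to know which view to read from and which view to write to.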

reevesr commented 8 years ago

I can see the sense of having views, given this duplication problem. I guess the question is whether views are the easiest way to solve it.

marcverhagen commented 8 years ago

Using views is definitely a more scalable solution. I will look a bit more into how much coding and added complexity it would actually take.