More tutorials are great!
IMHO, the "first steps" tutorial is already way too long. How about adding a
separate tutorial page for setting up a pipeline with parser and NER? That new
tutorial could already assume that people know how to set up a DKPro/Maven
project and could focus on the actual pipeline implementation.
Would it be reasonable to use the OpenNLP parser and NER instead of Stanford in
a push to limit tutorials mostly to ASL components?
Original comment by richard.eckart
on 8 Apr 2013 at 6:29
Original comment by richard.eckart
on 8 Apr 2013 at 7:10
I like the idea of adding a separate tutorial page for setting up a pipeline
with parser and NER and would actually prefer that over extending the existing
First Steps tutorial.
Regarding the parser and NER, I would actually prefer Stanford, because we use
it in practice. Presumably there are reasons why we typically use Stanford
components rather than OpenNLP components?
Original comment by eckle.kohler
on 8 Apr 2013 at 7:35
The Stanford components probably produce better output, but we don't have any
tests that actually show that.
Original comment by richard.eckart
on 8 Apr 2013 at 7:40
The First-Steps tutorial uses runPipeline to do the analysis, which means that
a reader and a writer are required. In conversations with some users, I noticed
that there is a desire to use the analysis results directly in the code after
the pipeline has run, without first saving stuff to files and then reading it
again.
What is supposed to happen with the analysis results in the new tutorial?
Original comment by richard.eckart
on 9 Apr 2013 at 10:05
What about the ClearNLP dependency parser? Do we have any experience with that?
Original comment by eckle.kohler
on 9 Apr 2013 at 7:26
Chris had a student compare different dependency parsers. I believe the MATE
parser came out best, with Malt in second place. I see no strong reasons
against using GPL components if it serves your purposes. Since the DKPro
components are easily exchangeable, an example for a GPL component should work
with very little modification for a non-GPL component of the same kind.
Original comment by richard.eckart
on 9 Apr 2013 at 7:44
Having considered the discussion and input so far, I suggest writing a tutorial
that describes:
- processing a small paragraph with Stanford CoreNLP components:
StanfordSegmenter, StanfordNamedEntityRecognizer, StanfordParser
- and writing out the noun phrases (NP) and Named Entities (NE) occurring in the
NPs into a file (CSV format), e.g.
NP the new product of UKP Lab NE UKP Lab
NP the latest announcements in the news NE -
This implies describing how to handle the phrase structure parse tree, rather
than the dependencies representation.
If nobody objects by tomorrow 10 am, we will proceed this way.
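For illustration, the output step described above (one row per NP, listing the NEs that occur inside it, or "-" if there are none) could be sketched roughly as follows. This is only a sketch under assumptions, not the tutorial code: Span and NpNeRows are hypothetical stand-ins for the UIMA annotations and the writer component, the spans are located by string search instead of being produced by the Stanford components, and the tab separator and the "; " joiner for multiple NEs are assumptions, since the proposal only says "CSV format". In an actual pipeline, the containment check would more likely be done with uimaFIT's JCasUtil.selectCovered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: pair each noun phrase with the named entities it contains. */
final class NpNeRows {

    /** Hypothetical stand-in for a UIMA annotation: a type plus offsets [begin, end). */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** Builds a span for the first occurrence of text in doc (illustration only). */
        static Span of(String type, String doc, String text) {
            int begin = doc.indexOf(text);
            if (begin < 0) {
                throw new IllegalArgumentException("Text not found: " + text);
            }
            return new Span(type, begin, begin + text.length());
        }

        /** True if the other span lies completely inside this one. */
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }

        String text(String doc) {
            return doc.substring(begin, end);
        }
    }

    /** Returns one row per NP; the NE field is "-" when the NP contains no NE. */
    static List<String> rows(String doc, List<Span> nps, List<Span> nes) {
        List<String> out = new ArrayList<>();
        for (Span np : nps) {
            List<String> inside = new ArrayList<>();
            for (Span ne : nes) {
                if (np.covers(ne)) {
                    inside.add(ne.text(doc));
                }
            }
            // Assumption: tab-separated fields, multiple NEs joined with "; ".
            String neField = inside.isEmpty() ? "-" : String.join("; ", inside);
            out.add("NP\t" + np.text(doc) + "\tNE\t" + neField);
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "We saw the new product of UKP Lab and the latest announcements in the news.";
        List<Span> nps = Arrays.asList(
                Span.of("NP", doc, "the new product of UKP Lab"),
                Span.of("NP", doc, "the latest announcements in the news"));
        List<Span> nes = Arrays.asList(Span.of("NE", doc, "UKP Lab"));
        for (String row : rows(doc, nps, nes)) {
            System.out.println(row);
        }
    }
}
```

Working on character offsets rather than on the parse tree keeps the sketch short; the tutorial itself would still need to explain how the NP constituents are obtained from the phrase structure parse.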
Original comment by eckle.kohler
on 10 Apr 2013 at 5:58
OK, sounds like a "writer" component will be written for the output, and the
processing happens, as usual, with SimplePipeline. I'll try, at some point, to
write some documentation on different modes of running components (e.g. with
JCasIterable) in the uimaFIT docs.
Original comment by richard.eckart
on 10 Apr 2013 at 6:34
Examples need to be upgraded to DKPro Core 1.5.0/uimaFIT 2.0.0.
Original comment by richard.eckart
on 14 Aug 2013 at 9:20
@Nico: anything left to do here?
Original comment by richard.eckart
on 29 Sep 2013 at 2:58
@Richard: I don't think it has been upgraded to DKPro Core 1.5.0 and uimaFIT
2.0.0 yet. Besides that, there is nothing left to do.
Original comment by nico.erbs@gmail.com
on 29 Sep 2013 at 7:03
Updated to DKPro Core 1.5.0.
Original comment by richard.eckart
on 1 Oct 2013 at 3:39
Original issue reported on code.google.com by
richard....@googlemail.com
on 8 Apr 2013 at 3:04