Closed degyves closed 8 years ago
The problem with the Penn Treebank corpus is that it's proprietary, so, unfortunately, I can't include it with `cl-nlp` - you need to get it on your own. Here's a link to the original release: https://catalog.ldc.upenn.edu/LDC99T42. That one is really expensive; however, there was recently an updated release that is much more affordable: https://catalog.ldc.upenn.edu/LDC2015T13 (I haven't looked at it yet, so I don't know if there are any changes to the format).
Now, Ontonotes (the source of `onf-wsj`) doesn't provide data in the same tagged format as the Penn Treebank, so it doesn't make sense to use `(map-corpus :ptb-tagged ...)` with it. You can see an example of the tagged Penn Treebank representation here: https://github.com/vseloved/cl-nlp/blob/master/corpora/samples/WSJ_0001.POS
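To give a rough idea of what the tagged representation looks like to a program, here is a minimal sketch in plain Common Lisp. It assumes the usual `word/TAG` token convention of the tagged Penn Treebank files; `split-tagged-token` is a hypothetical helper written for illustration, not a function from cl-nlp.

```lisp
;; Hypothetical helper (not part of cl-nlp): split one "word/TAG" token.
;; The split happens at the LAST slash, so a word that itself contains
;; a slash (e.g. an escaped fraction) keeps its slash in the word part.
(defun split-tagged-token (token)
  "Return the word and the tag of TOKEN as two values.
The tag is NIL when TOKEN contains no slash."
  (let ((slash (position #\/ token :from-end t)))
    (if slash
        (values (subseq token 0 slash)
                (subseq token (1+ slash)))
        (values token nil))))
```

For example, `(split-tagged-token "Vinken/NNP")` returns the two values `"Vinken"` and `"NNP"`. Ontonotes data is not laid out as such tokens, which is why a reader expecting this format cannot process it.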
Finally, regarding the undefined function error: the docs should be updated to reflect the changes made during the recent refactoring. Basically, the function `text-tokens` is now called `text-tokenized`, and the internal structure has changed to a list of lists of lists (a 3-level paragraph-sentence-token structure).
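As an illustration of the 3-level structure, here is a small self-contained sketch. The data in `*tokenized*` is a made-up stand-in for what the accessor returns, and plain strings are used for tokens as an assumption (the real tokens may be structures); `flatten-tokens` and `count-sentences` are hypothetical helpers, not cl-nlp functions.

```lisp
;; Made-up stand-in for the paragraph -> sentence -> token structure.
;; Tokens are plain strings here purely for illustration.
(defparameter *tokenized*
  '((("The" "cat" "sat" ".") ("It" "purred" "."))   ; paragraph 1: 2 sentences
    (("Dogs" "bark" "."))))                         ; paragraph 2: 1 sentence

(defun flatten-tokens (paragraphs)
  "Collect every token of PARAGRAPHS into a single flat list."
  (loop for paragraph in paragraphs
        append (loop for sentence in paragraph
                     append sentence)))

(defun count-sentences (paragraphs)
  "Count the sentences across all PARAGRAPHS."
  (reduce #'+ paragraphs :key #'length))
```

Code written against the old flat token list would need a flattening step like this one before it can, for instance, `reduce` over all tokens of a text.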
Also, is the form unbalanced in the docs?
Yes, that is correct, thanks! Fixed.
The instructions given in docs/user-guide/examples/eng-pos-tagger.md fail:
The following code:
... appears to be two separate forms: the `let` form and the `reduce` form.
The first error is that there is no WSJ file under corpora/ptb/TAGGED/POS/.
But if we change it to an existing corpus under corpora/, such as "onf-wsj":
Then CCL:UNDEFINED-FUNCTION-CALL is signaled. There is no such function.
Any clues?
I'm using Clozure Common Lisp 1.10 on Windows 8 64-bit. Under SBCL, just running the first `let` form produced a thread error.