tuetschek closed this 7 years ago
@ptakopysk and @martinpopel : could you please check this?
Great, thanks! I had this on my todo-list for a while.
Thx, LGTM.
However, you changed the semantics in case of Analysis::CS::M, which in fact was rather W+M (tokenizer & tagger), but now it is only M (no tokenizer, just tagger). I have just updated my code not to use these M and A scenarios at all, to be on the safe side, but if someone else uses Analysis::CS::M, this commit has broken his workflow (I don't think anyone uses it, but I don't know how to find out whether this is true).
I have several suggestions for a fix:
I would probably vote for 4. I have no personal attachment to these blocks, even though I authored them ;-) However, if we decide to keep M as it is now, at least the Description section of its documentation should be updated accordingly.
Any of the solutions 1-4 is OK for me (LGTM), probably also slightly preferring 4. (Why have it only for CS and not for other languages? But introducing these subscenario files for each language and keeping the POD in sync is some extra work.)
In future (if someone has time), we could implement tokenizer=auto, that is the tokenization block would exit if it detects the tokens (a-tree with a-nodes) are already there.
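The tokenizer=auto idea could look roughly like the sketch below. It is in Python rather than as a real Treex block, and everything in it is an assumption made for illustration: the `Zone` class, the `a_nodes` list, and the whitespace split standing in for an actual tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Hypothetical stand-in for a sentence zone that may already hold a-nodes."""
    sentence: str
    a_nodes: list = field(default_factory=list)


def tokenize(zone, mode="auto"):
    """Tokenize the zone unless tokenization is disabled or already done.

    mode="none": never tokenize.
    mode="auto": skip when a-nodes are already present.
    any other value: always (re)tokenize.
    """
    if mode == "none":
        return zone
    if mode == "auto" and zone.a_nodes:
        return zone  # tokens detected, leave them untouched
    # Whitespace split is only a placeholder for a real tokenizer.
    zone.a_nodes = zone.sentence.split()
    return zone


pretokenized = tokenize(Zone("New York is big", ["New York", "is", "big"]))
raw = tokenize(Zone("New York is big"))
print(pretokenized.a_nodes)
print(raw.a_nodes)
```

With such a default, a caller that passes already tokenized data would not need to remember to set tokenizer to none.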
Oops, you're right. I only kept the Analysis::CS::{M,A,N,T} blocks so that I wouldn't break your code, @ptakopysk. Looks like I managed to break it anyway... OK, I'll delete them shortly :-).
The structure now follows the English analysis (with parameters tokenizer, tagger, ner, parser, and tecto that may be set to none to disable the given part of the analysis). The original Analysis::CS::[AMNT] modules now use the joint module with parameters.
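A minimal sketch of how such a parameterized scenario can behave, written in Python purely for illustration. The five parameter names come from the description above; the `build_scenario` function, the "default" value, and the `stage=value` labels are hypothetical and do not reflect the actual Treex implementation.

```python
# Stage names taken from the description; everything else is hypothetical.
STAGES = ("tokenizer", "tagger", "ner", "parser", "tecto")


def build_scenario(**params):
    """Return the enabled stages in pipeline order, skipping those set to 'none'."""
    unknown = set(params) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    scenario = []
    for stage in STAGES:
        value = params.get(stage, "default")
        if value == "none":
            continue  # this part of the analysis is disabled
        scenario.append(f"{stage}={value}")
    return scenario


# Tokenizer plus tagger (the W+M behaviour discussed in the thread).
w_plus_m = build_scenario(ner="none", parser="none", tecto="none")
# Tagger only (the M behaviour after this change).
m_only = build_scenario(tokenizer="none", ner="none", parser="none", tecto="none")
print(w_plus_m)
print(m_only)
```

The two calls show why the change in Analysis::CS::M matters: whether the tokenizer parameter is left at its default or set to none decides if the caller must supply pre-tokenized input.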