japerk / nltk-trainer

Train NLTK objects with zero code
http://nltk-trainer.readthedocs.org/en/latest/
Apache License 2.0
747 stars 225 forks

Can't load analyze_tagger_coverage with a ConllChunkCorpusReader #2

Closed ptnplanet closed 13 years ago

ptnplanet commented 13 years ago

The ConllChunkCorpusReader needs an extra argument, a list of nodetags.

  File "analyze_tagger_coverage.py", line 47, in <module>
    corpus = reader_cls(args.corpus, '.+')
TypeError: __init__() takes at least 4 arguments (3 given)

japerk commented 13 years ago

analyze_tagger_coverage does not support custom corpus reader arguments yet, though I do plan to add support for it in the future. Until then, you must either specify a reader class that does not need custom arguments, or you can specify a known corpus, like conll2000. I'll leave this issue open until custom argument support is added.

japerk commented 13 years ago

Custom corpus readers are now supported, though I'm sure you've found a way around this by now.

sbrugman commented 6 years ago

Running the following command results in a similar error:

python train_tagger.py conlltest --fileids ned.train --reader nltk.corpus.reader.conll.ConllChunkCorpusReader

The directory conlltest is a direct copy of conll2002 (made for testing purposes), and conll2002 itself loads correctly. Is there an argument missing?

loading conlltest
Traceback (most recent call last):
  File "train_tagger.py", line 119, in <module>
    tagged_corpus = load_corpus_reader(args.corpus, reader=args.reader, fileids=args.fileids)
  File "/path/to/nltk-trainer/nltk_trainer/__init__.py", line 89, in load_corpus_reader
    real_corpus = reader_cls(root, fileids, **kwargs)
TypeError: __init__() takes at least 4 arguments (3 given)

japerk commented 6 years ago

@sbrugman Yes, it looks like ConllChunkCorpusReader requires a chunk_types argument, which train_tagger.py does not pass. The simplest fix would be to create a wrapper class that supplies chunk_types itself, and then use that wrapper as the reader class.
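
A minimal sketch of such a wrapper, for illustration only. The class name DefaultChunkTypesConllReader and the default chunk types are assumptions: the tuple below is what NLTK appears to use for its bundled conll2002 corpus, so adjust it to match your own data. The wrapper only fills in the third positional argument that the traceback shows is missing.

```python
from nltk.corpus.reader.conll import ConllChunkCorpusReader


class DefaultChunkTypesConllReader(ConllChunkCorpusReader):
    """ConllChunkCorpusReader that can be built from (root, fileids) alone."""

    # Assumption: these mirror the chunk types of NLTK's bundled conll2002
    # corpus; change them to fit the corpus you are actually loading.
    DEFAULT_CHUNK_TYPES = ("LOC", "PER", "ORG", "MISC")

    def __init__(self, root, fileids, chunk_types=DEFAULT_CHUNK_TYPES, **kwargs):
        # Forward chunk_types as the third positional argument required by
        # the parent constructor; any other keyword arguments pass through.
        ConllChunkCorpusReader.__init__(self, root, fileids, chunk_types, **kwargs)
```

If this class is saved in an importable module (say a hypothetical conll_wrapper.py on the Python path), it could presumably be passed as --reader conll_wrapper.DefaultChunkTypesConllReader in place of nltk.corpus.reader.conll.ConllChunkCorpusReader, assuming nltk-trainer resolves any dotted class path the same way.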