xiaoyangren / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

Brown POS-Mapping incorrect for punctuation-class #431

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The tag set mapping 'en-brown-pos.map' maps the tag PCT to the UIMA class PUNC; the tag should be pct.

This error causes all punctuation to be mapped to the UIMA.Other tag.
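To illustrate the failure mode, here is a minimal sketch. It assumes that tag lookup is case-sensitive, that the corpus emits the lowercase tag `pct`, and that unmapped tags fall through to a catch-all entry. The `tag=class` line format, the `*` fallback key, and the helper names `load_mapping` and `resolve` are simplifications made up for this example, not the actual DKPro loader.

```python
# Hypothetical, simplified stand-in for a tag-to-class mapping.
# The real 'en-brown-pos.map' file and its loader are not reproduced here.
FALLBACK_KEY = "*"  # assumed catch-all entry for unmapped tags


def load_mapping(lines):
    """Parse 'tag=class' lines, skipping blank lines and '#' comments."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tag, _, cls = line.partition("=")
        mapping[tag.strip()] = cls.strip()
    return mapping


def resolve(mapping, tag):
    """Case-sensitive lookup; unknown tags get the fallback class."""
    return mapping.get(tag, mapping[FALLBACK_KEY])


# Mapping as reported in this issue: the key is upper case.
broken = load_mapping(["# Brown (broken)", "PCT=PUNC", "*=Other"])
# Mapping after the fix: the key matches the tag used in the corpus.
fixed = load_mapping(["# Brown (fixed)", "pct=PUNC", "*=Other"])

print(resolve(broken, "pct"))  # falls through to the catch-all class
print(resolve(fixed, "pct"))   # resolves to the punctuation class
```

With the upper-case key, the lookup for `pct` never matches, so every punctuation token ends up in the catch-all class, which is the behaviour described above.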

Original issue reported on code.google.com by Tobias.H...@gmail.com on 29 Jul 2014 at 9:20

GoogleCodeExporter commented 9 years ago
Mapping updated.

Original comment by Tobias.H...@gmail.com on 29 Jul 2014 at 9:23

GoogleCodeExporter commented 9 years ago
Do you know a URL that documents the Brown tagset? Where possible, we try to keep a header like one of these in the mapping files:

# Penn Treebank Tagset
#
# Source (PTB 1): http://faculty.washington.edu/dillon/GramResources/penntable.html
# Source (PTB 2): http://www.clips.ua.ac.be/pages/mbsp-tags

---

# STTS Tag Table (1995/1999)
#
# Source: http://www.sfs.uni-tuebingen.de/resources/stts-1999.pdf
#         http://www.ims.uni-stuttgart.de/forschung/ressourcen/lexika/TagSets/stts-table.html

Original comment by richard.eckart on 29 Jul 2014 at 9:26

GoogleCodeExporter commented 9 years ago
No, unfortunately not. There are some lists available that show a very complex tag set, but it is not used in the actual corpus. For instance: 
http://www.comp.leeds.ac.uk/ccalas/tagsets/brown.html

All those joint tags with a '-' are (at least in the TEI version I obtained from the Python NLTK) not used in the corpus.

Original comment by Tobias.H...@gmail.com on 29 Jul 2014 at 9:31

GoogleCodeExporter commented 9 years ago
Correction: I meant the joint tags with a '+'.

Original comment by Tobias.H...@gmail.com on 29 Jul 2014 at 9:32
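One way to check which joint tags actually occur in a given copy of the corpus is to count them directly. The sketch below is an assumption-laden illustration, not part of the mapping code: `joint_tag_counts` is a hypothetical helper, and the sample tokens are made up. With NLTK installed and the corpus downloaded, the pairs could instead come from `nltk.corpus.brown.tagged_words()`.

```python
from collections import Counter


def joint_tag_counts(tagged_tokens, separators=("+", "-")):
    """Count tags that contain a separator strictly inside the tag.

    Tags that merely start or end with the separator (such as a
    dash tag consisting only of hyphens) are not treated as joint tags.
    """
    counts = {sep: Counter() for sep in separators}
    for _word, tag in tagged_tokens:
        for sep in separators:
            if sep in tag[1:-1]:
                counts[sep][tag] += 1
    return counts


# Made-up sample data for illustration only. To inspect the real corpus,
# one could pass nltk.corpus.brown.tagged_words() instead (requires the
# NLTK 'brown' corpus to be downloaded).
sample = [
    ("it's", "PPS+BEZ"),
    ("dog", "NN"),
    ("--", "--"),
    ("City", "NN-TL"),
    ("it's", "PPS+BEZ"),
]

result = joint_tag_counts(sample)
for sep, counter in result.items():
    print(sep, dict(counter))
```

An empty counter for a separator would indicate that no joint tags of that kind occur in the data, so the corresponding entries could be left out of the mapping file.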