unDocUMeantIt / koRpus

An R Package for Text Analysis
GNU General Public License v3.0

last char truncated bug fix (issue #10) #11

Closed AdamSpannbauer closed 7 years ago

AdamSpannbauer commented 7 years ago

Only one line of code is changed in this pull request. It addresses #10, where the last character of the input is truncated by koRpus::treetag().

doc <- "The quick brown fox jumped over the lazy dog"

# before the bug fix in R/treetag.R: the last token comes back as "do" instead of "dog"
koRpus::treetag(doc, treetagger = "manual", format = "obj", 
                encoding = "UTF-8", lang = "en", TT.tknz = FALSE, 
                TT.options = list(path = "/u/application/TreeTagger", preset = "en"))
#    token tag lemma lttr      wclass                                             desc stop stem
# 1    The  DT   the    3  determiner                                       Determiner   NA   NA
# 2  quick  JJ quick    5   adjective                                        Adjective   NA   NA
# 3  brown  JJ brown    5   adjective                                        Adjective   NA   NA
# 4    fox  NN   fox    3        noun                           Noun, singular or mass   NA   NA
# 5 jumped VBD  jump    6        verb                      Verb, past tense of "to be"   NA   NA
# 6   over  IN  over    4 preposition         Preposition or subordinating conjunction   NA   NA
# 7    the  DT   the    3  determiner                                       Determiner   NA   NA
# 8   lazy  JJ  lazy    4   adjective                                        Adjective   NA   NA
# 9     do VBP    do    2        verb Verb, non-3rd person singular present of "to be"   NA   NA

# after the bug fix in R/treetag.R: the last token is "dog", as expected
koRpus::treetag(doc, treetagger = "manual", format = "obj", 
                encoding = "UTF-8", lang = "en", TT.tknz = FALSE, 
                TT.options = list(path = "/u/application/TreeTagger", preset = "en"))
#    token tag lemma lttr      wclass                                     desc stop stem
# 1    The  DT   the    3  determiner                               Determiner   NA   NA
# 2  quick  JJ quick    5   adjective                                Adjective   NA   NA
# 3  brown  JJ brown    5   adjective                                Adjective   NA   NA
# 4    fox  NN   fox    3        noun                   Noun, singular or mass   NA   NA
# 5 jumped VBD  jump    6        verb              Verb, past tense of "to be"   NA   NA
# 6   over  IN  over    4 preposition Preposition or subordinating conjunction   NA   NA
# 7    the  DT   the    3  determiner                               Determiner   NA   NA
# 8   lazy  JJ  lazy    4   adjective                                Adjective   NA   NA
# 9    dog  NN   dog    3        noun                   Noun, singular or mass   NA   NA
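
A minimal sketch of how the difference between the two outputs above could be checked automatically. The helper `last_token_intact()` is hypothetical and not part of koRpus. It assumes the tokens are available as a plain character vector and that the input ends in a word rather than punctuation. The token vectors are copied by hand from the `token` columns above so the example runs without TreeTagger.

```r
# Hypothetical helper (not part of koRpus): compare the last tagged token
# with the last whitespace-separated word of the input text.
last_token_intact <- function(doc, tokens) {
  words <- strsplit(trimws(doc), "[[:space:]]+")[[1]]
  identical(tail(words, 1), tail(tokens, 1))
}

doc <- "The quick brown fox jumped over the lazy dog"

# Token columns copied from the two outputs above
tokens_pre  <- c("The", "quick", "brown", "fox", "jumped",
                 "over", "the", "lazy", "do")
tokens_post <- c("The", "quick", "brown", "fox", "jumped",
                 "over", "the", "lazy", "dog")

last_token_intact(doc, tokens_pre)   # FALSE: "do" != "dog"
last_token_intact(doc, tokens_post)  # TRUE
```

With TreeTagger installed, the same check could presumably be run against the token column of the real treetag() result instead of the hand-copied vectors.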

unDocUMeantIt commented 7 years ago

Thanks for finding the issue and proposing a fix!

A pity that this came in just as I was doing the 0.11-1 release :-/ I'll gladly pull your patch if you re-submit it against the current develop branch (as requested in README.md).