unDocUMeantIt / koRpus

An R Package for Text Analysis
GNU General Public License v3.0

last character being truncated in koRpus::treetag #10

Closed: AdamSpannbauer closed this issue 7 years ago

AdamSpannbauer commented 7 years ago

The last character of the input object is truncated by koRpus::treetag() when TT.tknz is FALSE (see "dog" truncated to "do" in the last row of the output table below). The issue does not occur when TT.tknz is set to TRUE.

Example:

doc <- "The quick brown fox jumped over the lazy dog"

# output before the fix to R/treetag.R
koRpus::treetag(doc, treetagger = "manual", format = "obj", 
                encoding = "UTF-8", lang = "en", TT.tknz = FALSE, 
                TT.options = list(path = "/u/application/TreeTagger", preset = "en"))
#    token tag lemma lttr      wclass                                             desc stop stem
# 1    The  DT   the    3  determiner                                       Determiner   NA   NA
# 2  quick  JJ quick    5   adjective                                        Adjective   NA   NA
# 3  brown  JJ brown    5   adjective                                        Adjective   NA   NA
# 4    fox  NN   fox    3        noun                           Noun, singular or mass   NA   NA
# 5 jumped VBD  jump    6        verb                      Verb, past tense of "to be"   NA   NA
# 6   over  IN  over    4 preposition         Preposition or subordinating conjunction   NA   NA
# 7    the  DT   the    3  determiner                                       Determiner   NA   NA
# 8   lazy  JJ  lazy    4   adjective                                        Adjective   NA   NA
# 9     do VBP    do    2        verb Verb, non-3rd person singular present of "to be"   NA   NA

sessionInfo()
# R version 3.3.3 (2017-03-06)
# Platform: x86_64-redhat-linux-gnu (64-bit)
# Running under: Red Hat Enterprise Linux Server 7.3 (Maipo)

packageVersion("koRpus")
# [1] ‘0.10.2’
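A quick way to spot this regression is to compare the last tagged token with the last word of the input. The sketch below uses the token column reproduced in the output above as a plain character vector; `last_token_intact()` is a hypothetical helper, not part of koRpus, and in practice the tokens would have to be pulled from the tagged object (for example via koRpus's `taggedText()`, which is an assumption here).

```r
# Hypothetical check, not part of koRpus: does the last tagged token
# match the last whitespace-separated word of the input text?
last_token_intact <- function(tokens, input) {
  words <- strsplit(trimws(input), "\\s+")[[1]]
  identical(tail(tokens, 1), tail(words, 1))
}

doc <- "The quick brown fox jumped over the lazy dog"

# Token column copied from the TT.tknz = FALSE output above
tokens_bug <- c("The", "quick", "brown", "fox", "jumped",
                "over", "the", "lazy", "do")

last_token_intact(tokens_bug, doc)  # FALSE: "do" vs "dog"
```

This naive comparison only makes sense when the input ends in a plain word; if the text ends with punctuation, the tokenizer will usually emit that punctuation as its own final token.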

The issue seems to stem from this line in R/treetag.R: when the input object is written to the file with cat(), no newline follows its contents. Adjusting the function to also cat a newline after the user's input object appears to fix the issue. I will submit a pull request with the fix I implemented for review.
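The fix described above comes down to making sure the file handed to TreeTagger ends with a newline. A minimal sketch, assuming the input object is written to a temporary file with cat(); `write_tagger_input()` and `ends_with_newline()` are hypothetical helpers for illustration, not koRpus internals.

```r
# Hypothetical helper, not koRpus code: write text to a temp file,
# optionally terminating it with a newline.
write_tagger_input <- function(txt, add_newline = TRUE) {
  path <- tempfile(fileext = ".txt")
  if (add_newline) {
    cat(txt, "\n", file = path, sep = "")  # contents followed by "\n"
  } else {
    cat(txt, file = path, sep = "")        # no terminating newline
  }
  path
}

# Return TRUE if the file's last byte is a newline character.
ends_with_newline <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(FALSE)
  con <- file(path, "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = size)
  bytes[size] == as.raw(10L)
}

doc <- "The quick brown fox jumped over the lazy dog"

broken <- write_tagger_input(doc, add_newline = FALSE)
fixed  <- write_tagger_input(doc, add_newline = TRUE)

ends_with_newline(broken)  # FALSE: last byte is "g"
ends_with_newline(fixed)   # TRUE
```

R itself treats the first file as malformed: readLines() on it warns about an "incomplete final line". Writing the newline explicitly is a simple way to make sure every line, including the last one, is terminated.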

AdamSpannbauer commented 7 years ago

Sorry for submitting to the wrong place. I just submitted a pull request with the fix to the develop branch.

unDocUMeantIt commented 7 years ago

Thanks, the fix will go into 0.11-2 :-)