microth / heideltime

Automatically exported from code.google.com/p/heideltime

improper handling of newline when reading files #20

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The main() in class HeidelTimeStandalone reads input with this loop:

    while ((line = fileReader.readLine()) != null)
        sb.append(System.getProperty("line.separator") + line);

This has the effect of adding a newline at the beginning and leaving the last 
line unterminated.

This affects the tokenizer and POS tagger I am using, which get an extra empty 
token at the beginning, causing a misalignment of tokens.

It should be changed to:

    while ((line = fileReader.readLine()) != null)
       sb.append(line + System.getProperty("line.separator"));
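
To make the difference concrete, here is a minimal sketch that runs both loop variants over the same two-line input. The class and method names are hypothetical, and a fixed "\n" separator and a StringReader stand in for System.getProperty("line.separator") and the real fileReader so the output is the same on every platform.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of HeidelTime.
class NewlineJoinDemo {

    // Mirrors the current loop: the separator is written BEFORE each line.
    static String prependSeparator(String text, String sep) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(sep).append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Mirrors the proposed loop: the separator is written AFTER each line.
    static String appendSeparator(String text, String sep) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append(sep);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "first\nsecond\n";
        // Newlines are escaped so they are visible in the printed result.
        System.out.println(prependSeparator(input, "\n").replace("\n", "\\n")); // prints \nfirst\nsecond
        System.out.println(appendSeparator(input, "\n").replace("\n", "\\n"));  // prints first\nsecond\n
    }
}
```

Note that the second variant always terminates the last line, even when the input file did not end with a newline, so it is still not a byte-for-byte copy of the input.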

Original issue reported on code.google.com by atta...@gmail.com on 18 Oct 2014 at 8:20

GoogleCodeExporter commented 9 years ago
Hey and thanks for the report.

This is indeed some unfortunate code. I've gone ahead and fixed it so that it 
reads the input text verbatim from the file (without mangling line 
terminations).

It'll find its way into the soon-to-be-released HeidelTime 1.8. If you want to 
see the changes before that, take a look at r59623843e127.
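
For readers without access to that revision, one way to read a file verbatim in Java is sketched below. This is only an illustration and not necessarily what the revision does: the class and method names are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not the actual HeidelTime change.
class VerbatimReadDemo {

    // Decodes the whole file in one step, so line terminators are kept as-is.
    // Assumption: the file is UTF-8 encoded.
    static String readVerbatim(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes content to a temporary file, reads it back, and cleans up.
    static String roundTrip(String content) {
        try {
            Path tmp = Files.createTempFile("verbatim-demo", ".txt");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                return readVerbatim(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Mixed terminators and no final newline survive unchanged.
        String content = "first\r\nsecond\nthird";
        System.out.println(roundTrip(content).equals(content)); // prints true
    }
}
```

Because nothing is split and rejoined, no newline is added at the start and the last line is left exactly as it appears in the file.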

Original comment by z...@informatik.uni-heidelberg.de on 19 Oct 2014 at 4:56

GoogleCodeExporter commented 9 years ago

Original comment by z...@informatik.uni-heidelberg.de on 8 Dec 2014 at 2:21