Closed TomokiMatsuno closed 4 years ago
Thank you for reporting this suspicious behavior. @TomokiMatsuno I'd like to revise this in the next major version, after spaCy v2.3.1 is released.
Sorry for the late reply. I tested these input strings with GiNZA v4 and confirmed that the issue is fixed. Thank you again! @TomokiMatsuno
When performing sequence labeling on a document with multiple paragraphs, I found an inconsistency in how newline characters appear in `doc.text` between `en_core_web_sm` (version: 2.2.5) and `ja_ginza` (version: 3.1.0). As in the examples below, one or more consecutive newline characters in the input string become a single white space in `doc.text` with `ja_ginza`, while they remain unchanged with `en_core_web_sm`. This makes it difficult to label a document while maintaining its paragraph structure.
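One possible workaround, not taken from this thread, is to split the input on newlines yourself and parse each paragraph separately, keeping the character offset of each paragraph so that labels can be mapped back to the original string. The sketch below assumes only that `nlp` is a callable that takes a string (as a loaded spaCy pipeline does); `parse_by_paragraph` is a hypothetical helper name, not part of spaCy or GiNZA.

```python
import re
from typing import Any, Callable, List, Tuple


def parse_by_paragraph(
    nlp: Callable[[str], Any], text: str
) -> List[Tuple[int, Any]]:
    """Hypothetical helper: parse each newline-separated paragraph on its own.

    Returns (start_offset, parsed) pairs, where start_offset is the character
    position of the paragraph in the original text. Newline runs are skipped,
    so the parser never sees them and cannot rewrite them.
    """
    results = []
    # Each match is a maximal run of characters containing no line break.
    for match in re.finditer(r"[^\r\n]+", text):
        results.append((match.start(), nlp(match.group())))
    return results
```

With a spaCy pipeline, `token.idx` gives a token's character offset inside its own `doc.text`, so `start_offset + token.idx` should point back into the original string, assuming the pipeline leaves the text within a single paragraph unchanged.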
Snippet for loading a model and parsing input text
Input string and doc.text
model: en_core_web_sm (version: 2.2.5)
model: ja_ginza (version: 3.1.0)