openminted / omtd-share-schema

OMTD-SHARE schema (versioned)

What format is DKPRO_TOKENIZED? #14

Open reckart opened 6 years ago

reckart commented 6 years ago

I found a format called "DKPRO_TOKENIZED" in the OMTD-SHARE annotation type enum... but I have no idea what this format actually is?

... and being a maintainer of DKPro (Core) I should probably know if that existed...

reckart commented 6 years ago

@pennyl67 any idea?

pennyl67 commented 6 years ago

From https://dkpro.github.io/dkpro-core/releases/1.8.0/docs/format-reference.html#format-TokenizedText: "DkPro format for tokenized files containing one sentence per line and tokens split by whitespaces." You're right that it's not the best label, but I wanted a more specific label than a simple "tokenized".
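
To make the quoted description concrete, here is a minimal sketch of what "one sentence per line, tokens split by whitespace" looks like in practice. It is only an illustration of the layout described in the documentation; the function names `write_tokenized` and `read_tokenized` are hypothetical and are not part of DKPro Core or the OMTD-SHARE schema.

```python
def write_tokenized(sentences):
    """Serialize a list of sentences (each a list of tokens):
    one sentence per line, tokens separated by a single space.
    Hypothetical helper, not a DKPro Core API."""
    return "\n".join(" ".join(tokens) for tokens in sentences)


def read_tokenized(text):
    """Parse the same layout back into a list of token lists,
    skipping empty lines. Hypothetical helper, not a DKPro Core API."""
    return [line.split() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sentences = [
        ["This", "is", "a", "test", "."],
        ["It", "has", "two", "sentences", "."],
    ]
    serialized = write_tokenized(sentences)
    print(serialized)
    # Round trip: reading the serialized text recovers the tokens.
    assert read_tokenized(serialized) == sentences
```

Note that a layout like this carries only sentence and token boundaries, so the original spacing between tokens cannot be recovered from it.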

reckart commented 6 years ago

Aaaah! Maybe that component should even be renamed to something like SentencePerLineTextWriter.