homermultitext / hmt-utils

Utility library for editorial work specific to the standards of the Homer Multitext project

Some elements should retain internal whitespace prior to full tokenization #46

Closed: neelsmith closed this issue 9 years ago

neelsmith commented 9 years ago

Maybe remove leading/trailing whitespace but keep internal whitespace, to handle multi-word sequences?
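A minimal Python sketch of that idea, not code from hmt-utils: both function names are hypothetical. The first function trims only the edges; the second is an optional variant that also collapses internal whitespace runs to a single space, which goes slightly beyond what the comment proposes.

```python
def trim_outer_whitespace(text: str) -> str:
    """Strip leading/trailing whitespace; leave internal whitespace untouched."""
    return text.strip()


def normalize_phrase(text: str) -> str:
    """Optional variant (an assumption, not from the issue): also collapse
    internal runs of whitespace, including newlines, to one space."""
    return " ".join(text.split())


print(repr(trim_outer_whitespace("  Pallas Athena \n")))
print(repr(normalize_phrase(" Pallas \n   Athena ")))
```

Either way the two words stay together as one string, which is what a multi-word sequence needs before tokenization.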

neelsmith commented 9 years ago

This should apply more generally to container elements:

b/c we want to be able to tag phrases like "Pallas Athena".

Tokenizers will need to take account of this!
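One way a tokenizer could take account of this, sketched in Python with the standard library's `xml.etree.ElementTree`. This is an illustration under assumptions, not the hmt-utils implementation: the element names in `CONTAINER_TAGS`, the `tokenize` function, and the sample markup are all hypothetical, and XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: illustrative element names, not taken from hmt-utils.
CONTAINER_TAGS = {"persName", "placeName"}


def tokenize(elem):
    """Return (token, tag) pairs; tag is None for plain words."""
    tokens = []
    if elem.tag in CONTAINER_TAGS:
        # The whole phrase is one token: trim the edges, keep internal whitespace.
        phrase = "".join(elem.itertext()).strip()
        if phrase:
            tokens.append((phrase, elem.tag))
    else:
        # Plain text before the first child is split on whitespace.
        tokens.extend((w, None) for w in (elem.text or "").split())
        for child in elem:
            tokens.extend(tokenize(child))
            # Text following a child belongs to the parent element.
            tokens.extend((w, None) for w in (child.tail or "").split())
    return tokens


line = ET.fromstring("<l>sing of <persName> Pallas Athena </persName> now</l>")
print(tokenize(line))
```

Here "Pallas Athena" comes out as a single tagged token, while the surrounding words are split on whitespace as usual.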

Changing title of issue to more general title

neelsmith commented 9 years ago

Personal names do this now.

neelsmith commented 9 years ago

Done and tested for ethnic, personal, and place named entities.