Support separately analyzed fields

o19s / skipchunk

Extracts a latent knowledge graph from text and indexes/queries it in Elasticsearch or Solr

MIT License


Support separately analyzed fields #3

Closed binarymax closed 4 years ago

binarymax commented 4 years ago

Currently, all text fields selected for analysis are concatenated into a single field before they continue through the pipeline. The pipeline needs a way to keep the text of each field separate.

For example, if a title and body are specified, they should be processed separately for the following:

HTML stripping
sentencizing
enriching
payloading
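
To make the request concrete, here is a rough sketch of what per-field handling could look like for the first two stages (HTML stripping and sentencizing). It is not skipchunk's code: `strip_html`, `sentencize`, and `process_fields` are hypothetical names, and the regex sentence splitter is only a naive stand-in for a real NLP sentencizer.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Drop tags and collapse runs of whitespace (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join("".join(parser.parts).split())


def sentencize(text):
    """Naive stand-in: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def process_fields(doc, fields):
    """Run each field through the stages on its own, keyed by field name."""
    result = {}
    for name in fields:
        clean = strip_html(doc.get(name, ""))
        result[name] = sentencize(clean)
    return result
```

With this shape, a document with a `title` and a `body` yields one list of sentences per field instead of one merged list, so later stages (enriching, payloading) can tell which field a sentence came from.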

binarymax commented 4 years ago

Implemented by treating the separator boundaries between fields as sentences in the nlp.pipe stage. This is perhaps a bit hacky, but it is faster than parsing each field separately.
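
For readers who want to see the idea, a minimal sketch of the separator approach follows. It is an illustration under assumptions, not the code that was committed: the `@@FIELD@@` marker and the function names are hypothetical, and a regex splitter stands in for the single `nlp.pipe` pass so the example runs without an NLP library.

```python
import re

# Hypothetical marker; anything that cannot occur in real field text would do.
FIELD_SEPARATOR = "@@FIELD@@"

# Split after sentence-ending punctuation, or around the separator itself.
# The capture group keeps the separator in the output as its own "sentence".
_SPLIT = re.compile(
    r"(?<=[.!?])\s+|\s*(" + re.escape(FIELD_SEPARATOR) + r")\s*"
)


def join_fields(doc, fields):
    """Concatenate the fields into one text with a separator between them."""
    parts = [doc.get(name, "").strip() for name in fields]
    return f" {FIELD_SEPARATOR} ".join(parts)


def sentencize(text):
    """Single pass over the combined text (stand-in for the NLP stage)."""
    return [s for s in _SPLIT.split(text) if s]


def split_by_field(sentences, fields):
    """Assign sentences back to fields, advancing at each separator."""
    grouped = {name: [] for name in fields}
    index = 0
    for sentence in sentences:
        if sentence == FIELD_SEPARATOR:
            index += 1
            continue
        grouped[fields[index]].append(sentence)
    return grouped
```

The text is parsed once, and the separator sentences are discarded while regrouping, which is where the speed advantage over one parse per field would come from.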