Currently, all text fields for analysis are concatenated into a single field before continuing through the pipeline. Mechanisms need to be implemented to keep the text from these fields separate.
For example, if a title and body are specified, they should be processed separately for the following:
Implemented by treating the separators between fields as sentence boundaries in the nlp.pipe stage. This is a bit hacky, but it is faster than parsing each field separately.
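As a rough illustration of the idea (not the actual implementation), the sketch below joins the fields with a separator, records where each field starts, and maps a character offset in the combined text back to its field. The names `SEPARATOR`, `join_fields`, and `field_at` are hypothetical, and the separator value is an assumption, since the real one is not stated here.

```python
from bisect import bisect_right

# Hypothetical boundary marker; the separator actually used is not stated in the source.
SEPARATOR = "\n\n"


def join_fields(fields):
    """Concatenate named text fields, recording the start offset of each one."""
    names, starts, parts = [], [], []
    offset = 0
    for name, text in fields.items():
        names.append(name)
        starts.append(offset)
        parts.append(text)
        # The next field begins after this text plus one separator.
        offset += len(text) + len(SEPARATOR)
    return SEPARATOR.join(parts), names, starts


def field_at(char_offset, names, starts):
    """Return the name of the field containing the given character offset.

    Offsets that fall inside a separator are attributed to the preceding field.
    """
    return names[bisect_right(starts, char_offset) - 1]


combined, names, starts = join_fields(
    {"title": "Short title", "body": "Longer body text."}
)
title_field = field_at(combined.index("Short"), names, starts)
body_field = field_at(combined.index("Longer"), names, starts)
```

With spaCy, the character offset of a token is available as `token.idx`, so the same lookup could assign each token or sentence in the parsed document back to its originating field after a single pass through the pipeline.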