patrickfrey / strusAnalyzer

Library for document analysis (segmentation, tokenization, normalization, aggregation) with the goal to get a set of items that can be inserted into a strus storage. Also some functions for analysing tokens or phrases of the strus query are provided.
http://www.project-strus.net
Mozilla Public License 2.0
3 stars, 0 forks

Segmenters should have options #27

Closed patrickfrey closed 8 years ago

patrickfrey commented 8 years ago

Some segmenters, such as CSV, need options because their behaviour is not fully standardised and the column descriptions may be declared in a separate place rather than in the file itself.

Other segmenters, such as a LibXML-based one (which does not exist yet, but may in the future), may have options to explicitly switch off behaviour that the standard requires but that is undesirable because of vulnerabilities, for example attacks based on recursive entity declarations.

I would suggest a structure SegmenterOptions containing an array of name-value pairs that is passed to SegmenterInterface::createInstance(). Which options are available depends on the segmenter implementation.
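
To make the proposal concrete, here is a minimal sketch of what such an options structure could look like. It is not the actual strus API: the chaining `operator()`, the `CsvConfig` struct, the `parseCsvOptions` helper and the option names `delimiter` and `column` are all assumptions made up for illustration. A real implementation would hand the options to `SegmenterInterface::createInstance()` instead of to a free function.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the proposed structure (not the real strus class):
// an ordered list of name-value pairs whose meaning is defined by the segmenter.
class SegmenterOptions
{
public:
    typedef std::pair<std::string, std::string> Item;

    // Append one name-value pair; returns *this so calls can be chained.
    SegmenterOptions& operator()(const std::string& name, const std::string& value)
    {
        m_items.push_back(Item(name, value));
        return *this;
    }

    const std::vector<Item>& items() const { return m_items; }

private:
    std::vector<Item> m_items;
};

// Hypothetical configuration a CSV segmenter might derive from its options.
struct CsvConfig
{
    char delimiter;                    // field separator, defaults to ','
    std::vector<std::string> columns;  // column names declared outside the file

    CsvConfig() : delimiter(',') {}
};

// Interpret the options for a CSV segmenter (option names are assumptions).
// Unknown options are rejected, because each implementation defines its own set.
CsvConfig parseCsvOptions(const SegmenterOptions& opts)
{
    CsvConfig cfg;
    for (const auto& item : opts.items())
    {
        if (item.first == "delimiter")
        {
            if (item.second.size() != 1)
            {
                throw std::invalid_argument("delimiter must be a single character");
            }
            cfg.delimiter = item.second[0];
        }
        else if (item.first == "column")
        {
            cfg.columns.push_back(item.second);
        }
        else
        {
            throw std::invalid_argument("unknown CSV segmenter option: " + item.first);
        }
    }
    return cfg;
}
```

With this sketch, a caller could write `SegmenterOptions()("delimiter", ";")("column", "id")("column", "title")`. Keeping the pairs as plain strings means the interface does not have to change when a segmenter adds an option, at the cost of each implementation validating its own options at instance creation.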

patrickfrey commented 8 years ago

Options added in version 0.10.0