patrickfrey / strusUtilities

A set of command line programs to access the strus information retrieval engine
http://www.project-strus.net
Mozilla Public License 2.0

Standard document type detection fails on TSV files with big elements #66

Open patrickfrey opened 6 years ago

patrickfrey commented 6 years ago

Programs that perform document type detection (strusAnalyze, strusInsert, strusCheckInsert, strusGenerateKeyMap, strusSegment) fail to detect TSV files if the first two lines of the file (header + first data line) do not fit into 4K.

The reason is that these programs use only the first 4K of the document to detect the document type.

Possible fix: retry with a larger buffer if the document type detection fails. The standard document type detection must also be fixed: it currently returns "text/plain" in this case, instead of signalling that the buffer was too small to decide.
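
A rough sketch of the retry idea, in Python for brevity. This is not the strus code or its API: `detect_type`, `detect_with_retry`, the doubling strategy, the 1 MB cap and the "text/tab-separated-values" result string are all assumptions made for illustration. The stand-in detector returns `None` when the prefix does not yet contain two complete lines, so the caller can tell "undecided" apart from "plain text".

```python
import io
from typing import BinaryIO, Optional


def detect_type(prefix: bytes, complete: bool) -> Optional[str]:
    """Hypothetical stand-in detector (not the strus implementation).

    Returns a MIME type, or None if the prefix is too short to decide.
    `complete` tells whether `prefix` is the whole document.
    """
    lines = prefix.split(b"\n")
    # Drop the last piece if it may be a truncated line, or if it is the
    # empty string left over after a trailing newline.
    if not complete or lines[-1] == b"":
        lines.pop()
    if len(lines) < 2:
        # Header + first data line are needed; undecided unless at EOF.
        return "text/plain" if complete else None
    header_tabs = lines[0].count(b"\t")
    if header_tabs > 0 and lines[1].count(b"\t") == header_tabs:
        return "text/tab-separated-values"
    return "text/plain"


def detect_with_retry(stream: BinaryIO,
                      initial_size: int = 4096,
                      max_size: int = 1 << 20) -> str:
    """Retry detection with a doubled buffer while the result is undecided."""
    size = initial_size
    buf = b""
    while True:
        # Assumes a buffered stream where a short read means end of file.
        buf += stream.read(size - len(buf))
        complete = len(buf) < size
        result = detect_type(buf, complete)
        if result is not None:
            return result
        if size >= max_size:
            return "text/plain"  # give up, fall back to the old behaviour
        size = min(size * 2, max_size)
```

With a header and a first data line of about 3K each, the first 4K attempt is undecided and the second attempt with 8K sees both lines.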