Closed petterreinholdtsen closed 4 years ago
Note that this code does not try to recognize or handle different character encodings. The best way to handle this is probably to use some heuristics when reading the file to guess the encoding, and, if the encoding cannot be detected, to fall back to a default (for example ISO-8859-1), or at least to one of the encodings listed in https://lovdata.no/dokument/SF/forskrift/2017-12-19-2286 .
Nice. There are some Linux tools and libraries for guessing the encoding; that could be added later.
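For illustration only, the guess-then-fall-back idea could look roughly like the sketch below. It is written in Python with the standard library only, not in the project's own language, and it is not the code in this change. The names guess_encoding, read_text and DEFAULT_ENCODING are hypothetical, and the heuristic (byte order mark first, then a strict UTF-8 check, then ISO-8859-1) is just one assumed choice among many.

```python
import codecs

# Assumed fallback. ISO-8859-1 maps every byte to a character,
# so decoding with it never fails (but may produce wrong characters).
DEFAULT_ENCODING = "iso-8859-1"

# Byte order marks, checked in order. UTF-32 comes before UTF-16
# because the UTF-32 LE mark starts with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Return a best-guess encoding name for the raw file content."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        # Text in a legacy 8-bit encoding with non-ASCII characters
        # is rarely valid UTF-8, so a strict decode is a useful test.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return DEFAULT_ENCODING


def read_text(path: str) -> str:
    """Read a plain text file and decode it with the guessed encoding."""
    with open(path, "rb") as handle:
        data = handle.read()
    # errors="replace" keeps indexing going even if the guess is wrong.
    return data.decode(guess_encoding(data), errors="replace")
```

A dedicated detection library would likely do better on files in other 8-bit encodings, which this sketch cannot tell apart from ISO-8859-1.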
Add new DAttachmentIndexer() helper class DText. It reads plain text files and adds their content to the search index.