simon987 / sist2

Lightning-fast file system indexer and search tool
GNU General Public License v3.0

Index/Scan: Duplicate Content field. #442

Open dpieski opened 9 months ago

dpieski commented 9 months ago

Describe the bug

2023-12-06 16:37:33 [ERROR elastic.c] {
    "index":    {
        "_index":   "dcdocs",
        "_type":    "_doc",
        "_id":  "656fa65e.0000148e",
        "status":   400,
        "error":    {
            "type": "parse_exception",
            "reason":   "Failed to parse content to map",
            "caused_by":    {
                "type": "json_parse_exception",
                "reason":   "Duplicate field 'content'\n at [Source: (ByteArrayInputStream); line: 1, column: 156]"
            }
        }
    }
}

Steps To Reproduce

I think this may be related to a file that already has text in it and then has OCR run on it, which possibly creates more text.
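One way to check whether a generated document really carries the field twice is to parse it with a hook that records repeated keys. The sketch below uses Python's standard `json` module, which by default silently keeps the last value for a repeated key, whereas the Elasticsearch parser in the log above rejects it. The function name and the sample payload are hypothetical and only illustrate the idea; they are not taken from sist2.

```python
import json


def find_duplicate_keys(raw: str) -> list[str]:
    """Return keys that occur more than once inside any single JSON object."""
    duplicates: list[str] = []

    def check_pairs(pairs):
        # `pairs` holds the (key, value) tuples of one JSON object, in order.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        # Build the dict as usual (last value wins) so parsing can continue.
        return dict(pairs)

    json.loads(raw, object_pairs_hook=check_pairs)
    return duplicates


# Hypothetical payload: the field values are made up for illustration.
sample = '{"path": "menu.vob", "content": "embedded text", "content": "ocr text"}'
print(find_duplicate_keys(sample))
```

Running this kind of check on the JSON line emitted for the failing document would confirm or rule out the guess that two text sources each write a `content` field.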

simon987 commented 9 months ago

Would you be able to find this 656fa65e.0000148e document and send it to me?

sqlite sist2-scan-xxx.sist2
SELECT path FROM document WHERE id=5262;
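The id in the query appears to be the part of the Elasticsearch `_id` after the dot, read as a hexadecimal number (0x148e = 5262). That mapping is inferred from this thread only, not from sist2 documentation, and the helper name below is hypothetical. A minimal sketch of the conversion:

```python
def rowid_from_doc_id(doc_id: str) -> int:
    """Interpret the part after the dot as a hexadecimal integer.

    Assumption: the suffix of the _id is the `id` column of the
    `document` table, as the query in this thread suggests.
    """
    _, _, suffix = doc_id.partition(".")
    return int(suffix, 16)


doc_id = "656fa65e.0000148e"
rowid = rowid_from_doc_id(doc_id)
print(rowid)  # 5262
print(f"SELECT path FROM document WHERE id={rowid};")
```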

dpieski commented 9 months ago

That file is actually a VOB file. I think it is the menu-movie from a DVD.

Follow-up on this specific file is on Discord.