Report carve - Githubissues

The addition of CarveReport looks clean enough (will have a second look to verify).

However, changing the directory suffix from _extract to _carve is backward incompatible. I generally like the idea (I opened the linked issue it solves), but I think it would be better for ExtractionConfig.carve_suffix to default to _extract (= no change to the current output) and to add a command line option to override it.
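A minimal sketch of the default-plus-override idea, not the actual unblob code: `ExtractionConfig` here is a simplified stand-in using stdlib `dataclasses` and `argparse`, and the `--carve-suffix` option name, `carve_dir`, and `parse_config` are hypothetical names chosen for illustration.

```python
import argparse
from dataclasses import dataclass
from pathlib import Path

# Default equals the existing extraction suffix, so output is unchanged
# unless the user opts in to a different carve directory name.
DEFAULT_CARVE_SUFFIX = "_extract"


@dataclass(frozen=True)
class ExtractionConfig:
    # Simplified stand-in: only the two suffixes are modelled here.
    extract_suffix: str = "_extract"
    carve_suffix: str = DEFAULT_CARVE_SUFFIX

    def carve_dir(self, path: Path) -> Path:
        # Hypothetical helper: append the suffix to the file name.
        return path.with_name(path.name + self.carve_suffix)


def parse_config(argv):
    parser = argparse.ArgumentParser()
    # Hypothetical option name, used only for this sketch.
    parser.add_argument("--carve-suffix", default=DEFAULT_CARVE_SUFFIX)
    args = parser.parse_args(argv)
    return ExtractionConfig(carve_suffix=args.carve_suffix)
```

With no arguments the carve directory for `fw.bin` stays `fw.bin_extract`; passing `--carve-suffix _carve` yields `fw.bin_carve`.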

I had much the same idea of simplifying the two-step extraction (carve, then extract the chunks). My thinking was: if a file is not fully recognized by any handler (= it has multiple chunks), it should be categorized as "unknown" (or "composite") and handled by a "default" handler, which would recognize and extract (carve) the chunks, and could also assign handlers to them. This would be exactly what you wanted: one task for each file. I went so far as to make an experimental refactor along these lines two years ago, but it became too big a change, with commits too untidy to review and probably some bad decisions, so it was abandoned without much consideration. One of the problems to solve is how to pass the handler between processes, to avoid repeating the expensive handler selection. The current solution does not need this: handler selection and extraction happen in the same process.
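A rough sketch of that flattened model, under loud assumptions: this is not the abandoned refactor and not unblob's API. `Task`, `HANDLERS`, `find_chunks`, and `plan_tasks` are hypothetical names, handler selection is reduced to a toy magic-byte search, and the carved chunk paths are made up. The point it illustrates is one way to address the cross-process problem: carry the selected handler as a plain name on the task, so the receiving process looks it up in a registry instead of re-running selection.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    path: str
    # Name of the pre-selected handler. A plain string pickles cheaply,
    # so it can cross a process boundary; None means "not selected yet".
    handler_name: Optional[str] = None


# Toy registry: handler name -> magic bytes (assumption for this sketch).
HANDLERS = {"zip": b"PK\x03\x04", "gzip": b"\x1f\x8b"}


def find_chunks(data: bytes) -> List[Tuple[int, str]]:
    """Toy stand-in for the expensive handler selection."""
    hits = []
    for name, magic in HANDLERS.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)


def plan_tasks(path: str, data: bytes) -> List[Task]:
    chunks = find_chunks(data)
    if len(chunks) == 1 and chunks[0][0] == 0:
        # Fully recognized file: a single task with its handler attached.
        return [Task(path, chunks[0][1])]
    # "Composite" file: the default handler carves the chunks and emits
    # one task per chunk, each with the handler already assigned.
    return [Task(f"{path}_extract/{offset}.{name}", name) for offset, name in chunks]
```

Each returned `Task` can be handed to a worker process, which resolves `handler_name` against its own copy of the registry and extracts without a second selection pass.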

I do think a flattened extraction process would be easier to understand and reason about, but rewriting the code now would be a lot of work.

Could you explain why you would like to handle chunk files as separate (sub-)tasks? Maybe we can come up with a solution for that problem.

onekey-sec / unblob

Report carve #891