Open evekhm opened 3 weeks ago
Ok, this is a bit complicated because the Document AI Custom Splitter specifically detected those two "form1"
entries as separate documents.
If we combine them together by default, it could create ambiguity when there are multiple separate documents of the same type in a file.
We could add a parameter such as combine_like_document_types,
but I think this issue would be best resolved in the Custom Splitter itself.
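For illustration only, here is a rough sketch of what a combine_like_document_types option might do. Nothing here is an existing library API: SplitEntity is a hypothetical stand-in for a splitter entity (a type plus an inclusive, 0-based page range), and the function name simply reuses the parameter name suggested above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitEntity:
    """Hypothetical stand-in for one splitter entity (not a Document AI class)."""
    type_: str
    start_page: int  # inclusive, 0-based
    end_page: int    # inclusive, 0-based


def combine_like_document_types(entities: List[SplitEntity]) -> List[SplitEntity]:
    """Merge consecutive entities that share a type and have touching page ranges."""
    merged: List[SplitEntity] = []
    for entity in sorted(entities, key=lambda e: e.start_page):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.type_ == entity.type_
            and entity.start_page <= last.end_page + 1
        ):
            # Same type and adjacent (or overlapping) pages: extend the previous range.
            last.end_page = max(last.end_page, entity.end_page)
        else:
            # Different type or a page gap: start a new output document.
            merged.append(SplitEntity(entity.type_, entity.start_page, entity.end_page))
    return merged


if __name__ == "__main__":
    detected = [
        SplitEntity("form1", 0, 0),
        SplitEntity("form1", 1, 1),
        SplitEntity("form2", 2, 3),
    ]
    for item in combine_like_document_types(detected):
        print(item.type_, item.start_page, item.end_page)
```

Note that this merge cannot tell one two-page "form1" apart from two separate one-page "form1" documents that happen to be adjacent, which is exactly the ambiguity mentioned above. That is why it would have to be opt-in rather than the default.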
Here is an example of the entities returned by the splitter:
In this case, all of the pages are actually of the same type, so the file should not be split. However, document.Document.split_pdf would not detect that.