tests: Tests with different file formats

Test documents and expected results are in the test_files directory.
Integration tests for the Rust interface.
Integration tests for the Python interface.
Tested formats: pdf, docx, pptx, doc, odt, csv, png, xlsx, epub

Issue-ID: 2

The test files were taken from the unstructured repository, and the expected result files were also generated with the unstructured library. The expected results are therefore only as reliable as that library's output on its own test files.

I used cosine_similarity instead of Levenshtein distance because Levenshtein takes about 20 seconds to compute the similarity for the extracted PDF text.
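
For reviewers unfamiliar with the metric, here is a minimal sketch of a bag-of-words cosine similarity in Python. It is an illustration only, not necessarily the helper used in these tests: the function name, the whitespace tokenization, and the zero-vector handling are all assumptions. It shows why the comparison is cheap: the cost grows roughly linearly with the text length, whereas the classic Levenshtein algorithm is quadratic in it.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity of two texts using word-count vectors.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption.
    """
    counts_a = Counter(text_a.split())
    counts_b = Counter(text_b.split())

    # Dot product over the words that occur in both texts.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)

    # Euclidean norms of the two count vectors.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty text has no direction; treat the similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A test would then assert that the score between the extracted text and the expected result is above some threshold. Note that this metric ignores word order, so it is more tolerant of layout differences than an edit distance.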
