The test files were taken from the unstructured repository, and the expected result files were generated with the unstructured library as well. This assumes the library produces correct output for its own test files.
I used cosine_similarity instead of Levenshtein distance because Levenshtein takes about 20 seconds to compute the similarity of the extracted PDF text.
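
For reference, a minimal sketch of the kind of comparison meant here. It is not the code in this change: the function name cosine_text_similarity and the bag-of-words tokenization are assumptions for illustration, and it uses only the standard library rather than any particular cosine_similarity implementation.

```python
import math
import re
from collections import Counter


def cosine_text_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings.

    Hypothetical helper for illustration; tokenization is a simple
    lowercase split on word characters.
    """
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    # Dot product over the tokens of the first text; Counter returns 0
    # for tokens missing from the second text.
    dot = sum(count * vb[token] for token, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text has no tokens
    return dot / (norm_a * norm_b)
```

This runs in time roughly linear in the number of tokens, while the standard Levenshtein algorithm is quadratic in the text lengths, which would explain the slowdown on long PDF text. The trade-off is that a word-count comparison ignores word order.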
Issue-ID: 2