snexus / llm-search

Querying local documents, powered by LLM
MIT License
481 stars 60 forks

DocumentSplitter type error #50

Closed ImVexed closed 1 year ago

ImVexed commented 1 year ago

https://github.com/snexus/llm-search/blob/main/src/llmsearch/parsers/splitter.py#L53 and many other places in the file seem to treat extension as a string, but it's an object. As a result, the splitter finds no files, which was causing the examples to not work for me.

snexus commented 1 year ago

Thanks for reporting. Could you please elaborate on the issue and your setup? The conversion seems to happen implicitly on my system, and it also works on Google Colab, which is a similar setup (Ubuntu 22.04).

ImVexed commented 1 year ago

I'm on Windows 11. When logging logger.info(f"Scanning path for extension: {extension}"), for example, I would see Scanning path for extension: DocumentExtension.pdf, and I assume it also caused list(docs_path.glob(f"**/*.{extension}")) to become list(docs_path.glob("**/*.DocumentExtension.pdf")).

Considering that DocumentExtension is an Enum, it seems to have a name and a value when used as an explicit type.
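
To make the symptom concrete, here is a minimal sketch of how the glob pattern can end up matching nothing. The DocumentExtension class and the find_documents helper below are hypothetical stand-ins, not the project's actual code, and the sketch assumes the enum mixes in str. Reading .value explicitly yields the plain string regardless of how the member is rendered as text.

```python
import tempfile
from enum import Enum
from pathlib import Path


class DocumentExtension(str, Enum):
    # Hypothetical stand-in for the project's enum; the real definition may differ.
    pdf = "pdf"
    md = "md"


def find_documents(docs_path: Path, extension: DocumentExtension) -> list[Path]:
    # Hypothetical helper: .value is the plain string "pdf", so the pattern is "**/*.pdf".
    return sorted(docs_path.glob(f"**/*.{extension.value}"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "nested" / "report.pdf").touch()

    # str() of the member includes the class name, so this pattern is
    # "**/*.DocumentExtension.pdf" and matches no files.
    broken_pattern = "**/*." + str(DocumentExtension.pdf)
    broken_matches = list(root.glob(broken_pattern))

    # Using .value finds the file as intended.
    fixed_matches = find_documents(root, DocumentExtension.pdf)
```

In this sketch broken_matches is empty while fixed_matches contains report.pdf, which matches the "splitter finds no files" behaviour described above.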

ImVexed commented 1 year ago

And for me, Python 3.11's Enum.__str__ is:

    def __str__(self):
        return "%s.%s" % (self.__class__.__name__, self._name_, )

snexus commented 1 year ago

Thanks for that! I understand now. It looks like the problem is the different behaviour of Enum on Python 3.10 (which was used to develop this package) vs 3.11. Python 3.10 has an implicit conversion to string, which doesn't happen in 3.11.

I will work on compatibility with 3.11. If you have access to a virtualenv with 3.10, it should hopefully work there.
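
For anyone who needs the same code to run on both versions in the meantime, one possible workaround is to give the enum an explicit __str__ that returns the value, so that str() and f-strings produce the bare extension on 3.10 and 3.11 alike. This is only a sketch under the assumption that the enum mixes in str; the class below is a hypothetical stand-in and not necessarily how the package addresses the issue.

```python
from enum import Enum


class DocumentExtension(str, Enum):
    # Hypothetical stand-in for the project's enum; the real definition may differ.
    pdf = "pdf"
    docx = "docx"

    def __str__(self) -> str:
        # Return the plain value so str() and f-strings agree across versions.
        return str(self.value)


# With the override, interpolating the member directly yields the extension.
pattern = f"**/*.{DocumentExtension.pdf}"
print(pattern)
```

The alternative of writing extension.value at every call site is more explicit, while the __str__ override keeps existing f-strings unchanged; which one fits better depends on how widely the enum is interpolated.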

snexus commented 1 year ago

Fixed in version 0.3.2