pinecone-io / pinecone-text

Pinecone text client library

[Bug] Not working with nltk #81

Open guidev opened 3 months ago

guidev commented 3 months ago

Is this a new bug?

Current Behavior

nltk.download("punkt") fails with nltk 3.9.x. The affected code is:

    @staticmethod
    def nltk_setup() -> None:
        try:
            nltk.data.find("tokenizers/punkt")
        except LookupError:
            nltk.download("punkt")

        try:
            nltk.data.find("corpora/stopwords")
        except LookupError:
            nltk.download("stopwords")

Here's a full explanation: https://github.com/nltk/nltk/issues/3293

Expected Behavior

pinecone-text should work with the latest nltk version

Steps To Reproduce

https://github.com/nltk/nltk/issues/3293

Relevant log output

No response

Environment

- **OS**:
- **Language version**:
- **Pinecone client version**:

Additional Context

No response

adumont commented 3 months ago

It seems to be enough to modify sparse\bm25_tokenizer.py, replacing punkt with punkt_tab:


    @staticmethod
    def nltk_setup() -> None:
        try:
            nltk.data.find("tokenizers/punkt_tab")
        except LookupError:
            nltk.download("punkt_tab")

        try:
            nltk.data.find("corpora/stopwords")
        except LookupError:
            nltk.download("stopwords")
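
If older nltk releases also need to keep working, the resource name could be chosen from the installed version instead of being hard-coded. The following is only a sketch, not the library's actual fix: `punkt_resource_for` and `ensure_resource` are hypothetical helper names, and the 3.9 cut-off is an assumption taken from this report, so adjust it if your nltk version behaves differently.

```python
import re
from typing import Callable


def punkt_resource_for(nltk_version: str) -> str:
    """Pick the tokenizer resource name for a given nltk version string.

    Assumption: nltk 3.9 and later expect "punkt_tab", older releases "punkt".
    """
    # Compare only (major, minor); suffixes such as "rc1" are ignored.
    major_minor = tuple(int(n) for n in re.findall(r"\d+", nltk_version)[:2])
    return "punkt_tab" if major_minor >= (3, 9) else "punkt"


def ensure_resource(
    path: str,
    name: str,
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> bool:
    """Download `name` only when `find(path)` raises LookupError.

    Returns True if a download was triggered, False if the resource was found.
    """
    try:
        find(path)
        return False
    except LookupError:
        download(name)
        return True


def nltk_setup() -> None:
    # Imported lazily so the helpers above can be used and tested without nltk.
    import nltk

    tokenizer = punkt_resource_for(nltk.__version__)
    ensure_resource(f"tokenizers/{tokenizer}", tokenizer, nltk.data.find, nltk.download)
    ensure_resource("corpora/stopwords", "stopwords", nltk.data.find, nltk.download)
```

Passing `find` and `download` in as arguments keeps the lookup logic testable offline; in real use `nltk_setup()` simply wires in `nltk.data.find` and `nltk.download`, the same two calls used in the snippet above.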

emielsteerneman commented 3 months ago

#83