yeraydiazdiaz / lunr.py

A Python implementation of Lunr.js ๐ŸŒ–
http://lunr.readthedocs.io
MIT License
188 stars 16 forks source link

Remove pin to nltk < 3.5 #94

Closed yeraydiazdiaz closed 3 years ago

yeraydiazdiaz commented 3 years ago

For clarity and prompted by https://github.com/yeraydiazdiaz/lunr.py/pull/93. NLTK was pinned to < 3.5 because it started requiring regex which uses a C extension.

This wouldn't be a problem by itself except that mkdocs has a hard dependency on lunr.py and before we pinned nltk users of mkdocs experienced installation issues on Alpine images. Mkdocs is obviously one of our main consumers so it was deemed less problematic to pin NLTK to < 3.5. The problem is, of course, that we will eventually need to upgrade as newer Python versions are released.

Luckily PEP-656 has been accepted providing a standard for Alpine wheels, and work updating pip to support it is in progress, but until regex picks it up and starts producing Alpine wheels we're pretty much stuck.

The real solution would be to instruct users of mkdocs not to use Alpine images or to install gcc as a previous step but this will inevitably create confusion on users of mkdocs, many of whom don't have Python experience, which translates to pressure on the mkdocs maintainers. So I'm leaning towards keeping the pin for now.

cc @waylan

waylan commented 3 years ago

Thanks for reaching out to me on this. I have actually been thinking about this issue a lot lately and was considering reaching out to you.

Requiring a C extension is a no-go for MkDocs, at least as a default. The attractiveness of lunr.py was that is provided a pure Python solution. If you can't offer that, then we have little interest in supporting it in MkDocs.

As a reminder, the only need we have for lunr.py is for the non-default option of prebuilding the search index. Most users don't need that option. And for those that do, the default is to call out to a Node process and run lunr.js. Using lunr.py requires the user to explicitly specify that. Originally, I expected that we would add support for lunr.py as an experimental option. Then, over time, as any bugs got worked out we would change over to making it the default. However, since we hit the dependency issues, I am not comfortable making that move until such time as the dependency issues are resolved.

Their may be some users of MkDocs who would be happy to deal with the C extension, and maybe even prefer it over installing Node. However, so long as the issue remains, we will not be making lunr.py the default. And if it is not going to become the default, then I think we should remove lunr.py from the list of MkDocs' dependencies. Perhaps we could make it a optional dependency. That way, those users who want to use it and don't mind dealing with a C extension can install and use lunr.py and everyone else doesn't need to even be bothered by it at all.

That said, I would much prefer for lunr.py to remove any dependency on nltk and then we would happily make it the default. We regularly get complaints about the large number of dependencies we have. Every time, the dependencies of lunr.py are at the top of the list. If it is not reasonable for lunr.py to remove those dependencies, then we will move to making the dependency on lunr.py optional.

yeraydiazdiaz commented 3 years ago

Thanks for reaching out to me on this. I have actually been thinking about this issue a lot lately and was considering reaching out to you.

No worries ๐Ÿ™‚

The attractiveness of lunr.py was that is provided a pure Python solution. If you can't offer that, then we have little interest in supporting it in MkDocs.

Lunr.py does offer a pure Python solution.. for English. Remember NLTK is an optional dependency for additional languages, i.e. MkDocs depends on lunr[languages], not lunr which is pure Python. The reason we use NLTK is because lunr.js has a separate community-driven package for language support, we didn't want to replicate in Python when there's a perfectly valid option in NLTK.

Most users don't need that option. And for those that do, the default is to call out to a Node process and run lunr.js.

Agreed, I doubt there's a high percentage of MkDocs users that leverage the lunr.py option.

We regularly get complaints about the large number of dependencies we have. Every time, the dependencies of lunr.py are at the top of the list. If it is not reasonable for lunr.py to remove those dependencies, then we will move to making the dependency on lunr.py optional.

I also agree. I think MkDocs has grown beyond the Python ecosystem to the point that every dependency carries a risk. Having it as an optional dependency makes the most sense to me, or even removing it completely and expanding the documentation. At least this way users know they're moving into "advanced usage" territory and some work on their part is required.

waylan commented 3 years ago

Lunr.py does offer a pure Python solution.. for English. Remember NLTK is an optional dependency for additional languages,

I wonder if perhaps you could make NLTK an optional dependency. Then English only users wouldn't need to deal with an unneeded dependency. Of course, you are in a better position to know if that makes sense for lunr.py.

In any event, I think we will move forward with making lunr.py an optional dependency of MkDocs. When doing so, we will remove the restrictions on lunr.py's dependencies. Advanced users who are willing to dive in can then deal with a C extension if need be.

yeraydiazdiaz commented 3 years ago

I wonder if perhaps you could make NLTK an optional dependency. Then English only users wouldn't need to deal with an unneeded dependency. Of course, you are in a better position to know if that makes sense for lunr.py.

It is already an optional dependency, to sum up my understanding of the situation: users that want to prebuild indices can either install NodeJS + lunr.js, or install lunr. Users that also want language support have to install lunr.languages or install lunr[languages].

The way I see it MkDocs needs to define how much "batteries" wants to include and its cost in dependencies. You could consider all prebuilding methods as external to MkDocs, or you can include lunr.py (English only) and document how to achieve language support on prebuilt indices.

Frankly, if I were maintaining MkDocs, I would remove lunr.py as a dependency entirely and treat the two branches as external to the project, if only for simplicity and consistency, users can then decide if they want to go the NodeJS route or the Python route. But, of course, it's your call ๐Ÿ™‚

waylan commented 3 years ago

Please see mkdocs/mkdocs#2402 where I am following up on this.