pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Advanced search #727

Open ionelmc opened 9 years ago

ionelmc commented 9 years ago

Sorry if this has been asked before.

It would be really nice if I could make metadata searches like these on PyPI:

And so on ...

ionelmc commented 8 years ago

Not really. Use-case: give me all packages that do X but don't depend on Y, cause Y is broken/whatever.

I frequently look for packages that don't have heavy dependencies. E.g., I want a plotting library that doesn't depend on matplotlib.

To make the banana analogy: you asked for a banana, you got it, but there's a monkey and the whole forest attached to that banana.

alexwlchan commented 8 years ago

Another criterion that would be useful:

ionelmc commented 8 years ago

Another idea:

toddrjen commented 8 years ago

I think some of these would be easier if there was a boolean search that could be used to find packages that don't match a particular result. So rather than:

You could just have

And use the boolean "not" operator to exclude results matching that.

brainwane commented 6 years ago

A related issue about exclusion in search: #1971.

brainwane commented 6 years ago

@waseem18 is writing up a bit of a proposal on how to do this.

brainwane commented 6 years ago

@waseem18 it would be great to get to see your work in progress! Feel free to share it in a GitHub gist and link to it here, or put it right into a comment. It's fine if it's rough.

waseem18 commented 6 years ago

@brainwane I'll post a comment describing what I plan to do and how, then start once I get feedback.

waseem18 commented 6 years ago

Below is a rough UI mockup of how Advanced Search might look.

[Image: rough Advanced Search UI mockup]

@brainwane @nlhkabu I'd be happy to receive your feedback and suggestions on this.

pradyunsg commented 6 years ago

I have to point out that information about a package's dependencies is not statically available for source distributions. Thus, this information is only partially available right now. There's an open issue on this repository about this.

IIRC, Warehouse stores the install_requires (I don't remember the exact name) metadata for packages that upload a wheel first.

waseem18 commented 6 years ago

Thanks for the information, @pradyunsg. I was unaware of #474 and #2502 and had been looking into the packages' JSON. Your comment put me on the right track.

I've gone through #474 and #2502 and found that, as of now, implementing Advanced Search is not trivial.

And as mentioned on #474

it looks like out of ~120k packages in the PyPi index, only ~17k have a non null info->requires_dist field
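To make the use case concrete, here is a minimal sketch of filtering on `requires_dist`, assuming metadata documents shaped like the `info` → `requires_dist` field quoted above. The helper names and the sample packages are made up for illustration. Packages whose `requires_dist` is null are reported as "unknown" rather than as dependency-free, which reflects the incompleteness quoted above.

```python
import re


def normalize(name):
    """Normalize a project name so that 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def depends_on(metadata, project):
    """Return True or False when dependencies are known, None when they are not."""
    requires = metadata["info"].get("requires_dist")
    if requires is None:
        return None  # no static dependency metadata available
    target = normalize(project)
    return any(requirement_name(r) == target for r in requires)


def without_dependency(packages, project):
    """Names of packages that are known NOT to depend on `project`."""
    return [p["info"]["name"] for p in packages if depends_on(p, project) is False]


# Hand-written sample documents; the package names are hypothetical.
SAMPLE = [
    {"info": {"name": "plot-a", "requires_dist": ["matplotlib (>=2.0)", "numpy"]}},
    {"info": {"name": "plot-b", "requires_dist": ["numpy>=1.7"]}},
    {"info": {"name": "plot-c", "requires_dist": None}},
]

print(without_dependency(SAMPLE, "matplotlib"))
```

Here `plot-c` is excluded from the result because nothing is known about its dependencies, not because it depends on matplotlib.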

Glad that PEP 566 has been accepted, which paves the way for having metadata for packages that upload a wheel first.

pradyunsg commented 6 years ago

Thanks for the information @pradyunsg

Glad to be of help. :)

brainwane commented 6 years ago

In today's Warehouse core developers' meeting we decided to pare down our near-future milestones on our development roadmap so they really only contain the essential bugfixes and features we need to launch, replace legacy PyPI, and shut down the old site.

So I'm moving this issue into a milestone further in the future; sorry for the wait. And I would love for @waseem18 to make further progress on it, if he would like to!

waseem18 commented 6 years ago

I would be happy to work on this, @brainwane. I'll keep a close eye on the issues this one depends on so that we can start once they are resolved.

The same applies to #1677.

nlhkabu commented 6 years ago

Hi @waseem18, thanks for your work on this so far.

A couple of UX ideas:

  1. I think it could be better to have the advanced search appear below the main search bar - something like this:

[Image: screenshot from 2018-03-10 11-07-41]

  2. It would be awesome if we could develop some kind of advanced search syntax, similar to GitHub's: https://help.github.com/articles/searching-issues-and-pull-requests/

What do you think?

waseem18 commented 6 years ago

Thanks for the UX ideas, @nlhkabu. The suggested UX looks really great.

I'll implement the UI the way you suggested once work on Advanced Search has started.

brainwane commented 6 years ago

https://github.com/pypa/warehouse/issues/3452#issuecomment-377096605 has a suggestion from @drunkwcodes:

Maybe introducing https://github.com/nepsilon/search-query-parser and letting users type search queries like "Framework:Django" in the search bar will help.

Because we are familiar with Google search and GitHub search.
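As a rough illustration of that kind of search-bar syntax (this is not the linked parser library, just a standard-library sketch), the function below splits a query into free-text terms and `field:value` filters. The `-` prefix for exclusion and the set of allowed field names are assumptions made for this example.

```python
import shlex

# Hypothetical whitelist of searchable fields.
ALLOWED_FIELDS = {"framework", "license", "requires"}


def split_query(text):
    """Split a search string into (free_text_terms, filters).

    Each filter is a (field, value, negated) tuple. Tokens that do not look
    like a whitelisted `field:value` pair are kept as free-text terms.
    """
    try:
        tokens = shlex.split(text)  # honors "quoted phrases"
    except ValueError:
        tokens = text.split()  # unbalanced quotes: fall back to whitespace

    terms, filters = [], []
    for token in tokens:
        negated = token.startswith("-") and len(token) > 1
        body = token[1:] if negated else token
        field, sep, value = body.partition(":")
        if sep and value and field.lower() in ALLOWED_FIELDS:
            filters.append((field.lower(), value, negated))
        else:
            terms.append(token)
    return terms, filters


print(split_query("plotting Framework:Django -requires:matplotlib"))
```

Unknown fields are deliberately left in the free-text terms instead of raising an error, so a query such as `http://example.com` still behaves like a plain search.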


@HonzaKral I'd appreciate your assessment of what we need to configure, or what components/extensions we need to add to our Elasticsearch setup, to get more advanced search in Warehouse, if you have time to give your opinion!

honzakral commented 6 years ago

There are no additional components needed on the installation side, as long as all the fields you'd want to query exist on the documents. Then it's a matter of extracting those conditions in a structured way (either by parsing text input or by processing a more complex, broken-down form), validating them (by providing a whitelist of options), and adding conditions to the search. Something like:

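# get search object from current code
# (get_search and parse_and_validate_filters are placeholders)
search = get_search()

# create Query objects from form data ...
for query_filter in parse_and_validate_filters(form_data):
    # and apply to search (renamed to avoid shadowing the built-in filter)
    search = search.filter(query_filter)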
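For readers who think in terms of the raw Elasticsearch query DSL rather than elasticsearch-dsl objects, the sketch below shows one way such validated conditions could be combined into a single `bool` query body. The function name, the `name` and `summary` text fields, and the `(clause, negated)` input shape are assumptions for illustration, not Warehouse's actual code.

```python
def build_search_body(text, filters):
    """Combine free text and validated filter clauses into a bool query body.

    `filters` is an iterable of (clause, negated) pairs, where each clause is
    already a validated query dict such as {"match": {"framework": "django"}}.
    """
    bool_query = {"filter": [], "must_not": []}
    if text:
        # Hypothetical text fields; a real index may use different names.
        bool_query["must"] = [
            {"multi_match": {"query": text, "fields": ["name", "summary"]}}
        ]
    for clause, negated in filters:
        # Negated clauses exclude documents, the rest narrow the result set.
        bool_query["must_not" if negated else "filter"].append(clause)
    # Leave out empty sections to keep the request body small.
    return {"query": {"bool": {k: v for k, v in bool_query.items() if v}}}


body = build_search_body(
    "plotting",
    [
        ({"match": {"framework": "django"}}, False),
        ({"term": {"requires": "matplotlib"}}, True),
    ],
)
print(body)
```

Putting exclusions under `must_not` is what would make a "does not depend on Y" search possible, provided the relevant field exists on the indexed documents.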

To create a filter, some logic somewhere (I'd assume in a Form object of some sort) would convert the input to a Query:

from elasticsearch_dsl import Q

# parse_input is the conversion logic still to be written
assert parse_input('version>=1') == Q('range', version={'gte': 1})
assert parse_input('version>=1,<3') == Q('range', version={'gte': 1, 'lt': 3})
assert parse_input('Framework:django') == Q('match', framework='django')
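The comment leaves `parse_input` undefined. A minimal sketch that satisfies assertions of that shape could look like the following. To stay dependency-free it returns plain dicts in the form that `Q(...).to_dict()` produces instead of `Q` objects; the field whitelists and the integer-only version handling are simplifying assumptions.

```python
import re

# Hypothetical whitelists of fields that may be queried.
RANGE_FIELDS = {"version"}
MATCH_FIELDS = {"framework"}

RANGE_OPS = {">=": "gte", ">": "gt", "<=": "lte", "<": "lt"}
_RANGE = re.compile(r"([A-Za-z_]+)((?:>=|<=|>|<).+)$")
_CONSTRAINT = re.compile(r"(>=|<=|>|<)(\d+)$")


def parse_input(text):
    """Convert 'version>=1,<3' or 'Framework:django' into a query dict."""
    text = text.strip()

    range_match = _RANGE.match(text)
    if range_match:
        field = range_match.group(1).lower()
        if field not in RANGE_FIELDS:
            raise ValueError(f"range queries are not allowed on {field!r}")
        bounds = {}
        for part in range_match.group(2).split(","):
            constraint = _CONSTRAINT.match(part.strip())
            if constraint is None:
                raise ValueError(f"invalid constraint: {part!r}")
            # Integer versions only; real version strings need more care.
            bounds[RANGE_OPS[constraint.group(1)]] = int(constraint.group(2))
        return {"range": {field: bounds}}

    field, sep, value = text.partition(":")
    if sep and value and field.lower() in MATCH_FIELDS:
        return {"match": {field.lower(): value}}

    raise ValueError(f"unsupported filter: {text!r}")


print(parse_input("version>=1,<3"))
```

Rejecting anything outside the whitelists with `ValueError` corresponds to the validation step mentioned earlier in the comment.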

Alternatively, you could use the FacetedSearch abstraction, which is already part of elasticsearch-dsl (0) and can apply filters as well as calculate and display facets, which is always a nice addition to search.

I would be happy to talk more and provide any help with the Elasticsearch part of this.

0 - http://elasticsearch-dsl.readthedocs.io/en/latest/faceted_search.html#example

brainwane commented 6 years ago

@HonzaKral If you're open to actually making the improvement in Warehouse yourself, that would be great! If not, I totally understand, and will ask @robhudson whether he has time. :)

honzakral commented 6 years ago

I would love to work on it, but I'm not sure about the time. I will try to make time at the PyCon sprints and will update the issue then. Thanks for the ping.