metacpan / MetaCPAN-Client

Home of the official MetaCPAN Perl API client.

Example on getting # favourites for all dists #27

Closed neilb closed 8 years ago

neilb commented 9 years ago

I can't work out how to query favourites. Can you add an example showing how you can iterate over all dists that have one or more favourites please, getting back just the dist name?

Cheers, Neil

mickeyn commented 9 years ago

@neilbowers++ your excellent question helped me find some missing support in the Client. (real user problems FTW)

So, first, I just released a new version of the Client with the following support:

  1. type 'favorites' in 'all' (== match_all) queries
  2. 'facets' keyword allowed in the params

Now, theoretically you should be able to do this query using these two features.

I managed to get close with this code: http://pastebin.com/3NB8xHud and read the results from $favs->facets->{distribution}{terms}.
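For readers who cannot open the pastebin, the general shape of such a request can be sketched as follows. This is only a sketch in Python against the raw Elasticsearch-style JSON, not the Perl client code from the paste. The facet name `distribution` and the `terms` path come from the comment above; the function names, the `size` value, and the sample response are assumptions made up for illustration.

```python
import json


def build_favorite_facet_query(facet_size):
    """Build a match_all query with a terms facet on the 'distribution' field.

    Assumption: favorite documents carry a 'distribution' field, as the
    facet path quoted in the thread suggests.
    """
    return {
        "query": {"match_all": {}},
        "size": 0,  # only the facet is wanted, not the favorite docs themselves
        "facets": {
            "distribution": {
                "terms": {"field": "distribution", "size": facet_size}
            }
        },
    }


def facet_terms(response):
    """Return the list of {'term': ..., 'count': ...} facet entries."""
    return response["facets"]["distribution"]["terms"]


# Hypothetical response fragment, invented for illustration only.
sample_response = {
    "facets": {
        "distribution": {
            "terms": [
                {"term": "Moose", "count": 3},
                {"term": "Plack", "count": 2},
            ]
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(build_favorite_facet_query(10), indent=2))
    for entry in facet_terms(sample_response):
        print(entry["term"], entry["count"])
```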

But I ran into the following problems with it (need some advice here on how to do it properly, @clintongormley? @oalders?):

  1. I had to perform 2 queries to determine the size, as I can't find a way to iterate over the facets results through the scroller object (I don't think it's supported, and I read that 'facets' is a deprecated feature in ES anyway)
  2. The number of results is limited because of a server limit (I only got ~5K out of ~23K)
  3. I couldn't find (yet) a way to filter on the count per dist in the facets results on the ES side (rather than doing it in Perl, which was my plan)... that might have pseudo-solved (2)
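The client-side fallback mentioned in point 3, filtering the facet entries on their count after they come back, is simple once the entries are in hand. A minimal sketch in Python (the plan in the thread was to do this in Perl); it assumes each entry is a dictionary with `term` and `count` keys, and both the function name and the sample data are hypothetical.

```python
def dists_with_min_favorites(terms, minimum=1):
    """Return dist names whose favorite count is at least `minimum`.

    `terms` is assumed to be a list of {'term': name, 'count': n} entries.
    """
    return [entry["term"] for entry in terms if entry["count"] >= minimum]


# Made-up sample entries, for illustration only.
sample_terms = [
    {"term": "Moose", "count": 3},
    {"term": "Some-Unloved-Dist", "count": 0},
    {"term": "Plack", "count": 1},
]

if __name__ == "__main__":
    print(dists_with_min_favorites(sample_terms))
```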

I'd appreciate some help in solving this, guys. Cheers, Mickey

clintongormley commented 9 years ago

@mickeyn Are dists labelled with whether they have favourites or not? Or are favourites completely separate docs? If, as I suspect, it is the latter, then what you are doing is probably about as good as it gets, but it is a really inefficient way of doing it.

Instead, we should look at changing the data model to support these types of queries. Possibly parent-child would be a good solution here.

mickeyn commented 9 years ago

@clintongormley thanks for the quick reply.

I can't find a reference to the favorite in either http://api.metacpan.org/v0/distribution/_mapping or http://api.metacpan.org/v0/release/_mapping

So I guess you suspected correctly.

I guess we'll have to get @oalders's opinion on your suggestion.

neilb commented 9 years ago

If you're open to changes to the data model, then how about putting the number of favourites on the distribution, since that would make it much easier to get at.

In for a penny ...

mickeyn commented 9 years ago

I agree with @neilbowers that having the favorites count on the distributions will be helpful.

At the same time, we're left with the same issue for any other aggregations we may need (or not, if I'm just missing the right way to do it).

oalders commented 9 years ago

Favourites have their own endpoint: http://api.metacpan.org/v0/favorite/_mapping There's an argument for moving these over to distribution, I think. The issues would be API versioning and also backups, since this is user metadata which cannot be recovered from a re-index. So, it's a non-trivial task, especially since it requires moving the API away from v0. We have to do this at some point anyway, but we don't have a proper plan for it yet.

mickeyn commented 9 years ago

@oalders I don't think we wanted to move the favorites, just to add the total count ... if that makes sense and/or simplifies the solution.

mickeyn commented 9 years ago

@neilbowers OK, so I got one important thing wrong: the facets do allow me to get all results. It's just that without the 'all_term => "true"' filter I didn't get the dists with a count of zero (so there was no server limit after all), and dists with at least one favourite are exactly what you asked for.

So you can reuse the code snippet above for your answer (as inefficient as it may be :)), or use the full example I just committed: https://github.com/CPAN-API/metacpan-client/blob/master/examples/all_favorite_counts_per_dist.pl

Cheers, Mickey
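The difference described in this comment comes down to a single option on the terms facet. The sketch below assumes the option meant is Elasticsearch's `all_terms` (the comment spells it `all_term`), which makes the facet also return terms that match no hit, with a count of 0. It is written in Python against the raw JSON; the helper name and its parameters are hypothetical, and it is not taken from the linked example script.

```python
def terms_facet(field, size, all_terms=False):
    """Build a terms facet definition for `field`.

    With all_terms=False, only terms with at least one matching document
    are returned; with all_terms=True, zero-count terms are included too.
    """
    facet = {"terms": {"field": field, "size": size}}
    if all_terms:
        # Assumption: 'all_terms' is the option the thread refers to.
        facet["terms"]["all_terms"] = True
    return facet


if __name__ == "__main__":
    # Only dists that have been favorited at least once:
    print(terms_facet("distribution", 100))
    # Every dist term, including those with zero favorites:
    print(terms_facet("distribution", 100, all_terms=True))
```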

mickeyn commented 9 years ago

@neilbowers a simpler (one less query) example to fetch the top 20 ones: https://github.com/CPAN-API/metacpan-client/blob/master/examples/top20_favorite_distributions.pl
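For illustration, picking the top N dists out of a list of facet entries can be sketched like this. It is a Python sketch of the idea only, not the contents of the linked script: the entry shape (`term` and `count` keys), the function name, and the sample data are assumptions. Since a terms facet orders its entries by count by default, requesting a facet `size` of 20 would presumably give the same result server-side without any local sorting.

```python
def top_favorited(terms, n=20):
    """Return the `n` entries with the highest count, highest first.

    Ties are broken alphabetically by dist name so the order is stable.
    """
    ranked = sorted(terms, key=lambda entry: (-entry["count"], entry["term"]))
    return ranked[:n]


# Made-up sample entries, for illustration only.
sample_terms = [
    {"term": "Dist-A", "count": 1},
    {"term": "Dist-B", "count": 5},
    {"term": "Dist-C", "count": 3},
]

if __name__ == "__main__":
    for entry in top_favorited(sample_terms, n=2):
        print(entry["term"], entry["count"])
```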

neilb commented 9 years ago

@mickeyn thank you -- I'll have a go with this either this evening or tomorrow.

Neil

mickeyn commented 8 years ago

closing (please reopen if needed)

oalders commented 5 years ago

For reference, this can now be done via https://fastapi.metacpan.org/v1/favorite/agg_by_distributions?distribution=Moose
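A small sketch of calling that endpoint from a script, in Python using only the standard library. The base URL and the `distribution` parameter are taken from the link above; the function names are hypothetical, and the shape of the JSON that comes back is not described in this thread, so the sketch returns it unparsed beyond JSON decoding.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://fastapi.metacpan.org/v1/favorite/agg_by_distributions"


def agg_by_distributions_url(distribution):
    """Build the request URL for one distribution name."""
    return BASE_URL + "?" + urlencode({"distribution": distribution})


def fetch_favorite_aggregation(distribution):
    """Fetch and decode the JSON response (requires network access)."""
    with urlopen(agg_by_distributions_url(distribution)) as response:
        return json.load(response)


if __name__ == "__main__":
    print(agg_by_distributions_url("Moose"))
```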