metacpan / metacpan-api

A free, open API for everything you want to know about CPAN
http://www.metacpan.org/

Add recommendation support #253

Open yanick opened 11 years ago

yanick commented 11 years ago

Issue placeholder for discussions related to http://babyl.dyndns.org/techblog/entry/metacpan-recommendations.

tobyink commented 11 years ago

I think there's a lot more value to recommendations if people can justify the recommendation.

To avoid it becoming yet another reviews site though, perhaps instead of putting a note by recommendations, allow people to tag their recommendation with pre-defined categories, such as: "more features", "better API", "tiny", "in core", etc.

For example, on the LWP::UserAgent page I might want to recommend both WWW::Mechanize ("more features") and also HTTP::Tiny ("in core", "tiny").

tobyink commented 11 years ago

Also, a category which would need to be displayed separately: recommend modules which aren't an alternative to the current module, but are good partners for it. For example on the List::Util page, recommend List::MoreUtils.

monken commented 11 years ago

Let's keep it simple for the first release. We can still over-engineer later on :)

monken commented 11 years ago

@yanick here is my take on how to implement this: I like the idea of bundling the recommendation with a ++. In fact, we could extend the /favorite endpoint to include the inferior modules in the favorite document.

Example: user A recommends Moo over Moose and Mo, the resulting favorite entry would be:

{
  user: 1,
  distribution: "Moo",
  instead_of: ["Moose", "Mo"]
}

Pretty easy to implement and easy to query, too. Another thing to keep in mind: in my opinion we should use distribution names instead of module names. When people look at Moose::Role, they still want to see the recommendation for Moo, although the user didn't explicitly recommend Moo::Role over Moose::Role.
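A minimal sketch of the "easy to query" claim, assuming favorite documents shaped like the example above. The in-memory list and the `recommended_instead_of` helper are hypothetical stand-ins for illustration; the real data would live in the search backend.

```python
from collections import Counter

# Hypothetical favorite documents, shaped like the example above.
favorites = [
    {"user": 1, "distribution": "Moo", "instead_of": ["Moose", "Mo"]},
    {"user": 2, "distribution": "Moo", "instead_of": ["Moose"]},
    {"user": 3, "distribution": "Moose", "instead_of": ["Mo"]},
]


def recommended_instead_of(favorites, distribution):
    """Count how many users recommend each distribution over `distribution`."""
    counts = Counter()
    for fav in favorites:
        # Favorites without an instead_of list are plain ++ entries.
        if distribution in fav.get("instead_of", []):
            counts[fav["distribution"]] += 1
    return counts.most_common()


print(recommended_instead_of(favorites, "Moose"))
```

Because the lookup key is the distribution, a page for Moose::Role would show the same result as the page for Moose, which is the behaviour argued for above.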

oalders commented 11 years ago

My vote is for simple to start with as well. I think we should also take into account the ideas presented by @timbunce here http://blog.timbunce.org/2013/03/10/suggested-alternatives-as-a-metacpan-feature/. I would say for this thread we could limit discussion of Tim's blog post to the points where it overlaps with what @yanick has proposed. We really need a road map moving forward here. My inclination would be to break it up into small, simple, deployable chunks with a view to expanding the functionality down the road if it turns out to be as useful as we think/hope it will be.

I like the UI which Tim has proposed for the recommendations and I tend to agree that modules make more sense than distributions for the suggested alternatives. However, having ++ refer to dists and alternatives referring to modules could lead to some confusion. At the very least, I think it's a conversation worth having and maybe getting some wider input on.

The way @mo has laid out the ++ entry looks really clean and easy to use, but what if I want to recommend Mojo::UserAgent as an alternative for LWP::UserAgent? Now we're talking about:

{
  user: 1,
  distribution: "Mojolicious",
  instead_of: ["libwww-perl"]
}

yanick commented 11 years ago

[incorporate Tim's ideas here as well] Absolutely.

[road map] I agree that many small steps is the way to go. The sooner we have a core to play with, the sooner we can have a snowball effect and hundreds, nay, thousands, of hackers poring over the feature and submitting patches. ... okay, I might be harbouring too much hope here, but you know what I mean. ;-)

[modules versus distributions] I see the point for modules. For most modules/distributions, it won't make a lot of difference as, typically, one dist == one functionality == one main module. What I see playing against a per-module recommendation is that there will be dilution/confusion: anything that recommends Moose::Util, Moose::Meta::Class, or Moose::Role really boils down to recommending Moose. Now, it's true that the flip side is that for distributions that are an umbrella for many functionalities (@oalders's example of Mojolicious is a good one), the recommendation might look odd, but I think that's still better than having a more diffuse module selection.

timbunce commented 11 years ago

[road map] I agree that many small steps is the way to go, but picking the right direction is important! :)

[modules versus distributions] Firstly, I would argue that the current placement of the ++ is misleading. Rather than:

Gisle Aas / libwww-perl-6.04 / LWP::UserAgent [47 ++]

I'd suggest:

Gisle Aas / libwww-perl-6.04 [47 ++] – LWP::UserAgent

To take your example @yanick, it's unlikely that anyone would recommend Moose::Util, Moose::Meta::Class, or Moose::Role specifically unless they had good reason. They'd simply refer to Moose instead. If they did have a good reason then they wouldn't be able to express it clearly if they had to do so at the distro level.

Also, consider the case of a large distribution with many modules (Moose, DBIx::Class etc) where someone has developed a separate distribution that contains a single module that improves on the functionality of just one of the bundled modules. Clearly that distro isn't a "suggested alternative" for the original distro, but the module is a "suggested alternative" for a specific module in that distro.

(I can see an argument for calling the new distro a "complementary distro". So if you choose to implement at the distro level then the implementation should support different relationship types from the start.)

[API] I'm nervous of having this functionality ride piggy-back on favourites, but I don't know the API well enough to know how valid that concern is. Clearly it's only appropriate if you choose to implement this at the distro level.

There'll need to be API support for the other side of the relationship as well, i.e. the "Suggested as the alternative to X other modules by Y people" and "Suggested as complementary with X other modules by Y people".

[Naming] Either "suggested alternative" and "suggested addition" or "recommended alternative" and "recommended addition". Umm, "alternative" seems clear but "addition" doesn't seem quite right; "extra" is a bit vague and "complementary" is a bit of a mouthful. I'll let you bike-shed that one :)

monken commented 11 years ago

[placement of ++] @timbunce Please open a separate ticket for that. :+1:

[module vs distro] libwww-perl and Mojo are the exception and don't really follow the idea of CPAN where each dist tackles a certain problem or functionality. If we do it on a dist level that will also motivate people to split up their large dists. One might argue that the perl dist has many modules that are candidates for recommendations. My argument against that is that most of these modules are dual-lived and have their own dist that we would recommend instead of the perl dist. Again, let's keep it simple and I feel like having the recommendation on a module level would cause us a lot of headache.

Worst case scenario: a user looks at LWP::UserAgent and recommends Mojo::UserAgent, which will result in:

{
  user: 1,
  distribution: "Mojolicious",
  instead_of: ["libwww-perl"]
}

Looking at any of the libwww-perl modules will result in showing a recommendation for Mojolicious (instead of Mojo::UserAgent). I'm totally fine with that. But others might disagree.

[API] Both queries you suggested should be supported. I like putting it in the favorite table because it relates and makes the implementation easier (in my mind).

[Naming] I vote for "recommended alternative". It's quite a mouthful for the API key, though, so I still vouch for instead_of.

dagolden commented 11 years ago

Please, please, please, use module names. They are stable and reliable. Dist names are not. Dist names aren't even unique unless paired with uploader name (AUTHOR/Foo-Bar-1.23.tar.gz), so how do you track recommendations as different maintainers release? You're going to be in a heuristic pickle. (You might be there already with "previous dist".)

Modules are also precise. Forget the Mojolicious example, how about Scalar::Util and List::Util? Different alternatives will apply to each.

You can always roll it up on a distribution page and show other suggested distributions based on modules contained.

dagolden commented 11 years ago

[naming]

I suggest using see also -- this is softer than "recommended alternative" and could eventually be enhanced with comments or tagging. It would allow for "recommendation" or "alternative" or "for use with" semantics.

It solves the discoverability problem without making direct value judgments. And it's general enough that you can use it to weight search rankings.

yanick commented 11 years ago

[naming]

see also is, imho, too soft. The razor's edge we are walking, I think, is to have something that won't degenerate into bloodbaths, but still provides a venue to recommend solution X over solution Y.

[modules versus distributions]

Question for MetaCPAN peeps: assuming that we go with module names, is it easy or costly to have aggregation of those results done for the distribution? I'm asking because I think that we have to show some form of results at the dist level (I do not want to click through the many modules of DBIx::Class to know what peeps recommend for DBIx::Class as a whole). If per-dist aggregation of the module results is costly, then that would be a strong argument against the per-module recommendations. If not... then the fight can go on. :-)

oalders commented 11 years ago

[naming]

I had also thought "see also" would be a nice, succinct way of naming this. The argument I had against it is that "See Also" seems to be a fairly common header in module documentation and the meaning given there is probably wider in scope. It seems to be, "if you like this, then you might just want to look at these", with no implied judgement. However, I could live with our version of "See Also" having a narrower definition.

dagolden commented 11 years ago

[naming]

I think it's good to start soft and general, because you can always make it harder with more specific annotations or tagging. On the other hand, if you try to create the right ontology first and get it wrong, then you're sort of locked in.

Go back to @tobyink 's comment about comments -- I think starting with soft will allow greater insight into how it's being used, then more rigor can be figured out on the basis of actual usage.

timbunce commented 11 years ago

All @dagolden's points are strong ones and I agree with them.

Re @yanick's query on cost of calculating the distro level recommendations, that could be done async as a batch job. As mentioned before, most recommendations will be made to the 'root' module of a distro, and the module level recommendations on that page would be updated immediately. I don't see a problem with the distro page not getting the change till later.
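As a rough illustration of the batch roll-up described above, here is a minimal sketch. It assumes module-level recommendations stored as (user, module, alternative) tuples and a module-to-distribution index; the names `MODULE_TO_DIST` and `rollup_by_distribution` and the sample data are hypothetical, not part of the MetaCPAN code.

```python
from collections import defaultdict

# Hypothetical module -> distribution index; a real one would be derived
# from the indexed release data.
MODULE_TO_DIST = {
    "LWP::UserAgent": "libwww-perl",
    "LWP::Simple": "libwww-perl",
    "Mojo::UserAgent": "Mojolicious",
    "HTTP::Tiny": "HTTP-Tiny",
}

# Hypothetical module-level recommendations: (user, module, alternative).
recommendations = [
    (1, "LWP::UserAgent", "Mojo::UserAgent"),
    (2, "LWP::UserAgent", "HTTP::Tiny"),
    (3, "LWP::Simple", "HTTP::Tiny"),
]


def rollup_by_distribution(recommendations, module_to_dist):
    """Count the alternatives suggested for any module of each distribution."""
    rollup = defaultdict(lambda: defaultdict(int))
    for _user, module, alternative in recommendations:
        dist = module_to_dist.get(module)
        if dist is None:
            continue  # module missing from the index: nothing to roll up
        rollup[dist][alternative] += 1
    # Convert nested defaultdicts to plain dicts for storage or display.
    return {dist: dict(alts) for dist, alts in rollup.items()}


print(rollup_by_distribution(recommendations, MODULE_TO_DIST))
```

The roll-up is a single pass over the stored recommendations, so running it asynchronously only delays the distribution page; the module pages can still read the raw tuples directly.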

monken commented 11 years ago

@dagolden when I talk about distributions I'm talking about Foo-Bar and not AUTHOR/Foo-Bar-1.23.tar.gz, which is a release in my mind. So the issue to track releases of different authors doesn't really apply.

Someone might recommend List::MoreUtils over List::Util. There is no harm if that also shows up for Scalar::Util. It's one distribution, it's one bucket of modules that try to solve a common issue: provide utilities to Perl data structures.

@timbunce

most recommendations will be made to the 'root' module of a distro

I agree and that's why we could just stick with recommending distributions because in all those cases, the distribution matches the module name. I guess I still have trouble understanding why we should recommend based on module names when we collapse on a dist-level anyway.

dagolden commented 11 years ago

@monken "distributions" in the way you describe it don't exist as far as PAUSE is concerned. They don't exist as far as users are concerned because they can't be installed. They are a fiction invented by search.cpan.org and mimicked by metacpan.org. Perpetuating that design mistake would be unfortunate.

shadowcat-mst commented 11 years ago

"recommendations" is wrong. For a start, now what do you call the relationship going the other way?

SEE ALSO is exactly the CPAN tradition for this since it doesn't mean "if you like this module", it means "if you're looking at this module you should also look at".

Generally in a deprecation situation, optimally you'd create the relationship going both ways.

Plus many recommendations would be conditional on some factor - there's no one universal best practice or we wouldn't be having this discussion, we'd just be picking one best module for each task and moving on.

As an example - I might add a link on DBIx::Class saying "see also DBIx::Lite if you don't need objects" and they might add a link back for the 'do need objects' case. Those are both conditional recommendations.

We can't assume a concept of 'obsoletes' and 'obsoleted by' - the Mojo/LWP example is good there, since while the Mojo API is a lot nicer for a lot of cases sri told me he doesn't want it to become 'the' HTTP API because he doesn't want to take on the backcompat requirements, so it's not a straight replacement even at the user agent level.

Another example would be PAR and App::FatPacker. fatpacker is way nicer to deal with than PAR for the cases it supports ... because by refusing to handle XS I managed to avoid 90% of the complications. So I'd like to think that it obsoletes PAR for most pure perl packing, but I still recommend PAR when you need XS support.

So I think calling it 'module relationships', displaying it as 'see also', and letting people put 'obsoletes' or 'obsoleted by' in the tags is probably the sensible way forwards. We can't capture a lot of the useful information otherwise, and it leaves us providing mechanism and then seeing what the users shake out in terms of policy.

monken commented 11 years ago

You rate distributions, you file bugs against distributions, cpan testers is organized by distributions. I think that term and fiction is well established in the Perl community and ecosystem. I understand that PAUSE follows a different approach, but I don't think it's practical to think in terms of modules for many use cases.

shadowcat-mst commented 11 years ago

@monken CPAN works the way dagolden describes, not the way you describe. rt.cpan.org creates a queue based on the name of the first tarball to contain a new module, then uses that module's permissions to determine the maintainer, and the result is that bugs have to be re-opened when modules are split out. Not actually a feature, just a historical thing.

A see also attached to Sub::Quote pointing to Eval::Closure should not stay with Moo if I split the module out. A see also on Path::Router pointing to Web::Dispatch should not stay pointing at Web::Simple if I split the module out.

Making links to mojolicious would be completely futile if it was dist level only, too.

So for this use case it evidently isn't practical to think in terms of distributions alone. So the remaining question is whether we initially support only modules, or whether we need distributions as well. Can you provide a concrete example of a case where distributions work and modules don't?

dagolden commented 11 years ago

I'm actually curious what happens on RT if there's an identically named distribution. If I were more evil, I would upload Moose-3.000.tar.gz containing a legal, unindexed module (NotReallyMoose.pm) and see what blows up as a result.

Since Metabase started, internally, all reports are full AUTHOR/DIST-VERSION.SUFFIX. It's only the display stuff that hasn't been updated.

Regardless of that, I think @shadowcat-mst makes the stronger case -- as modules move between distributions, recommendations/see-also should follow them.

I can't make you rewind the clock and have metacpan.org stop using "distribution" the way you are. But I do encourage you not to hang any more stuff off a non-unique key.

dagolden commented 11 years ago

[naming]

How about Related Modules? Not the "See Also" we're used to seeing; establishes that there is a relationship; but is generic to support future distinctions.

oalders commented 11 years ago

[naming]

I like "Related Modules", but I also do think "See Also" is the most succinct, even if I have some reservations about it.

[modules versus distributions]

I feel like at this point we've settled on modules and can move on from this. We can always add a dist recommendation system if needed, but I can't see a use case for that just yet. Correct me if I'm wrong.

tobyink commented 11 years ago

As per http://blogs.perl.org/users/neilb/2013/03/whats-wrong-with-cpan.html#comment-405091 it would be nice if an author's own recommendations could be handled specially.

OK, so authors already get to put whatever recommendations they like in the pod, but the recommendation system would be machine-queryable.

neilb commented 11 years ago

I think there are some overlapping concepts that are possibly getting mixed up here, including at least:

I think the first type could be solved by tagging: Const::Fast might be tagged with "constants", and "immutable variables". These could be displayed next to every module (that has them), and clicking on them would list all modules in that group (ie tagged with that tag). So if someone knows they're thinking about immutable variables, they'll click on that, and get a shorter list.

I think the two concepts can be tied together by making the alternate modules model be "I think module::A is better/worse/equivalent than/to module::B [for tag]".

yanick commented 11 years ago

Just a quick word to say that I'm chugging along with the UI part at https://github.com/yanick/metacpan-web/tree/recommendations I have stubbed in a MetaCPAN::Web::Controller::Account::Recommend, and I should be in a position to hook to the ElasticSearch backend as soon as I have one hour or two more to sink in this project. So... if any of you metacpaners feel like carving me a rest uri for that, that might come in handy rrrrreal soon. :-)

yanick commented 11 years ago

Because I'm obviously bonkers, I began to look at the cpan-api side of things. Result: https://github.com/yanick/cpan-api/tree/recommendations I have absolutely no idea what I'm doing... but I have tests that are passing for the creation and removal of recommendations.

Anyway, that part is far from being done, but I just wanted to give everybody a fair warning that the fox is in the henhouse. It's not too late to grab a shovel and come give it a good whack before it does too much damage. :-)

ranguard commented 11 years ago

@yanick I don't know the metacpan code enough to do a review - just wanted to say keep it up even if you are damaging the henhouse - I'm sure someone will help patch it up after :)

yanick commented 11 years ago

@ranguard That's the plan. :-) If nothing else, it gives me an excuse to learn ElasticSearch, which I wanted to do for some time now.

I'll push my latest code in a few instants. But it seems that I can push changes to the db just fine (yay!). Now remains the more thorny question of how ES does its searching.

yanick commented 11 years ago

I... I think I have a working prototype. https://github.com/yanick/metacpan-web/tree/recommendations and https://github.com/yanick/cpan-api/tree/recommendations

In the database, I have Recommendation documents that have a user / module / alternative triplet, which can be pushed via

/recommendation/[user]/[module]/alternative/[better module]

In metacpan-web, the lesser and better alternatives to the current module are gathered in, respectively, 'instead_of' and 'supplanted_by'.

And that's pretty much it. Oh, and I put in the restriction that a user can only give one alternative for each module (to keep things simple).
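The model described above can be sketched roughly as follows. This is an illustration only, not the code in the prototype branches: the `RecommendationStore` class and its method names are hypothetical, and an in-memory dict stands in for the Recommendation documents.

```python
class RecommendationStore:
    """In-memory stand-in for user / module / alternative triplets."""

    def __init__(self):
        # Keyed by (user, module), so each user has at most one
        # alternative per module.
        self._by_user_module = {}

    def recommend(self, user, module, alternative):
        # A second recommendation for the same module replaces the first.
        self._by_user_module[(user, module)] = alternative

    def remove(self, user, module):
        self._by_user_module.pop((user, module), None)

    def relations_for(self, module):
        """Gather the lesser and better alternatives to `module`."""
        instead_of, supplanted_by = set(), set()
        for (_user, lesser), better in self._by_user_module.items():
            if better == module:
                instead_of.add(lesser)      # `module` recommended over `lesser`
            if lesser == module:
                supplanted_by.add(better)   # `better` recommended over `module`
        return {
            "instead_of": sorted(instead_of),
            "supplanted_by": sorted(supplanted_by),
        }


store = RecommendationStore()
store.recommend("yanick", "Moose", "Moo")
store.recommend("tobyink", "Mo", "Moo")
print(store.relations_for("Moo"))
```

Keying the store on (user, module) is one simple way to enforce the one-alternative restriction; lifting that restriction later would mean keying on the full triplet instead.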

yanick commented 11 years ago

I think that I went as far as I could go. For the next step, I'd need somebody from the MetaCPAN team to look at what I did, and provide feedback for the, uh, well-meant atrocities I did to the model. Not to mention that I also need feedback on the UI: placement / nomenclature / etc.

oalders commented 11 years ago

Sounds good. We'll get you some feedback. :)

thaljef commented 11 years ago

[distributions versus modules]

Working on Pinto, I've had to think about modules and distributions a lot, so I'll toss in my 2 cents. My own conclusion thus far is that modules are the only real thing. That's what PAUSE indexes, that's what you use in code, and that's what you put in the prerequisites. There is no such thing as a distribution. At most, there are only archives (tarballs), each of which is just a bag of modules at specific versions. So from that viewpoint, any recommendation system probably ought to work at the module level.

thaljef commented 11 years ago

Something else just occurred to me (and I think you all might have had a similar thought):

One way to generalize this might be to create various types of "associations" between modules. An example of an association might be "similar to" or "extended by" or "plugin for" or "superseded with". Some of those associations could be bi-directional, and some might only be uni-directional. Some pairs of associations could even be reciprocal. For example, a "slower than" association in one direction creates a "faster than" association in the other direction.

Any MetaCPAN user could create a link between any two modules, using one of the predefined association types. Then users could up (or down) vote on the associations. For each module, MetaCPAN tracks the vote counts and displays the most favored module for each type of association.

In other words, there are several axes of discovery within MetaCPAN. So you define some of those axes (i.e. associations), let users nominate the endpoints (i.e. modules) and let them assign weight to each candidate (i.e. voting).

I have no idea how the UI would work out for this, but you get the idea. Thanks for listening.
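To make the voting idea concrete, here is a minimal sketch under stated assumptions: a tally keyed by (source module, association type, target module), with up and down votes as +1 and -1. The helper names `new_tally`, `vote` and `most_favored` and the sample votes are invented for illustration.

```python
from collections import defaultdict


def new_tally():
    """Net vote count per (source, association, target) link."""
    return defaultdict(int)


def vote(tally, source, association, target, delta=1):
    # delta=1 is an up vote, delta=-1 a down vote.
    tally[(source, association, target)] += delta


def most_favored(tally, source):
    """For each association type, return the top-voted target for `source`."""
    best = {}
    for (src, association, target), score in tally.items():
        if src != source:
            continue
        if association not in best or score > best[association][1]:
            best[association] = (target, score)
    return {association: target for association, (target, _score) in best.items()}


tally = new_tally()
vote(tally, "Mojolicious", "similar to", "Dancer")
vote(tally, "Mojolicious", "similar to", "Dancer")
vote(tally, "Mojolicious", "similar to", "Web::Simple")
vote(tally, "Mojolicious", "similar to", "Web::Simple", delta=-1)
print(most_favored(tally, "Mojolicious"))
```

Ties are resolved in favour of whichever link was voted on first, which is an arbitrary choice of this sketch rather than anything proposed in the thread.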

monken commented 11 years ago

@yanick great work! Could you please make this a branch in the CPAN-API org? I think that's some solid ground work but needs some fine tuning :)

yanick commented 11 years ago

On 13-04-11 06:18 PM, Moritz Onken wrote:

Could you please make this a branch in the CPAN-API org?

If you are crazy enough to give me a commit bit for the CPAN-API org, I'd be delighted. :-)

I think that's some solid ground work but needs some fine tuning :)

Amen to that. The goal was much less to come up with a perfect solution than to put a working prototype out there to ground the discussion in something concrete. Now, at least, we'll be able to throw patches at each other. ;-)

yanick commented 11 years ago

On 13-04-10 03:39 AM, Jeffrey Ryan Thalhammer wrote:

One way to generalize this might be to create various types of "associations" between modules. [..] I have no idea how the UI would work out for this, but you get the idea.

Yup. And I think the UI would remain mostly the same. Just now the recommendation would come with one (or many) relation tags. It would also probably mean that one can select more than one relation tag between modules (e.g., module A is faster and has better documentation than module B).

I guess this model has one big decision point: are the association types pre-defined, or are they free-form and created by the users? The latter is more flexible, but would need to be curated before the site could decide which module to recommend over the other.

oalders commented 11 years ago

@yanick You now have full access to do terrible things to MetaCPAN. :)

yanick commented 11 years ago

On 13-04-11 08:17 PM, Olaf Alders wrote:

@yanick You now have full access to do terrible things to MetaCPAN. :)

Thank you. I humbly accept this access token and, notwithstanding crazed gleaming eyes and high-pitched cackles, promise to use this new power for good. ;-)

Joy, `/anick

thaljef commented 11 years ago

I guess this model has one big decision point: are the association types pre-defined, or are they free-form and created by the users.

Maybe a bit of both.

The total number of useful association types is not that large. Maybe 10 or 20. More than that will probably be overwhelming to the user and difficult to present visually. So I don't think you want to leave it completely wide open. And personally, I would want to avoid people creating associations like "cooler" or "buggier" -- that kind of stuff is better left in full-text reviews where someone has to really own their words.

If you have bi-directional and reciprocal associations, then the system needs to know about those in advance so it can make the connection going the "other way" as well. For example, if you have predefined "faster than" and "slower than" associations, you probably don't want someone to invent a "quicker" association.
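A minimal sketch of why the system needs to know reciprocal pairs in advance, assuming a small predefined registry. The `RECIPROCALS` table, the `link` helper and the module names are hypothetical; only the association names come from the discussion above.

```python
# Predefined association types mapped to their reverse association.
# None marks a uni-directional association with no automatic reverse link.
RECIPROCALS = {
    "faster than": "slower than",
    "slower than": "faster than",
    "similar to": "similar to",  # bi-directional: same label both ways
    "plugin for": None,
}


def link(source, association, target):
    """Return the edges created when a user links two modules."""
    if association not in RECIPROCALS:
        # Free-form types such as "quicker" are rejected here; they could
        # instead be queued as suggestions outside the official score.
        raise ValueError(f"unknown association type: {association!r}")
    edges = [(source, association, target)]
    reverse = RECIPROCALS[association]
    if reverse is not None:
        edges.append((target, reverse, source))
    return edges


print(link("Module::A", "faster than", "Module::B"))
```

With the registry fixed up front, a single user action yields both directions of the relationship, so the "other way" link never has to be guessed from a free-form label.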

But you could still leave the door open for people to suggest new associations. Perhaps they would be excluded from the official score. But if you see patterns in the suggestions, then you'll have some clues about which associations should be part of your predefined set.

One caveat: this whole idea requires the voters to have experience with two modules. And perhaps that is the whole point -- to make relative comparisons. But the number of folks who have used both Mojolicious and Dancer (for example) is certainly less than the number of people who have used only one of them. So a simple tagging system might actually get more user input, albeit less specific and noisier.

yanick commented 11 years ago

'recommendation' branches now under CPAN-API:

rwstauner commented 11 years ago

welcome aboard, @yanick! thanks for helping out! :-)

yanick commented 11 years ago

cough cough

Poke?

oalders commented 11 years ago

Poke appreciated. I will review this over the weekend. :+1:

timbunce commented 10 years ago

Poke! QA Hackathon?

oalders commented 10 years ago

It's officially on the list. I've got a few things to work through today. I imagine I'll look at this tomorrow. :)

oalders commented 10 years ago

I've just rebased both of the branches -- I forked metacpan-web because the rebase was a bit hairier. This is a wip.

timbunce commented 10 years ago

Closed without comment. Odd. What's the status?

ranguard commented 10 years ago

This needs a champion to actually work on it - was just going to point the next person that mentioned it here :)

timbunce commented 10 years ago

Wouldn't it be better to label the issue 'Volunteer needed' rather than closing it?

ranguard commented 10 years ago

After further discussion... yes... (though I've never seen someone actually take on 'Volunteer needed', so we have renamed - 'Champion required'...)