metacpan / js-metacpan-org

Proof of concept search page for using api.metacpan.org
Integrate CPAN testers report through their JSON interface? #27

Open ranguard opened 13 years ago

ranguard commented 13 years ago

http://www.cpantesters.org/distro/D/Data-Pageset.json for example
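A rough sketch of how a client could summarise a feed like this, assuming it is a JSON array of report objects. The field names `version` and `status`, the helper `tally_by_version`, and the sample records are illustrative assumptions, not the documented format of the feed.

```python
import json
from collections import Counter

# Hypothetical sample data. The real feed may use different field names.
SAMPLE = """
[
  {"version": "1.06", "status": "PASS"},
  {"version": "1.06", "status": "FAIL"},
  {"version": "1.06", "status": "PASS"},
  {"version": "1.05", "status": "NA"}
]
"""


def tally_by_version(reports):
    """Count report grades (pass/fail/na/...) per distribution version."""
    totals = {}
    for report in reports:
        # Fall back to "unknown" so one malformed record does not abort the tally.
        version = report.get("version", "unknown")
        grade = str(report.get("status", "unknown")).lower()
        totals.setdefault(version, Counter())[grade] += 1
    return totals


reports = json.loads(SAMPLE)
summary = tally_by_version(reports)
print(summary["1.06"]["pass"], summary["1.06"]["fail"])
```

In a real client the `SAMPLE` string would be replaced by the body fetched from the URL above.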

oalders commented 13 years ago

http://search.metacpan.org/#/showpod/CPAN::Testers::Reports::Query::JSON

ranguard commented 13 years ago

Thinking about this more - it would be nicer to have this via the API - but maybe that's what you meant :)

oalders commented 13 years ago

Yes, that would actually be really helpful. There is this cryptic issue already open, but there's no real plan around it:

https://github.com/CPAN-API/cpan-api/issues/#issue/38

Getting all of the tester data in there might be a massive job and there's the question of how often to update, but I think it's very much worth looking at -- even if it's only summary data in the API. MetaCPAN could be one API to rule them all. Would be nice if you could get X different kinds of data without having to learn X different APIs, feeds etc. :)

barbie commented 13 years ago

It depends what summary information you want, but I already store a summary in the DB to make loading the web pages a little faster. The only downside is that it isn't necessarily up to date, as the builder is currently 36 hours behind the oldest outstanding report submission. I could make this available via a dedicated request though.

oalders commented 13 years ago

I can certainly live with a 36-hour lag, and a dedicated request would be splendid. :) Would be great to get this information into the index. I can see that it would be very useful to a lot of people.

barbie commented 13 years ago

The summary is stored as JSON, so it should be simple enough to query an author or a distro and get the block of JSON returned quite quickly. I'll sort a prototype out next week for you. If the summary needs anything additional for you, I can look at sorting that out too.

oalders commented 13 years ago

That sounds very good. Looking forward to it!

monken commented 13 years ago

There used to be an IRC channel where each test result was propagated. With CPAN Testers 2.0 this has been removed due to too much traffic, I guess. This kind of stream would have been ideal for our purposes. We could have joined the IRC channel with a bot and updated the test count (failing/pass/na) in real time.

Are there any plans to add something like this to the current CPAN Testers API? E.g. long-polling http requests or something. I'd be happy to help with that!

barbie commented 13 years ago

There are about 5,000 reports a day usually, so a stream would be difficult to manage. There is a tail log of the submissions, but it is probably more appropriate to look at the specific distros and authors that you are interested in.

Sorry, I haven't finished the summary API yet. My Vodafone dongle broke this week, so I can't work on the server while on the bus at the moment :( Will try and get something up and running tomorrow while watching the Grand Prix ;)

monken commented 13 years ago

Hi!

I was planning to use the test results for the distribution ranking algorithm (i.e. rank releases with bad results lower). For this to work properly I need all results in the ElasticSearch instance. I'm not sure where search.cpan.org gets its data from, but it seems to be pretty up to date (a few hours of delay).

Real-time updates would be perfect, but I do see the technical challenge on the cpan testers side.
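One way the ranking idea could look, as a minimal sketch only: scale a release's base relevance score by a smoothed pass rate, so releases with many failures sink while releases with very few reports are not punished too harshly. The function `adjusted_score`, its parameters, and the smoothing constants are hypothetical and are not part of any MetaCPAN code.

```python
def adjusted_score(base_score, passes, fails, prior_passes=1, prior_total=2):
    """Scale a relevance score by a smoothed tester pass rate.

    The prior acts like one imaginary pass and one imaginary fail, so a
    release with no reports gets a neutral factor of 0.5 rather than 0 or 1.
    NA and UNKNOWN reports are assumed to be excluded by the caller.
    """
    if passes < 0 or fails < 0:
        raise ValueError("report counts must be non-negative")
    pass_rate = (passes + prior_passes) / (passes + fails + prior_total)
    return base_score * pass_rate


# Hypothetical releases with the same base score but different test results.
mostly_passing = adjusted_score(8.0, passes=98, fails=0)
mostly_failing = adjusted_score(8.0, passes=0, fails=98)
untested = adjusted_score(8.0, passes=0, fails=0)
print(mostly_passing > untested > mostly_failing)
```

Whether an untested release should sit in the middle like this is a design choice. A different prior would move it up or down.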

barbie commented 13 years ago

search.cpan.org gets them from the cpanstats SQLite database available from the development site: http://devel.cpantesters.org. The DB is updated every 6 hours, and typically search.cpan.org takes a copy once a day.
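A minimal sketch of pulling per-release grade counts out of a local copy of that database. The table name `cpanstats` and the columns `dist`, `version` and `state` are assumptions about the schema, and the in-memory database below only stands in for the downloaded file.

```python
import sqlite3


def grade_counts(conn, dist, version):
    """Return {state: count} for one release of a distribution."""
    cursor = conn.execute(
        "SELECT state, COUNT(*) FROM cpanstats "
        "WHERE dist = ? AND version = ? GROUP BY state",
        (dist, version),
    )
    return dict(cursor.fetchall())


# Stand-in for the real file: an in-memory table using the assumed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpanstats (dist TEXT, version TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO cpanstats VALUES (?, ?, ?)",
    [
        ("Data-Pageset", "1.06", "pass"),
        ("Data-Pageset", "1.06", "pass"),
        ("Data-Pageset", "1.06", "fail"),
        ("Data-Pageset", "1.05", "na"),
    ],
)

counts = grade_counts(conn, "Data-Pageset", "1.06")
print(counts)
```

Against the real file, `sqlite3.connect` would take the path of the downloaded database instead of `":memory:"`, and the query would need adjusting to whatever the actual schema turns out to be.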

monken commented 13 years ago

Cool thanks! That should be enough for our purposes!

On 11.04.2011 at 23:36, barbie wrote:

search.cpan.org gets them from the cpanstats SQLite database available from the development site: http://devel.cpantesters.org. The DB is updated every 6 hours, and typically search.cpan.org takes a copy once a day.


oalders commented 13 years ago

Barbie, does the SQLite database get us to the same place? If the JSON feed has more interesting info, I don't want to exclude that possibility. Having said that, I also don't want to create extra work for you. :) What do you think?

barbie commented 13 years ago

Yes, although it means slightly more work on your side. I have the JSON summary for the author working, but the distro and totals summaries (as requested by Ranguard) didn't contain enough info, so I'm having to rework the Generator to establish and update that info first. The JSON will still be useful for snapshots, but the SQLite route is probably better if you're expecting to process 100s of queries a minute.