thegooglecodearchive / allforgood

Automatically exported from code.google.com/p/allforgood
automate data loading + feed dashboard for all except craigslist #60

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
craigslist needs a weird crawler, and they claim to be working on a real feed,
so automate everyone else first; then, if they don't deliver, put in the code
to automate them too.

Original issue reported on code.google.com by adam.sah on 22 Apr 2009 at 1:22

GoogleCodeExporter commented 9 years ago
stats by provider:
 - number of items currently (histogram over time?)
 - impressions & clicks to their listings (histogram over time?)
 - % of listings with descriptions over N chars, for various N
 - heatmap of listings by geo
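
A minimal sketch of how the description-length stat above could be computed, assuming listings are available as dicts with hypothetical `provider` and `description` keys (the actual feed schema is not shown in this thread, and the thresholds are placeholders):

```python
from collections import defaultdict


def description_coverage(listings, thresholds=(50, 200, 1000)):
    """Return {provider: {n: percent of listings whose description exceeds n chars}}.

    `listings` is an iterable of dicts; the "provider" and "description"
    keys are assumptions made for this sketch, not the real feed schema.
    """
    lengths = defaultdict(list)
    for item in listings:
        # Treat a missing or None description as an empty string.
        lengths[item["provider"]].append(len(item.get("description") or ""))

    report = {}
    for provider, lens in lengths.items():
        report[provider] = {
            n: 100.0 * sum(1 for length in lens if length > n) / len(lens)
            for n in thresholds
        }
    return report
```

The number of items per provider falls out of the same grouping (`len(lens)`), so both stats could share one pass over the data.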

Original comment by adam.sah on 23 Apr 2009 at 8:19

GoogleCodeExporter commented 9 years ago
sub-issue: idealist is very slow and needs the regexp-parsing hack.  The others
are working fine (modulo craigslist).

Original comment by adam.sah on 25 Apr 2009 at 12:27

GoogleCodeExporter commented 9 years ago
in terms of dashboarding, the big pain is that Base doesn't let you query it
for the number of records by provider (yes, I tried the providername restrict,
and the approximate counts were way off, which is typical of search engines).

some options:
1) use the proprietary Base data API to fetch this info.  This ties us more to
   Base.
2) create the dashboard in the datahub, i.e. include a little webserver.  Proxy
   requests through the appengine app (urlfetch) to hide the current datahub.
3) after the datahub loads into Base, it then hits the appengine app, which
   writes a record to the datastore.  This seems overly complicated.

For now I'm going with #2.
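
A rough sketch of the proxying idea in option 2, not the actual implementation. It uses the standard library's `urlopen` as a stand-in for App Engine's urlfetch, and `DATAHUB_BASE` and `proxy_dashboard` are hypothetical names invented for illustration:

```python
from urllib.request import urlopen

# Hypothetical internal address of the datahub's webserver; not from this thread.
DATAHUB_BASE = "http://datahub.internal.example:8080"


def proxy_dashboard(path, fetch=urlopen, base=DATAHUB_BASE):
    """Fetch a dashboard page from the datahub and return (status, body).

    `fetch` is injectable so the front-end app can substitute its own
    HTTP client (e.g. urlfetch) or a fake for testing.
    """
    # Only accept simple absolute paths so callers cannot point the proxy elsewhere.
    if not path.startswith("/") or "://" in path or ".." in path:
        return 400, b"bad path"
    try:
        response = fetch(base + path)
        return 200, response.read()
    except OSError:
        # urllib's URLError subclasses OSError; hide backend details from the client.
        return 502, b"dashboard unavailable"
```

Because clients only ever see the front-end app's URL, the datahub could be moved or replaced without changing the dashboard address.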

Original comment by adam.sah on 26 Apr 2009 at 8:26

GoogleCodeExporter commented 9 years ago
automated loading is starting to work; it is set up to load every 4 hours.
(again, CL coming)
next up: dashboard

Original comment by adam.sah on 27 Apr 2009 at 1:07

GoogleCodeExporter commented 9 years ago

Original comment by adam.sah on 28 Apr 2009 at 4:13

GoogleCodeExporter commented 9 years ago
Since the loader has proven stable, I'm going to declare this 'fixed' for now
and will file some feature requests against the dashboard.

Original comment by adam.sah on 30 Apr 2009 at 1:28