niryariv / opentaba-server

BSD 3-Clause "New" or "Revised" License

New scraper #63

Closed florpor closed 10 years ago

florpor commented 10 years ago

The scraper has been revised for the new site; please review the code. It is already running on https://dashboard.heroku.com/apps/opentaba-dev-server/resources, where you can also check out the mongo contents.

Also, I think that if this is published, it is time to drop the current data. Running the revised scraper got me just under 15,000 plans, only one of which is blacklisted, while the current mongo has over 150,000 plans, a ton of them blacklisted. I assume many of the extra plans are ones that appeared on many gushim and so are duplicated throughout our data. If anyone wants to test the new data, just point your local opentaba-client to https://opentaba-dev-server.herokuapp.com/ as the API URL.

And there's a small change to opentaba-client that accompanies this one (the plan details url): https://github.com/niryariv/opentaba-client/pull/53
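
A rough sketch of the duplication guess above: if the old data stores one row per (gush, plan) pair, a plan that spans several gushim is counted several times. The field names `gush_id` and `number`, and the sample values, are made up for illustration and are not taken from the real mongo schema.

```python
from collections import defaultdict

# Hypothetical rows: one per (gush, plan) pair, the way duplicated data
# might look. Field names and values are assumptions, not the real schema.
records = [
    {"gush_id": "30027", "number": "A/1"},
    {"gush_id": "30028", "number": "A/1"},
    {"gush_id": "30029", "number": "A/1"},
    {"gush_id": "30027", "number": "B/2"},
]


def merge_by_plan_number(rows):
    """Collapse rows sharing a plan number into one entry listing its gushim."""
    gushim_by_plan = defaultdict(set)
    for row in rows:
        gushim_by_plan[row["number"]].add(row["gush_id"])
    # Sort each gush list so the result is deterministic.
    return {number: sorted(gushim) for number, gushim in gushim_by_plan.items()}


merged = merge_by_plan_number(records)
print(len(records), "rows ->", len(merged), "unique plans")
```

Whether the real gap between roughly 150,000 and 15,000 comes from this kind of duplication is still only an assumption; the sketch just shows how the counts could diverge.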

niryariv commented 10 years ago

Works great here. I'm testing locally by changing API_URL in the client's app.js to:

var API_URL = 'https://opentaba-dev-server.herokuapp.com/';

The MMI links seem to work fine even without the client patch.

Lots of new plans - looks excellent

florpor commented 10 years ago

Yes, the old link still works, but it points to the old site, and we can't know whether it will stay up or whether it will show information about new plans that aren't available through the old search.

niryariv commented 10 years ago

cool. @shevron @alonisser - could you also QA this on your machines?

alonisser commented 10 years ago

Looks great on my machine too.

Did you check out the new front-end build process?

On Thu, Jan 23, 2014 at 5:37 PM, Nir Yariv wrote:

cool. @shevron @alonisser - could you also QA this on your machines?

shevron commented 10 years ago

Looks very good here. Also checked https://opentaba-dev-server.herokuapp.com/gushim.json?detailed=true which seems to produce nice data.
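
For anyone who wants to repeat that check from a script, here is a minimal sketch. It only assumes the dev server URL and the `gushim.json?detailed=true` endpoint mentioned in this thread; the helper names `gushim_url` and `fetch_gushim` are hypothetical, and nothing is assumed about the shape of the returned JSON.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Dev server base URL as given in this thread.
API_URL = "https://opentaba-dev-server.herokuapp.com/"


def gushim_url(base=API_URL, detailed=True):
    """Build the gushim.json URL, optionally with ?detailed=true."""
    url = urljoin(base, "gushim.json")
    if detailed:
        return url + "?" + urlencode({"detailed": "true"})
    return url


def fetch_gushim(base=API_URL, detailed=True, timeout=30):
    """Download and parse the gushim JSON (requires the server to be reachable)."""
    with urlopen(gushim_url(base, detailed), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_gushim()` returns the parsed JSON if the dev server is up; `gushim_url()` alone can be used to print the URL and open it in a browser.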

On Thu, Jan 23, 2014 at 6:39 PM, Alonisser wrote:

Looks great on my machine too.

Did you check out the new front-end build process?
