ninya-io / ninya.io

Find StackOverflow users near you by tags and reputation
MIT License
60 stars, 13 forks

Make a rock-solid continuous SO sync #11

Closed: cburgdorf closed this issue 10 years ago

cburgdorf commented 10 years ago

The current approach involves too many manual steps, and it leaves us working with outdated data.

cburgdorf commented 10 years ago

The basic problem right now is that we don't know how to get a new IP address when SO eventually blocks us. I think we should try simply throwing an exception: the app will crash and Heroku will automatically restart it (probably with a new IP address, since the dynos change all the time).

cburgdorf commented 10 years ago

Simply throwing an exception doesn't work. However, now that the sync is separated, it's pretty easy to harvest data from SO by just running this on a local workstation for the StackWhoSync repository:

while true; do sleep 7; heroku restart; done

Still, since we want to fetch fresh data from time to time, we should create a process that harvests into a spare table and switches over once we have harvested enough (currently we have the top 50k users).
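A minimal sketch of the spare-table idea, assuming the two tables are modeled as flat files with one user per line. The file names and the 50000 threshold are hypothetical; the project's real storage isn't described in this thread. Harvested batches only ever go into the spare file, and the switch is a single `mv`, which is an atomic rename within one filesystem, so readers see either the old table or the complete new one.

```shell
#!/usr/bin/env bash
# Hypothetical names: the "tables" are plain files, one user record per line.
ACTIVE_TABLE="${ACTIVE_TABLE:-users_active.txt}"
SPARE_TABLE="${SPARE_TABLE:-users_spare.txt}"
SWITCH_THRESHOLD="${SWITCH_THRESHOLD:-50000}"

# Append one harvested batch (read from stdin) to the spare table only;
# readers keep using the active table in the meantime.
harvest_into_spare() {
  cat >> "$SPARE_TABLE"
}

# Promote the spare table once it holds enough rows. mv on the same
# filesystem is an atomic rename, so there is no half-written active table.
switch_if_ready() {
  if [ ! -f "$SPARE_TABLE" ]; then
    echo "no spare table yet"
    return 0
  fi
  local rows
  rows=$(( $(wc -l < "$SPARE_TABLE") ))   # arithmetic strips wc's padding
  if [ "$rows" -ge "$SWITCH_THRESHOLD" ]; then
    mv -f "$SPARE_TABLE" "$ACTIVE_TABLE"
    echo "switched ($rows rows)"
  else
    echo "not yet ($rows/$SWITCH_THRESHOLD rows)"
  fi
}
```

In a database, the equivalent step would typically be renaming a fully loaded staging table over the live one.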

robinboehm commented 10 years ago

You could write the exception to a file, e.g. in your temp folder, and create a script, scheduled with https://addons.heroku.com/scheduler, that checks the content of that file. If the check is true, run the restart. Could this be a start?
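A rough sketch of that scheduled check, assuming the sync writes the word `blocked` into a flag file when SO starts rejecting requests. `FLAG_FILE`, `mark_blocked`, `check_and_restart` and the `RESTART_CMD` override are hypothetical names. One caveat: on Heroku each dyno has its own ephemeral filesystem and Scheduler jobs run in separate one-off dynos, so in practice the flag would have to live somewhere shared (for example the database) rather than in the worker's temp folder.

```shell
#!/usr/bin/env bash
# Hypothetical flag location and restart command; both can be overridden.
FLAG_FILE="${FLAG_FILE:-/tmp/so_sync_blocked}"
RESTART_CMD="${RESTART_CMD:-heroku restart}"

# Called by the sync when SO rejects a request.
mark_blocked() {
  echo "blocked" > "$FLAG_FILE"
}

# Run periodically by the scheduler: restart only if the flag is set,
# and clear it first so the next run doesn't restart again.
check_and_restart() {
  if [ -f "$FLAG_FILE" ] && [ "$(cat "$FLAG_FILE")" = "blocked" ]; then
    rm -f "$FLAG_FILE"
    # Unquoted on purpose so "heroku restart" splits into command + argument.
    $RESTART_CMD
  else
    echo "sync ok, nothing to do"
  fi
}
```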

cburgdorf commented 10 years ago

OK, this could work. I just wonder how exactly I would restart my Heroku process; I've never worked with the Heroku Scheduler so far. There seems to be a Ruby API wrapper (which would be fine for this simple task):

http://stackoverflow.com/questions/9612968/how-to-restart-heroku-worker-using-heroku-gem

That said, are you aware of a JavaScript API wrapper that handles this?
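A wrapper isn't strictly necessary: the Heroku Platform API (v3) exposes a "restart all dynos" endpoint, `DELETE /apps/{app}/dynos`, over plain HTTPS, so the same request can be made from Node with any HTTP client. A sketch with `curl`, assuming a valid token in `HEROKU_API_KEY` and the app name passed as the first argument (both placeholders):

```shell
#!/usr/bin/env bash
# Build the URL of the Platform API "restart all dynos" endpoint for an app.
restart_url() {
  printf 'https://api.heroku.com/apps/%s/dynos\n' "$1"
}

# Restart every dyno of the given app. HEROKU_API_KEY must hold an API token;
# the ${...:?} expansion aborts with a message if it is unset.
restart_all_dynos() {
  local app="$1"
  curl --fail --silent --show-error -X DELETE "$(restart_url "$app")" \
    -H "Accept: application/vnd.heroku+json; version=3" \
    -H "Authorization: Bearer ${HEROKU_API_KEY:?set HEROKU_API_KEY}"
}
```

Usage would be something like `restart_all_dynos my-sync-app`, where `my-sync-app` stands in for the real app name.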

robinboehm commented 10 years ago

Solved?

cburgdorf commented 10 years ago

Yep, I just made the entire sync scheduler-based. This works quite well for now. Thanks for the hint!