jeffdeville / sherlock_homes

When should the scraping step happen? #7

Closed algodave closed 9 years ago

algodave commented 9 years ago

@jeffdeville

When should the scraping step happen in your vision of the life-cycle of the end-user app?

I was thinking of a couple of options, but you might have a different approach in mind!

Option 1

Scraping from Redfin and Trulia is a backend task, performed in the setup phase of the app. Its purpose is to pre-load our own storage with the data we need.

Option 2

Scraping from Redfin and Trulia is a live task, performed when our user searches for a specific address. As soon as we scrape the data, we save it in our own storage, so that the next time a user searches for the same address, we don't have to scrape again.
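To make Option 2 concrete, here is a minimal sketch of the "scrape on first search, then reuse" idea. Everything in it is an assumption for illustration: `make_cached_lookup`, `fake_scrape`, and the in-memory dict standing in for "our own storage" are hypothetical names, and no real Redfin or Trulia call is made.

```python
from typing import Callable, Dict


def make_cached_lookup(
    scrape: Callable[[str], dict], store: Dict[str, dict]
) -> Callable[[str], dict]:
    """Wrap a scraper so that each address is scraped at most once."""

    def lookup(address: str) -> dict:
        # Normalize case and whitespace so trivially different spellings
        # of the same address share one storage entry.
        key = " ".join(address.lower().split())
        if key not in store:
            # Not in storage yet: scrape live, then save the result.
            store[key] = scrape(address)
        return store[key]

    return lookup


# Hypothetical stand-in for the real scrapers; it only records its calls.
calls = []


def fake_scrape(address: str) -> dict:
    calls.append(address)
    return {"address": address, "source": "fake"}


store: Dict[str, dict] = {}
lookup = make_cached_lookup(fake_scrape, store)
lookup("123 Main St")
lookup("123  main st")  # same address after normalization, served from storage
print(len(calls))  # 1
```

Under the same assumptions, Option 1 would be this `lookup` function called ahead of time over a known list of addresses, so the two options differ mainly in when the scrape is triggered.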

jeffdeville commented 9 years ago

For the moment, I don't think this will have a web front end. Its purpose is to help me put property brochures together, and initially, having all of the data in one place will be sufficient.

Down the road though, what I'd like to do is make this just one step in a data processing pipeline. So what would go in would be an address, and the data would come out, to then be further processed by other components in an attempt to come up with an estimated value and a suggested purchase price. Right now, I don't have the experience to put this together, so we're primarily in data acquisition mode. But the overall workflow engine could be fairly extensive. It could use demographics, crime data, neighborhood appreciation rate, nearby school ratings, etc., and then assign weights to that data to help value a property. There might even be offline components to it, where I assign people to drive by the property to make sure it still looks like its most recent pictures, etc. There'll probably be a human element in evaluating comps as well.
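As a rough sketch of that pipeline shape (address in, data out, weights applied afterwards), something like the following could work. The stage names, field names, weights, and numbers are all placeholders invented for illustration, not real data or an agreed design.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
# A stage takes the address and the record so far, and returns a new record.
Stage = Callable[[str, Record], Record]


def run_pipeline(address: str, stages: List[Stage]) -> Record:
    """Pass one address through each stage in order, accumulating data."""
    record: Record = {}
    for stage in stages:
        record = stage(address, record)
    return record


def weighted_score(record: Record, weights: Dict[str, float]) -> float:
    """Sum weight * value over the weighted fields; missing fields count as 0."""
    return sum(w * record.get(name, 0.0) for name, w in weights.items())


# Placeholder weights: a higher school rating helps, a higher crime index hurts.
WEIGHTS = {"school_rating": 0.5, "crime_index": -1.0}


def acquire(address: str, record: Record) -> Record:
    # Hypothetical data acquisition step returning made-up values.
    return {**record, "school_rating": 8.0, "crime_index": 3.0}


def score(address: str, record: Record) -> Record:
    # Later step that turns the acquired data into a single number.
    return {**record, "score": weighted_score(record, WEIGHTS)}


result = run_pipeline("123 Main St", [acquire, score])
print(result["score"])  # 1.0
```

Keeping each stage as a plain function means the data acquisition step can be built now and the valuation steps, including any human review, added later without changing it.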

As for the trigger, it could be a simple command-line thing for now.
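A command-line trigger of that kind could be as small as the sketch below, which uses Python's standard `argparse` module. The `collect` function is a hypothetical placeholder for whatever gathers the data, and printing JSON is only one possible output choice.

```python
import argparse
import json
from typing import List, Optional


def collect(address: str) -> dict:
    # Hypothetical placeholder for the real data acquisition step.
    return {"address": address}


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Collect property data for one address."
    )
    parser.add_argument("address", help="street address to look up")
    # With argv=None, argparse reads the arguments from sys.argv.
    args = parser.parse_args(argv)
    print(json.dumps(collect(args.address)))
    return 0


# Demo call with an explicit argument list; a real script would call main()
# with no arguments so that the address comes from the command line.
main(["123 Main St"])
```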

Does that help?

algodave commented 9 years ago

Thanks, that makes it clearer how to achieve the current goals.