mnich0ls / evee-sd

An event aggregation app for events in San Diego

List of additional sites to scrape #2

Closed mnich0ls closed 5 years ago

mnich0ls commented 5 years ago

This is the first draft of a list of other sites we plan to scrape. I will break these out into individual tasks once they are ready to be worked on.

https://www.sandiegoreader.com/events
https://www.eventbrite.com/d/ca--san-diego/events
https://www.sandiegomagazine.com/calendar
https://www.sandiego.gov/events
https://www.bandsintown.com/en/c/san-diego-ca
https://www.meetup.com/find/events/?allMeetups=true&radius=50&userFreeform=San+Diego%2C+CA&mcId=z92101&mcName=San+Diego%2C+CA&eventFilter=all
https://sandiegotheatres.org/calendar-5/
http://www.cygnettheatre.com
https://www.balboapark.org/events
https://govavi.leagueapps.com/leagues?state=LIVE&locationId=&seasonId=&days=&levelId=&_ga=2.26583293.1742813550.1552014502-1770543235.1532528741
https://www.active.com/search?keywords=&location=San+Diego%2C+CA%2C+USA&category=Activities&daterange=All+future+dates&clckmp=activecom_home_hero_activitysearch

RobotHuman commented 5 years ago

Robert here. To preface, I can still do my initial scraper for $100, just to maintain my foot-in-the-door price from earlier, haha. It looks like there is a great deal of variation in the markup and organization, both between these websites and within each of them. It's no problem; it just adds some complexity.

Like the difference between this one from Eventbrite https://www.eventbrite.com/e/2019-calnena-9-1-1-mission-critical-training-event-registration-46486125252?aff=ebdssbcitybrowse

And this one https://www.eventbrite.com/e/2019-doterra-leadership-retreat-san-diego-ca-tickets-52319971443?aff=ebdssbcitybrowse

For that site I would charge $250 for a robust and functional scraper based on Selenium in Python.

This one I can do for $225 https://www.balboapark.org/event/17915

https://www.sandiego.gov/event/bloom-bash-2019 this one is pretty simple to crawl and parse. This one I can do for $175.

https://www.sandiegomagazine.com/calendar/index.php/name/World-CBD-Expo/event/48099/requiressl/true/ I could take care of that one for $195.

This is the one I can do for $100 initially: https://www.sandiego.gov/event/bloom-bash-2019. The rest I can do at a flat $175 each. I work at a pretty decent speed. Most of these are doable in a span of 1-3 days each.

Does that sound acceptable?

mnich0ls commented 5 years ago

Hi @RobotHuman, thank you for taking a look at these. I just want to clarify whether you're planning to scrape the set of available events through each of these sites or whether you are basing your bid on a "per event" basis for each site? I'm looking to pull all available events from each of these sites. I will start breaking out this task into the individual sites.

What are your thoughts on how the scrapers will be hosted and configured to run? Do you have any recommendations based on the tools/tech/language you will write these in?

mnich0ls commented 5 years ago

Please see the wiki page

RobotHuman commented 5 years ago

Yes, I would be scraping the full set of available events through each of these sites at my given price points. These scrapers could live on a Linux VPS. These guys are a decent option (https://www.interserver.net/vps/windows-vps.html): we could get a 2-core system with 4 GB of RAM at $12/month, and that would be plenty for this purpose.

They could be triggered by a cron job, and they'll interface with Firefox mainly through Selenium and geckodriver.
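
A rough sketch of what one of these could look like, assuming Selenium 4 with geckodriver on the PATH. The file name, the cron schedule and paths, the `extract_event_links` helper, and the guess that the sandiego.gov listing page links to `/event/...` pages are all assumptions for illustration, not something settled in this thread.

```python
"""Hypothetical scrape_events.py: list event links from one site.

Example crontab entry (assumed paths and schedule), daily at 03:00:
    0 3 * * * /usr/bin/python3 /opt/evee/scrape_events.py >> /var/log/evee.log 2>&1
"""
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_event_links(html, base_url, path_prefix="/event/"):
    """Return absolute, de-duplicated links whose path starts with path_prefix."""
    collector = _LinkCollector()
    collector.feed(html)
    seen = set()
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).path.startswith(path_prefix) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def fetch_page_source(url):
    """Load url in headless Firefox (via geckodriver) and return the rendered HTML."""
    # Imported lazily so the parsing helper can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the page load fails


def main():
    listing_url = "https://www.sandiego.gov/events"
    for link in extract_event_links(fetch_page_source(listing_url), listing_url):
        print(link)
```

Saved as a script, the file would end with a call to `main()` so cron can run it directly. Keeping the link extraction separate from the browser code means each site's parsing can be tested against saved HTML without launching Firefox.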