A simple, easy-to-use crawler for web sources (Facebook, Twitter, NodeBB, etc.).
favara is a Siculo-Arabic word meaning "water source". Siculo-Arabic is now a dead language (it was spoken from the 9th to the 14th century), but we believe the word favara sounds great and its meaning reflects the purpose of the project well.
$ bundle install
Then set up database.yml (using env variables) and config.yml.
You will then have to make a choice regarding the ownership of the database tables favara uses:
Either run
$ rake create_tables
or use the migration in migrations/001_init.rb.
Run
$ rake favara
to crawl only the latest contents,
$ rake "favara[true]"
to crawl all posts from all sources, or
$ clockwork clock.rb
to leave favara running and automatically crawl the latest posts at regular intervals (the default configuration runs a complete crawl between 11pm and 5am).

Favara is designed to import the crawled contents into a database. If that doesn't suit your needs, feel free to copy the files in crawlers/lib/*, which contain the database-independent logic, and use them like any other Ruby library.
We also provide a very thin Sinatra webservice. It is not meant for production use, but it may come in handy for testing or diagnostics. To start it, run
$ ruby server.rb
then point your browser to localhost:4567. You can check the crawled events under /events and the crawled posts under /posts.
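If you prefer a quick check from the command line instead of a browser, something along these lines may help. It is only a sketch using Ruby's standard library: the helper names endpoint_uri and fetch are made up for this example and are not part of favara, and nothing is assumed about the format of the responses beyond them being HTTP bodies.

```ruby
require "net/http"
require "uri"

# Address of the diagnostic server started with `ruby server.rb`.
BASE_URL = "http://localhost:4567"

# Hypothetical helper (not part of favara): build the full URI of an endpoint.
def endpoint_uri(path, base = BASE_URL)
  URI.join(base, path)
end

# Hypothetical helper (not part of favara): fetch an endpoint and
# return its HTTP status code and raw body.
def fetch(path, base = BASE_URL)
  response = Net::HTTP.get_response(endpoint_uri(path, base))
  [response.code.to_i, response.body]
end

if __FILE__ == $PROGRAM_NAME
  %w[/events /posts].each do |path|
    begin
      status, body = fetch(path)
      puts "#{path}: HTTP #{status}, #{body.to_s.bytesize} bytes"
    rescue SystemCallError, SocketError, Timeout::Error => e
      # Typically Errno::ECONNREFUSED when the server is not running.
      warn "#{path}: could not connect (#{e.class}); is server.rb running?"
    end
  end
end
```

Save it under any name you like and run it with ruby while server.rb is up; a non-200 status or a connection error usually means the webservice is not running or nothing has been crawled yet.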