isaacmg / fb_scraper

FBLYZE is a Facebook scraping system and analysis system.
Apache License 2.0

Docker Error KeyError #21

Open ameygat opened 5 years ago

ameygat commented 5 years ago

I ran the Docker image for the first time and got a KeyError. It seems the code is trying to read a Postgres user and database, so do these need to be created on the host system? There were no instructions for setting up a DB at https://github.com/isaacmg/fb_scraper/wiki/Docker-Image

variables.list:

FB_ID=myappid
FB_KEY=mysecreate
IDS=cnn,paddlesoft,msnbc
# Include only if you want to scrape comments
COMMENTS=1
# Include below ONLY if you want to use Kafka.
USE_KAFKA=1
KAFKA_PORT=localhost:9092

Error:

docker run --env-file variables.list paddlesoft/fb_scraper
Traceback (most recent call last):
  File "threaded_proc.py", line 6, in <module>
    from fb_scrapper import scrape_groups_pages
  File "/fb_scraper/fb_scrapper.py", line 2, in <module>
    from fb_posts import FB_SCRAPE
  File "/fb_scraper/fb_posts.py", line 11, in <module>
    from save_pg import save_post_pg
  File "/fb_scraper/save_pg.py", line 3, in <module>
    db = Database(os.environ['db'], user=os.environ['pg_user'], password=os.environ['pg_password'], host=os.environ['pg_host'], database=os.environ['pg_db'])
  File "/opt/conda/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'db'

isaacmg commented 5 years ago

Yeah, I think the newer version uses PostgreSQL by default to save scraping times, as shelving is unstable. So I think you need to set the environment variables read by this line (db, pg_user, pg_password, pg_host and pg_db):

db = Database(os.environ['db'], user=os.environ['pg_user'], password=os.environ['pg_password'], host=os.environ['pg_host'], database=os.environ['pg_db'])

I might add a parameter in the future that lets you choose whether to use PostgreSQL. If you want to make a PR that adds the parameter, that would probably be the quickest way.
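For anyone picking up that PR, here is a rough sketch of what such a parameter could look like. It is not code from fb_scraper: the USE_POSTGRES flag and the postgres_settings helper are hypothetical names chosen for illustration, and only the five variable names come from the Database(...) call above. The idea is to skip PostgreSQL unless it is explicitly requested, and to report every missing variable at once instead of failing on the first KeyError.

```python
import os

# Variable names taken from the Database(...) call in save_pg.py (see traceback).
REQUIRED_PG_VARS = ("db", "pg_user", "pg_password", "pg_host", "pg_db")


def postgres_settings(env=os.environ):
    """Return the PostgreSQL settings as a dict, or None if PostgreSQL is off.

    USE_POSTGRES is a hypothetical opt-in flag; fb_scraper does not define it.
    """
    if env.get("USE_POSTGRES") != "1":
        return None  # caller would fall back to another storage option

    # Collect all missing names so the user sees the full list in one error.
    missing = [name for name in REQUIRED_PG_VARS if name not in env]
    if missing:
        raise RuntimeError(
            "Missing environment variables for PostgreSQL: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_PG_VARS}
```

With something like this, save_pg.py could build the Database object only when postgres_settings() returns a dict, so a variables.list without any pg_* entries would no longer crash at import time.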

ameygat commented 5 years ago

But is Postgres already included in the Docker image, or do we need to set up that database on the host?

isaacmg commented 5 years ago

At the moment Postgres is not part of the Dockerfile, as it would take up too much memory. The way I was using it is as part of a docker-compose setup, with one container for PostgreSQL and another for FB scraper. Another easy option is to set up a Heroku app, as those come with a free PostgreSQL database. I just added some documentation on this to the wiki page as well.
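For the Heroku route, the database credentials usually arrive as a single connection URL (Heroku Postgres exposes it as DATABASE_URL), while save_pg.py reads separate variables. A small sketch of splitting such a URL into the pg_* values follows. The pg_vars_from_url helper is a hypothetical name, not part of fb_scraper, and the URL in the usage note is a made-up example.

```python
from urllib.parse import unquote, urlsplit


def pg_vars_from_url(database_url):
    """Split a postgres://user:password@host:port/dbname URL into pg_* values.

    Hypothetical helper for illustration. The port is dropped because the
    Database(...) call in save_pg.py does not pass one.
    """
    parts = urlsplit(database_url)
    return {
        # Credentials may be percent-encoded inside a URL, so decode them.
        "pg_user": unquote(parts.username or ""),
        "pg_password": unquote(parts.password or ""),
        "pg_host": parts.hostname or "",
        # The path is "/dbname"; strip the leading slash.
        "pg_db": parts.path.lstrip("/"),
    }
```

For example, pg_vars_from_url("postgres://user:secret@example-host:5432/mydb") yields the user, password, host and database name as four separate entries that can be copied into variables.list. The db variable is not derived here, since this thread does not say what value it expects.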