I am now officially archiving this repo after a long time of, well, not maintaining it.
A non-API Python program to crawl public photos, posts, followers, and following.
To crawl followers or followings, you will need to log in with your credentials, either by filling in 'auth.json' or by typing them in (as you would when simply browsing Instagram).
To fill in 'auth.json', copy 'auth.json.example' to 'auth.json' and enter your username and password.
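The copy-and-fill step can also be scripted. The following is only a sketch: the helper name `write_auth` is hypothetical, and the key names "username" and "password" are an assumption based on the description above, so check 'auth.json.example' for the actual field names before relying on it.

```python
import json
import shutil
from pathlib import Path


def write_auth(username, password, example="auth.json.example", target="auth.json"):
    """Copy the example file to the target, then fill in the credentials."""
    example_path, target_path = Path(example), Path(target)

    # Start from the example file when it exists, so any extra fields survive.
    if example_path.exists():
        shutil.copyfile(example_path, target_path)
        data = json.loads(target_path.read_text(encoding="utf-8"))
    else:
        data = {}

    # Assumed key names; the real ones are whatever auth.json.example uses.
    data["username"] = username
    data["password"] = password
    target_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return target_path
```

Calling `write_auth("my_user", "my_password")` from the repository root would then leave an 'auth.json' next to the example file. Keep that file out of version control, since it holds a plain-text password.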
To use a headless browser, install PhantomJS and add '-l' to the arguments.
Download the first 100 photos and captions (the user's posts, if any) from the username "instagram"
$ python instagramcrawler.py -q 'instagram' -c -n 100
Search for the hashtag "#breakfast" and download the first 50 photos
$ python instagramcrawler.py -q '#breakfast' -n 50
Record the first 30 followers of the username "instagram" (requires login)
$ python instagramcrawler.py -q 'instagram' -t 'followers' -n 30 -a auth.json
usage: instagramcrawler.py [-h] [-d DIR] [-q QUERY] [-t CRAWL_TYPE] [-n NUMBER] [-c] [-a AUTHENTICATION]
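For readers who want to see how such a usage line maps onto parsed options, here is a minimal `argparse` sketch that mirrors the flags listed above. It is not the script's actual parser: the `dest` names, the help strings, and the `int` type for `-n` are assumptions inferred from the usage line and the examples.

```python
import argparse


def build_parser():
    """Build a parser with the same short flags as the usage line."""
    parser = argparse.ArgumentParser(prog="instagramcrawler.py")
    # With only a short flag given, argparse derives the metavar from dest
    # (e.g. dest="crawl_type" is shown as CRAWL_TYPE in the usage text).
    parser.add_argument("-d", dest="dir", help="output directory (assumed meaning)")
    parser.add_argument("-q", dest="query", help="username or '#hashtag' to crawl")
    parser.add_argument("-t", dest="crawl_type", help="what to crawl, e.g. 'followers'")
    parser.add_argument("-n", dest="number", type=int, help="number of items to fetch")
    parser.add_argument("-c", dest="caption", action="store_true",
                        help="also download captions")
    parser.add_argument("-a", dest="authentication", help="path to auth.json")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Parsing the follower example, `build_parser().parse_args(["-q", "instagram", "-t", "followers", "-n", "30", "-a", "auth.json"])`, yields a namespace with `query`, `crawl_type`, `number`, and `authentication` set and `caption` left at `False`.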
Two packages are needed: selenium and requests
$ pip install -r requirements.txt
$ bash utils/get_gecko.sh
$ bash utils/get_phantomjs.sh
$ source utils/set_path.sh
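After running the steps above, a quick sanity check can confirm that the Python packages import and the browser drivers are on the PATH. This is a sketch only: `check_environment` is a hypothetical helper, and the binary names "geckodriver" and "phantomjs" are assumptions about what the utils scripts install.

```python
import importlib.util
import shutil


def check_environment(packages=("selenium", "requests"),
                      binaries=("geckodriver", "phantomjs")):
    """Return (missing_packages, missing_binaries) as two lists of names."""
    # find_spec returns None when a top-level package cannot be located.
    missing_packages = [name for name in packages
                        if importlib.util.find_spec(name) is None]
    # shutil.which returns None when the executable is not on the PATH.
    missing_binaries = [name for name in binaries
                        if shutil.which(name) is None]
    return missing_packages, missing_binaries


if __name__ == "__main__":
    packages, binaries = check_environment()
    print("missing packages:", packages or "none")
    print("missing binaries:", binaries or "none")
```

If a driver shows up as missing, re-running `source utils/set_path.sh` in the current shell is the first thing to try, since PATH changes made by `source` do not carry over to new terminals.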