h4t0n / instagram-scraper

Some scrapy spiders useful to crawl instagram posts using public APIS (No TOKEN)
GNU General Public License v3.0
145 stars 57 forks source link

Instagram Scrapy Scraper

Some scrapy spiders useful to crawl instagram posts using public APIS (No TOKEN)

Requirements

Spiders

Usage

scrapy crawl hashtag

Use the -L INFO to avoid a lot of debug messages

Output

The scraper put its files under the scraped directory

hashtag spider

File are put under scraped/hashtag/hashtagname, by date and hour of the day. This is because if you execute the crawler multiple times in the same hour the output is appended. Files contains a Json for each line.

For example :

{"id": "1684344684669792291", "shortcode": "Bdf_pERD3Aj", "caption": "\"Non c'\u00e8 amore pi\u00f9 sincero di quello per il cibo\". Panino caldo e croccante con caciocavallo, zucchine grigliate e pesto di pomodori secchi. \ud83d\ude0b #myferrara #labellaferrara #volgoitalia #volgoemiliaromagna #volgoferrara #igersferrara #volgosapori #italia_in_grande #centrostorico #iconsigliati #visitferrara #cibobuono #cosebuone #qualit\u00e0 #passione #genuinit\u00e0 #freschezza #tagsforlikes", "display_url": "https://instagram.ffco2-1.fna.fbcdn.net/vp/71a6e1bc5183bbd9b1339f064b2bb1b9/5B231526/t51.2885-15/e35/25021917_402106523561566_9076772742774128640_n.jpg", "loc_id": 0, "loc_name": "", "loc_lat": 0, "loc_lon": 0, "owner_id": "5655088891", "owner_name": "tipicoh_ferrara", "taken_at_timestamp": 1515009554}
{"id": "1684875047875260104", "shortcode": "Bdh4O3fhi7I", "caption": "#roma #piazzanavona #pjmasks #sky #ballons #detail #thehub_lazio #lazio_illife #new_photolazio #yallerslazio  #arts_illife #vivolazio #volgolazio #lazio_super_pics #visit_lazio #italiaStyle20 #iconsigliati  #volgoarte #shotz_of_lazio", "display_url": "https://instagram.ffco2-1.fna.fbcdn.net/vp/20459e8002d4a61284bc7e03f0da5f8a/5B0B896A/t51.2885-15/e35/26065496_174719709946144_8885150857012707328_n.jpg", "loc_id": "336844629", "loc_name": "Piazza Navona", "loc_lat": 41.8988888889, "loc_lon": 12.4730555556, "owner_id": "256276180", "owner_name": "clapanama", "taken_at_timestamp": 1515072779}

License

GNU GENERAL PUBLIC LICENSE Version 3