medialab / minet

A webmining CLI tool & library for python.
GNU General Public License v3.0

Scraping 1000s of comments on Instagram #968

Open Geminy3 opened 3 months ago

Geminy3 commented 3 months ago

I'm getting a minet.instagram.exceptions.InstagramPublicAPIInvalidResponseError while trying to get all the comments from a post on Instagram, which interrupts the scraping. I guess it's because there are a lot of comments on this post: when I log back in to Instagram in a browser, I get a message about suspicious activity on my account.

I used this command line with this CSV: minet instagram comments URL -i urls_insta.csv -o comments.csv

Is there any way to fix that error? Or has anyone else faced the same issue?

Thanks!

Yomguithereal commented 3 months ago

Are you using your Instagram account while the scraper is running? Or are you scraping multiple things at once using the same account? Instagram's rate limiting is very fickle, and there is no surefire way to make it work without it sometimes failing.

Geminy3 commented 3 months ago

I opened an Instagram page while the scraper was running, but I'm not navigating while minet works. And I'm scraping comments from only one post at a time, not trying to overload Instagram with a lot of different requests. Is there any possibility to restart the scraping from where it failed, meaning starting again at the last comment scraped? Thanks!

Yomguithereal commented 3 months ago

Is there any possibility to restart the scraping from where it failed, meaning starting again at the last comment scraped?

It's probably doable, because the pagination does not rely on transient IDs but rather on ID bounds (through a max_id GET parameter). However, the nesting of the comment hierarchy between root-level comments and child comments might make this a bit perilous to implement correctly.
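
To illustrate the idea, here is a minimal sketch of resumable pagination driven by a max_id cursor. It is not minet's implementation: fetch_page, the page shape with a next_max_id key, and the checkpoint file are all hypothetical names chosen for the example. It also only handles a flat list of root-level comments, so it sidesteps the nesting problem mentioned above rather than solving it.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, Optional

# Hypothetical page shape (an assumption for this sketch, not Instagram's or
# minet's real format): {"comments": [...], "next_max_id": "<cursor>" or None}
FetchPage = Callable[[Optional[str]], dict]


def iter_comments(fetch_page: FetchPage, checkpoint: Path) -> Iterator[dict]:
    """Yield comments page by page, persisting the max_id cursor as we go."""
    # If an earlier run was interrupted, start again from its saved cursor.
    max_id: Optional[str] = None
    if checkpoint.exists():
        max_id = json.loads(checkpoint.read_text())["max_id"]

    while True:
        # fetch_page may raise (e.g. on a rate-limit error); the checkpoint
        # then still points at the page that failed to load.
        page = fetch_page(max_id)
        yield from page["comments"]

        max_id = page.get("next_max_id")
        if max_id is None:
            # Last page reached: the checkpoint is no longer needed.
            checkpoint.unlink(missing_ok=True)
            return

        # Only saved once every comment of the current page was consumed.
        checkpoint.write_text(json.dumps({"max_id": max_id}))
```

Because the cursor is written only after a whole page has been consumed, rerunning after a failure refetches the page that failed and nothing before it. The caller would still need to append to, rather than overwrite, its output file on the second run.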