JJery-web opened this issue 2 years ago
You're scraping too much, too fast. Try adding in some time.sleep calls.
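A minimal sketch of that advice: pause between items as you iterate over scraped posts. This is not part of facebook-scraper itself; polite_fetch and the 3-8 second delay range are assumptions to illustrate the idea, and randomized intervals are used so requests are not evenly spaced.

```python
import random
import time

def polite_fetch(items, min_delay=3.0, max_delay=8.0):
    """Yield items with a randomized pause between each, to slow scraping.

    The delay bounds are illustrative assumptions; tune them to your needs.
    """
    for item in items:
        yield item
        # sleep a random interval between min_delay and max_delay seconds
        time.sleep(random.uniform(min_delay, max_delay))
```

You would wrap the get_posts iterator with it, e.g. `for post in polite_fetch(get_posts(...)):`.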
Thanks. So what is a reasonable frequency, please? (For scraping every post and every user.)
Less than 10K posts per day should be safe
I use the code:

```python
for post in get_posts(
    account=link,
    pages=None,
    timeout=120,
    cookies="cookies.json",  # the cookies argument should be a path string, not a bare name
    options={"allow_extra_requests": False, "reactions": False, "posts_per_page": 300},
):
    post_info = [
        post["time"],
        post["post_url"],
        post["text"],
        post["likes"],
        post["comments"],
        post["shares"],
        post["shared_text"],
        post["link"],
        post["images_lowquality_description"],
        post["video"],
        post["video_duration_seconds"],
        post["video_watches"],
        post["image"],
        post["reactions"],
    ]
    print(post_info)  # here the output is messy
```
So I get very messy output in the CSV, and I don't know how to deal with it.
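One way to get a clean CSV instead of printed lists is to write each post dict through Python's csv module with a fixed header. This is a sketch, not part of facebook-scraper; write_posts_csv is a hypothetical helper, and the field list simply mirrors the keys used in the snippet above:

```python
import csv

# Column order mirrors the post keys accessed in the snippet above.
FIELDS = [
    "time", "post_url", "text", "likes", "comments", "shares",
    "shared_text", "link", "images_lowquality_description",
    "video", "video_duration_seconds", "video_watches", "image", "reactions",
]

def write_posts_csv(posts, path):
    """Write an iterable of post dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for post in posts:
            # missing keys become empty cells instead of raising KeyError
            writer.writerow({k: post.get(k, "") for k in FIELDS})
```

DictWriter keeps every row aligned to the same columns, so the file opens cleanly in a spreadsheet even when some posts lack certain fields.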
Try updating lxml with pip install -U lxml
Yes. It works. Thank you bro.
To scrape more stably, I registered accounts with my other emails and use their cookies for scraping. But I find these new accounts get banned (30 days). I have tried using a different computer, and also cycling the cookie files, like: `from itertools import cycle; cookies = cycle(["cookies.json", "mycookies4.json", "mycookies.json"])`
But my accounts still get banned for 30 days easily. Does anyone know what the problem is?
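For what it's worth, rotating cookie files alone does not reduce the request rate coming from your IP, which may be why the accounts still get flagged. A hedged sketch of pairing each cookie rotation with a pause between batches of work (rotating_batches and the 60-second default are my own assumptions, not part of the library):

```python
import time
from itertools import cycle

COOKIE_FILES = ["cookies.json", "mycookies4.json", "mycookies.json"]

def rotating_batches(batches, delay_per_batch=60.0):
    """Pair each batch of work with the next cookie file, pausing between batches.

    Rotating accounts does not change the request rate seen from your IP,
    so a delay is still applied after every batch. The delay value is an
    illustrative assumption; tune it to your situation.
    """
    cookie_pool = cycle(COOKIE_FILES)
    for batch in batches:
        cookie_file = next(cookie_pool)
        yield cookie_file, batch  # caller passes cookie_file to the scraper
        time.sleep(delay_per_batch)
```

Each yielded pair gives you the cookie file to pass as the `cookies=` argument for that batch of accounts or pages.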