icmpnorequest opened this issue 4 years ago
Hello @icmpnorequest, I was getting this same error and found some code from someone here in the community! It works like a charm; you can scrape all day and night.
import pandas as pd
import twint
from datetime import datetime, timedelta
from time import sleep
import os
query = 'Words OR To OR Search OR here'  # search terms; replace with your own keywords
start_str = "2020-04-01"
end_str = "2020-06-25"
start_date = pd.to_datetime(start_str, format='%Y-%m-%d', errors='ignore')
end_date = pd.to_datetime(end_str, format='%Y-%m-%d', errors='ignore')
data_folder = "/Path/To/Save/"
filename = f"{data_folder}collect_tweets_{start_str}_{end_str}.txt"
resume_file = f"{data_folder}resume.txt"
c = twint.Config()
c.Verified = True
c.Retweets = False
c.Filter_retweets = False
c.Hide_output = False
c.Output = filename
c.Resume = resume_file
c.Search = query
c.Lang = 'en'
c.Links = "exclude"
#c.Custom["tweet"] = ["tweet"]
c.Format = "{tweet}"
while start_date < end_date:
    check = 0
    c.Since = datetime.strftime(start_date, format='%Y-%m-%d')
    c.Until = datetime.strftime(start_date + timedelta(days=1), format='%Y-%m-%d')
    while check < 1:
        try:
            print("Running Search: Check ", start_date)
            twint.run.Search(c)
            check += 1
        except Exception as e:
            # pause when twitter blocks further scraping, then retry the same day
            print(e, "Sleeping for 7 mins")
            print("Check: ", check)
            sleep(420)
    # before iterating to the next day, remove the resume file (if twint created one)
    if os.path.exists(resume_file):
        os.remove(resume_file)
    # increment the start date by one day
    start_date = start_date + timedelta(days=1)
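The trick is that each search only covers one day, the resume file lets twint pick up where it stopped when the connection error interrupts a day, and the 7-minute sleep waits out Twitter's temporary block before retrying. Deleting the resume file at the end of each day makes sure the next day starts from scratch.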
Error Report: Scraping a certain user's tweets returns a 443 connection error
Initial Check
Command Ran
Description of Issue
I want to scrape Trump's tweets, but I get nothing except a 443 connection error. Could this be related to the recent changes Twitter made to its frontend API?
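Roughly, the search is configured like this (a minimal sketch only; the username and limit below are placeholders, not the exact values I used):

import twint

# minimal sketch of the user-timeline search that hits the error;
# the username and limit are placeholders, not the exact values used
c = twint.Config()
c.Username = "realDonaldTrump"   # target account (placeholder)
c.Limit = 100                    # stop after roughly 100 tweets (placeholder)
c.Hide_output = False

twint.run.Search(c)              # fails with a 443 connection error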
The error is as follows:
Environment Details