twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.66k stars 2.72k forks

[QUESTION] CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0) #888

Open weisiu1997 opened 3 years ago

weisiu1997 commented 3 years ago

I started several processes that each send multiple twint commands, crawling results for different keywords. There would be 9 million requests in total, but I can get at most 300,000 results a week, because I constantly receive the warning 'CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)', after which twint sleeps for a long time before it continues. How can I solve this problem? Thanks!

Code:

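#!/bin/bash

# Read the keywords from the file given as the first argument.
links=$(cat "$1")
round=0
num=0
# Run one twint search per keyword.
for i in $links
do
    twint -s "$i" --stats --json -o test.json
    ((round++))
done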
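
The same loop can be sketched in Python, which makes it easier to give each keyword its own output file in case several processes share a working directory and would otherwise all write to test.json. This is only a sketch: `build_command`, `main`, and the `results` directory are hypothetical names, and the only twint flags used are the ones from the script above.

```python
import subprocess
from pathlib import Path


def build_command(keyword, output_dir="results"):
    """Return the twint CLI call for one keyword, using the flags from the script above."""
    # Hypothetical layout: one JSON file per keyword instead of a shared test.json.
    safe_name = "".join(ch if ch.isalnum() else "_" for ch in keyword)
    output = Path(output_dir) / f"{safe_name}.json"
    return ["twint", "-s", keyword, "--stats", "--json", "-o", str(output)]


def main(keyword_file, output_dir="results"):
    """Run one twint search per whitespace-separated keyword in keyword_file."""
    keywords = Path(keyword_file).read_text().split()
    Path(output_dir).mkdir(exist_ok=True)
    for round_number, keyword in enumerate(keywords, start=1):
        print(f"round {round_number}: {keyword}")
        # check=False: keep going even if one search exits with an error.
        subprocess.run(build_command(keyword, output_dir), check=False)


# Example call (requires twint on the PATH):
# main("keywords.txt")
```

Passing the command as a list avoids shell quoting problems when a keyword contains spaces or special characters.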

yunusemrecatalcam commented 3 years ago

I guess the Twitter server limits requests based on the client IP. I have the same problem and am going to try using Tor to get around it. I'll let you know if it solves the problem.
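
For context, the "Expecting value: line 1 column 1 (char 0)" part of the log line in the question is the message Python's `json` module produces when asked to parse a body that is empty or not JSON. That fits the rate-limiting guess, though it does not prove it. A minimal sketch, where `decode_error_message` is a hypothetical helper and the empty body is an assumption about what a throttled request returns:

```python
import json


def decode_error_message(body):
    """Return the json module's error message for a body, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# Assumption: a throttled or blocked request comes back with an empty body.
empty_body_error = decode_error_message("")
print(empty_body_error)  # prints: Expecting value: line 1 column 1 (char 0)
```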

annabechang commented 3 years ago

> I guess the Twitter server limits requests based on the client IP. I have the same problem and am going to try using Tor to get around it. I'll let you know if it solves the problem.

Did it work? Thanks.

yunusemrecatalcam commented 3 years ago

Yes, I used this dockerized torpool project (https://github.com/u1234x1234/torpool). You can run the proxy server with:

docker run -d -p 9200:9200 -p 9300:9300 u1234x1234/torpool:1.0.2 --MaxCircuitDirtiness 30 --NewCircuitPeriod 30

To route twint through the proxy, you can use something like:

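import twint

username = "some_user"  # placeholder: the account whose tweets you want

c = twint.Config()
c.Search = "from:" + username
c.Store_object = True
c.Limit = 20
# Send requests through the torpool HTTP proxy started above.
c.Proxy_host = "127.0.0.1"
c.Proxy_port = 9300
c.Proxy_type = "http"
twint.run.Search(c)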
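
For the keyword-based crawl in the original question, the same proxy settings could be applied to one config per keyword. The following is a rough sketch only: `configure` and `search_all` are hypothetical helper names, the attribute names are the ones used in the snippet above, and twint is imported lazily so the helper can be tried without twint installed.

```python
from types import SimpleNamespace

# Proxy settings copied from the snippet above (torpool's HTTP port).
PROXY = {"Proxy_host": "127.0.0.1", "Proxy_port": 9300, "Proxy_type": "http"}


def configure(config, keyword, limit=20):
    """Fill in a twint.Config-like object for one keyword search via the proxy."""
    config.Search = keyword
    config.Limit = limit
    for name, value in PROXY.items():
        setattr(config, name, value)
    return config


def search_all(keywords):
    """Run one proxied twint search per keyword (requires twint to be installed)."""
    import twint  # lazy import: only needed when searches are actually run

    for keyword in keywords:
        twint.run.Search(configure(twint.Config(), keyword))


# Dry run with a stand-in object instead of twint.Config.
demo = configure(SimpleNamespace(), "python")
print(demo.Search, demo.Proxy_host, demo.Proxy_port)
```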

mozh94 commented 3 years ago

Hello, I installed Docker, but I can't run the command "docker run -d -p 9200:9200 -p 9300:9300 u1234x1234/torpool:1.0.2 --MaxCircuitDirtiness 30 --NewCircuitPeriod 30" from Jupyter. I am very new to Python and would be glad of any help.
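
A likely reason is that the docker line is a shell command rather than Python, so a notebook cell cannot execute it as-is. In Jupyter, a line prefixed with `!` is passed to the shell; alternatively, the command can be started from Python with `subprocess`. The sketch below takes the second route and adds a check that the proxy port is accepting connections. `start_torpool` and `port_is_open` are hypothetical helper names, and the arguments are copied from the command quoted above.

```python
import socket
import subprocess

# The docker arguments are copied verbatim from the command quoted above.
TORPOOL_COMMAND = [
    "docker", "run", "-d",
    "-p", "9200:9200", "-p", "9300:9300",
    "u1234x1234/torpool:1.0.2",
    "--MaxCircuitDirtiness", "30",
    "--NewCircuitPeriod", "30",
]


def start_torpool():
    """Start the container; requires Docker to be installed and running."""
    return subprocess.run(TORPOOL_COMMAND, capture_output=True, text=True)


def port_is_open(host="127.0.0.1", port=9300, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In a notebook cell, calling `start_torpool()` and then `port_is_open()` a few seconds later should indicate whether the proxy is reachable before pointing twint at it.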

LinqLover commented 3 years ago

@yunusemrecatalcam Your description sounds very promising. Could you go into a bit more detail on how you set up your architecture? I ran

docker run -d -p 9200:9200 -p 9300:9300 u1234x1234/torpool:1.0.2 --MaxCircuitDirtiness 30 --NewCircuitPeriod 30 --entrypoint /bin/bash

to create the Tor container,

docker exec -it --user root 7ee9aeab606f bash

to connect to it, and tried

pip install --user twint

to install twint. However, this gives me a large amount of stderr output, including "Ignoring numpy: markers 'python_version == "3.7" and platform_system != "AIX"' don't match your environment" ...