twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.64k stars 2.72k forks

\x00 character results in error #595

Closed xsser closed 4 years ago

xsser commented 4 years ago

macOS, Python 3.7.4

Command I ran: twint -s "${val}" --since ${today} -o ./data/${value}_${today}.json --json

The -s string is "dos"; you can run twint -s "dos" to reproduce it.

Error output I got:

Expecting value: line 1 column 1 (char 0) [x] run.Feed
[!] if get this error but you know for sure that more tweets exist, please open an issue and we will investigate it!
pielco11 commented 4 years ago

[image]

That's what I get and it keeps going. Do you get the same error message with every query? Are you using a specific setup or what?

xsser commented 4 years ago

Nope, I got the same error with every query.

search_worker error occured:CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/connector.py", line 936, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore  # noqa
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 981, in create_connection
    ssl_handshake_timeout=ssl_handshake_timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 1009, in _create_connection_transport
    await waiter
ConnectionResetError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/twint", line 11, in <module>
    load_entry_point('twint==2.1.8', 'console_scripts', 'twint')()
  File "/Users/aaa/src/twint/twint/cli.py", line 310, in run_as_command
    main()
  File "/Users/aaa/src/twint/twint/cli.py", line 302, in main
    run.Search(c)
  File "/Users/aaa/src/twint/twint/run.py", line 281, in Search
    run(config, callback)
  File "/Users/aaa/src/twint/twint/run.py", line 202, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
    return future.result()
  File "/Users/aaa/src/twint/twint/run.py", line 146, in main
    await task
  File "/Users/aaa/src/twint/twint/run.py", line 187, in run
    await self.tweets()
  File "/Users/aaa/src/twint/twint/run.py", line 129, in tweets
    await self.Feed()
  File "/Users/aaa/src/twint/twint/run.py", line 49, in Feed
    response = await get.RequestUrl(self.config, self.init, headers=[("User-Agent", self.user_agent)])
  File "/Users/aaa/src/twint/twint/get.py", line 119, in RequestUrl
    response = await Request(_url, params=params, connector=_connector, headers=headers)
  File "/Users/aaa/src/twint/twint/get.py", line 144, in Request
    return await Response(session, url, params)
  File "/Users/aaa/src/twint/twint/get.py", line 152, in Response
    async with session.get(url, ssl=True, params=params, proxy=httpproxy) as response:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 483, in _request
    timeout=real_timeout
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/connector.py", line 523, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/connector.py", line 859, in _create_connection
    req, traces, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/connector.py", line 1004, in _create_direct_connection
    raise last_exc
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/connector.py", line 986, in _create_direct_connection
    req=req, client_error=client_error)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp_socks/connector.py", line 53, in _wrap_create_connection
    protocol_factory, None, None, sock=sock.socket, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/connector.py", line 943, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host twitter.com:443 ssl:True [None]

search_worker error occured:CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)

pielco11 commented 4 years ago

#567

pielco11 commented 4 years ago

Could you try reverting those commits in your local repo and let me know if anything changes, please? (You just have to change the prefix of the URLs and a couple of other things; it's pretty easy and straightforward.)

xsser commented 4 years ago

OK, I will do it. P.S. I didn't change any code in twint; I just wrote my own Node.js code that calls the twint CLI,

like this:

const { exec } = require('child_process');
const today = require('./lib').getToday();

// Spawn one twint CLI process per entry in the task list.
task.forEach(function (val) {
    exec(`twint -u ${val} --since ${today} -o ./data/${val}_${today}.json --json`, function (err, stdout, stderr) {
        if (err) {
            // stderr carries twint's error output
            console.log('worker ran into a problem: ' + stderr);
        } else {
            console.log(stdout);
            process.send('Task finished!');
        }
    });
});

Each task is a Twitter ID.
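
For comparison, the same wrapper pattern can be sketched in Python. This is only an illustration under stated assumptions: the helper names build_twint_command and run_twint are hypothetical, and the only twint options used are the ones shown in the command above (-u, --since, -o, --json). Passing the arguments as a list avoids interpolating values into a shell string, and the explicit check reflects the fact that a process argument cannot contain a NUL character.

```python
import subprocess


def build_twint_command(user: str, since: str, out_dir: str = "./data") -> list:
    """Build the argument list for one twint CLI call (hypothetical helper)."""
    for value in (user, since):
        # A NUL character cannot be passed in a process argument.
        if "\x00" in value:
            raise ValueError("argument contains a NUL character")
    out_file = f"{out_dir}/{user}_{since}.json"
    return ["twint", "-u", user, "--since", since, "-o", out_file, "--json"]


def run_twint(user: str, since: str) -> subprocess.CompletedProcess:
    """Run twint for one user and capture stdout/stderr as text."""
    return subprocess.run(
        build_twint_command(user, since),
        capture_output=True,
        text=True,
    )


# Only builds and prints the command; call run_twint() to actually execute it.
print(build_twint_command("GoogleHacking", "2019-11-25"))
```

The returncode and stderr attributes of the returned object play the same role as err and stderr in the Node.js callback.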

xsser commented 4 years ago

I still get the problem when querying:

twint -u GoogleHacking --since 2019-11-25

/usr/local/python3/lib/python3.7/site-packages/aiohttp-4.0.0a1-py3.7-linux-x86_64.egg/aiohttp/client.py:977: RuntimeWarning: coroutine 'noop' was never awaited
  self._resp.release()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/python3/lib/python3.7/site-packages/aiohttp-4.0.0a1-py3.7-linux-x86_64.egg/aiohttp/client.py:518: RuntimeWarning: coroutine 'noop' was never awaited
  resp.release()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/python3/lib/python3.7/site-packages/aiohttp-4.0.0a1-py3.7-linux-x86_64.egg/aiohttp/client.py:541: RuntimeWarning: coroutine 'noop' was never awaited
  resp.release()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
1201483161638539265 2019-12-02 20:48:02 CST <GoogleHacking> [Files Containing Juicy Info] ext:sql intext:@gmail.com intext:e10adc3949ba59abbe56e057f20f883e http://dlvr.it/RKWF3g 
1201483160606793728 2019-12-02 20:48:02 CST <GoogleHacking> [Pages Containing Login Portals] intitle:"TMSoft MyAuth Gateway 3" -DOWNLOAD http://dlvr.it/RKWF3F 
1201483158971043842 2019-12-02 20:48:02 CST <GoogleHacking> [Pages Containing Login Portals] inurl:10443/remote/login http://dlvr.it/RKWF2x 
1201483157935030272 2019-12-02 20:48:01 CST <GoogleHacking> [Pages Containing Login Portals] intitle:MK-AUTH :: CONTEUDO RESTRITO -site: http://mk-auth.com.br  http://dlvr.it/RKWF29 
1199336639769657345 2019-11-26 22:38:32 CST <GoogleHacking> [Pages Containing Login Portals] site:*/my.policy http://dlvr.it/RK7pzC 
pielco11 commented 4 years ago

As for RuntimeWarning: Enable tracemalloc to get the object allocation traceback, I don't know why you are getting it. That's new to me.

As for CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0), it seems to be related to #567, but I'm not sure.
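
For context, "Expecting value: line 1 column 1 (char 0)" is the message Python's json module produces when it is asked to decode text that is empty or does not start with a JSON value, for example an HTML page. A minimal sketch, assuming the response body ends up in json.loads; the helper name and the sample bodies are made up for illustration and this is not twint's actual code:

```python
import json


def describe_json_error(body: str) -> str:
    """Return the decoder's error message, or "ok" if body is valid JSON."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return "ok"


# Hypothetical bodies: an HTML page, an empty string, and valid JSON.
for body in ("<html>not json</html>", "", '{"ok": true}'):
    print(repr(body), "->", describe_json_error(body))
```

Both the HTML body and the empty body yield the same message, so the log line alone does not say which of the two the server returned.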

Here is my output: [image]

xsser commented 4 years ago

I've changed the file you posted, but it didn't work.

xsser commented 4 years ago

Also, I use --proxy-type socks5 --proxy-port 1080 --proxy-host 127.0.0.1, which are my v2ray settings. Could this affect it?

pielco11 commented 4 years ago

In my experience, using a proxy does not break the connection, assuming the proxy is working correctly.
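
One quick way to rule out the simplest proxy failure is to check that something is listening on the proxy address at all. The sketch below uses only the standard library; the function name port_is_open is hypothetical, and the host and port are the ones from the proxy options quoted above. It confirms only that the port accepts TCP connections, not that the SOCKS5 proxy forwards traffic correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Values taken from --proxy-host and --proxy-port in the comment above.
    print(port_is_open("127.0.0.1", 1080))
```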

xsser commented 4 years ago

I may know the key to solving this problem... I captured an HTTP request to

/i/search/timeline?vertical=default&src=unkn&include_available_features=1&include_entities=1&max_position=-1&reset_error_state=false&f=tweets&q=+from:google_hacking

and I got this response:

You are on Twitter Mobile because you are using an old version of Chrome. Learn more here

I think it is the User-Agent that prevents bs4 from grabbing and parsing the corresponding data. Am I right? The User-Agent I tested with is

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 Firefox/70.0

twint's User-Agent:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36

But... it doesn't make sense.

Why did it work for you when it failed for me?

pielco11 commented 4 years ago

You can manually replace the User-Agent by modifying the code at

https://github.com/twintproject/twint/blob/7ea55b2aa0dd1c5c2fe991347e2dfce7fbdb2d43/twint/get.py#L155-L161

Try manually specifying the UA that you want to use.
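
To compare User-Agent strings outside of twint, a request with an explicit header can be built with the standard library. This is a rough sketch, not twint's code: the path and query string are copied from the captured request earlier in the thread, the host is assumed to be twitter.com as in the traceback, and build_request is a hypothetical helper. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Path and query copied from the captured request; the host is an assumption
# based on the traceback (twitter.com).
SEARCH_URL = (
    "https://twitter.com/i/search/timeline"
    "?vertical=default&src=unkn&include_available_features=1"
    "&include_entities=1&max_position=-1&reset_error_state=false"
    "&f=tweets&q=+from:google_hacking"
)

# The Firefox User-Agent quoted in the thread.
FIREFOX_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) "
    "Gecko/20100101 Firefox/70.0"
)


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a GET request with an explicit User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request(SEARCH_URL, FIREFOX_UA)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen once per User-Agent and comparing the bodies would show whether the server's answer depends on the header, although the endpoint may no longer respond the way it did when this issue was opened.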