twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.65k stars, 2.72k forks

Error in writing 'removed' copyright tweet to DB #321

Closed LuD1161 closed 5 years ago

LuD1161 commented 5 years ago

First of all, many thanks for making this tool. It's great and has helped me a lot.

BUG

Description of Issue

When a hidden tweet is encountered, the console output shows

[x] Hidden tweet found, account suspended due to violation of TOS

which is fine, but when the same search writes to a sqlite3 database, it fails with this error:

can only join an iterable

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/twint-1.2.1-py3.6.egg/twint/run.py", line 215, in Search
  File "/usr/local/lib/python3.6/dist-packages/twint-1.2.1-py3.6.egg/twint/run.py", line 167, in run
  File "/usr/lib/python3.6/asyncio/base_events.py", line 473, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/dist-packages/twint-1.2.1-py3.6.egg/twint/run.py", line 154, in main
  File "/usr/local/lib/python3.6/dist-packages/twint-1.2.1-py3.6.egg/twint/run.py", line 117, in tweets
  File "/usr/local/lib/python3.6/dist-packages/twint-1.2.1-py3.6.egg/twint/output.py", line 118, in Tweets
  File "/usr/local/lib/python3.6/dist-packages/twint-1.2.1-py3.6.egg/twint/output.py", line 96, in checkData
  File "/usr/local/lib/python3.6/dist-packages/twint-1.2.1-py3.6.egg/twint/storage/db.py", line 241, in tweets
TypeError: can only join an iterable

Command Ran

import twint
c = twint.Config()
c.Search = "BugBounty"
c.Until = "2018-06-08"
c.Database = "/var/www/html/a.db"
twint.run.Search(c)

Environment Details

Linux Ubuntu 18.04

Possible Solution

Looking into the code, I found that the SQLite storage code (twint/storage/db.py) was trying to join None, because the tweet data didn't contain anything but

None None None None <None> None

As you can see here:

1004741515527237632 2018-06-07 15:06:59 UTC <ismailtsdln> Reflected #XSS vulnerability at @Konami fixed.  😎 🎲 🎮 ♟️ 🎳 🎰 🕹️ #BugBounty #Vulnerability #Fixed #Web #Security #InfoSec #SecurityResearch pic.twitter.com/XR1Fo4xZHk
None None None None <None> None
can only join an iterable
[x] Hidden tweet found, account suspended due to violation of TOS
1004736249222107137 2018-06-07 14:46:03 UTC <Grey__Demon> Justin Sun ups Tron’s bug bounty program to $10 million  https://zycrypto.com/justin-sun-ups-trons-bug-bounty-program-to-10-million/ … 
#tron #trx #BugBounty #btc #bitcoin #eos #ethereum #eth #altcoins #hackers
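The error message itself can be reproduced outside twint. This is a minimal sketch, assuming the field being joined is None for a hidden tweet (the variable name is hypothetical and not taken from twint's code):

```python
# Assumption: a hidden tweet leaves its list-like fields as None,
# as the "None None None None <None> None" output line suggests.
hashtags = None

try:
    ",".join(hashtags)  # str.join requires an iterable, so None is rejected
except TypeError as exc:
    message = str(exc)
    print(message)
```

The message printed here should match the last line of the traceback.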

Initially, I thought of adding just one more check for TypeError to the try/except clause in twint/storage/db.py, and it helps.
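A rough sketch of that first idea, for illustration only. The function, table layout and dict-based tweet here are hypothetical stand-ins, not twint's actual code, and it is an assumption that the existing handler catches sqlite3.IntegrityError:

```python
import sqlite3

def store_tweet(conn, tweet):
    """Hypothetical stand-in for the writer in twint/storage/db.py.

    Returns True if a row was written, False if the tweet was skipped.
    """
    try:
        # Raises TypeError when tweet["hashtags"] is None (hidden tweet).
        hashtags = ",".join(tweet["hashtags"])
        conn.execute(
            "INSERT INTO tweets (id, hashtags) VALUES (?, ?)",
            (tweet["id"], hashtags),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Assumed existing handler: the row is already in the database.
        return False
    except TypeError:
        # The extra check: skip placeholder tweets whose fields are None.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, hashtags TEXT)")

real = {"id": 1004736249222107137, "hashtags": ["#tron", "#BugBounty"]}
hidden = {"id": None, "hashtags": None}

# A real tweet, a hidden tweet, then the same real tweet again (duplicate).
results = [store_tweet(conn, real), store_tweet(conn, hidden), store_tweet(conn, real)]
row_count = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

This keeps the SQLite writer from crashing, but it only protects that one storage backend.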

However, I think the problem would affect other outputs too, such as Elasticsearch, so it would be better to apply the copyright check here as well, like this:

    if datecheck(tweet.datestamp, config):
        output = format.Tweet(config, tweet)
        if copyright is None and is_tweet(tweet):    # <---- I have added this
            if config.Database:
                db.tweets(conn, tweet, config)

            if config.Pandas:
                panda.update(tweet, config)

It solves the problem, let me know if there's more to be done.

P.S.

Here's the diff on my fork

pielco11 commented 5 years ago

Hi @LuD1161 and thank you for reporting this!

@andytnt what do you think?

pielco11 commented 5 years ago

I think we could edit this block https://github.com/twintproject/twint/blob/a76bc07d7c28fb13334564e4f453cd9c07afc294/twint/output.py#L84-L107 by indenting everything from this line onward: https://github.com/twintproject/twint/blob/a76bc07d7c28fb13334564e4f453cd9c07afc294/twint/output.py#L92

That way, the if copyright is None and is_tweet(tweet) check has already been evaluated and we do not have to evaluate it again.
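A self-contained sketch of that shape, where every storage branch sits under a single guard. The names check_data, copyright_notice and sinks, and the body of is_tweet, are hypothetical here and only mirror the idea, not the real twint/output.py:

```python
from types import SimpleNamespace

def is_tweet(tweet):
    # Hypothetical check: a placeholder for a hidden tweet has no id.
    return tweet.id is not None

def check_data(tweet, copyright_notice, sinks):
    """Pass the tweet to every sink, but only if it is a real tweet.

    `sinks` is a list of callables standing in for the database,
    Pandas and Elasticsearch writers. Returns how many sinks were called.
    """
    written = 0
    # The guard is evaluated once; every sink below is nested under it.
    if copyright_notice is None and is_tweet(tweet):
        for sink in sinks:
            sink(tweet)
            written += 1
    return written

db_rows, pandas_rows = [], []
sinks = [db_rows.append, pandas_rows.append]

real = SimpleNamespace(id=1004736249222107137, hashtags=["#BugBounty"])
hidden = SimpleNamespace(id=None, hashtags=None)

written_real = check_data(real, None, sinks)
written_hidden = check_data(hidden, "withheld", sinks)
```

With this layout a hidden tweet never reaches any writer, so no individual backend needs its own TypeError handling.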

LuD1161 commented 5 years ago

@pielco11 Yup, that looks better :+1:

pielco11 commented 5 years ago

@LuD1161 feel free to try it and let me know if you still get errors. I'm going to close this; feel free to re-open it if new errors related to this issue come up.