twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License
15.75k stars 2.72k forks

AttributeError: 'tweet' object has no attribute 'cashtags' #946

Open SiegfriedWagner opened 3 years ago

SiegfriedWagner commented 3 years ago

Command Ran

PowerShell command: twint.exe -s '$aapl' -o out.txt --limit 40 --json

Description of Issue

Output:

Traceback (most recent call last):
  File "D:\git\twint\venv-twint\Scripts\twint-script.py", line 11, in <module>
    load_entry_point('twint==2.1.21', 'console_scripts', 'twint')()
  File "d:\git\twint\venv-twint\lib\site-packages\twint\cli.py", line 311, in run_as_command
    main()
  File "d:\git\twint\venv-twint\lib\site-packages\twint\cli.py", line 303, in main
    run.Search(c)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\run.py", line 427, in Search
    run(config, callback)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\run.py", line 319, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "C:\Python38\lib\asyncio\base_events.py", line 608, in run_until_complete
    return future.result()
  File "d:\git\twint\venv-twint\lib\site-packages\twint\run.py", line 239, in main
    await task
  File "d:\git\twint\venv-twint\lib\site-packages\twint\run.py", line 290, in run
    await self.tweets()
  File "d:\git\twint\venv-twint\lib\site-packages\twint\run.py", line 230, in tweets
    await output.Tweets(tweet, self.config, self.conn)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\output.py", line 180, in Tweets
    await checkData(tweets, config, conn)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\output.py", line 166, in checkData
    _output(tweet, output, config)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\output.py", line 116, in _output
    write.Json(obj, config)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\storage\write.py", line 70, in Json
    null, data = struct(obj, config.Custom[_obj_type], _obj_type)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\storage\write.py", line 41, in struct
    row = meta.Data(obj, _type)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\storage\write_meta.py", line 139, in Data
    ret = tweetData(obj)
  File "d:\git\twint\venv-twint\lib\site-packages\twint\storage\write_meta.py", line 22, in tweetData
    "cashtags": t.cashtags,
AttributeError: 'tweet' object has no attribute 'cashtags'
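
For illustration, the failure mode at the bottom of this traceback can be reproduced in isolation: the writer reads t.cashtags from an object on which that attribute was never set. The sketch below is not twint's code; the Tweet class and tweet_data function are hypothetical stand-ins, and using getattr with a default is just one possible way to tolerate a missing field.

```python
class Tweet:
    """Hypothetical stand-in for twint's tweet object; not the real class."""

    def __init__(self, tweet_id, hashtags):
        self.id = tweet_id
        self.hashtags = hashtags
        # Note: no `cashtags` attribute is assigned here.


def tweet_data(t):
    """Build an output row, defaulting to an empty list when a field is absent."""
    return {
        "id": t.id,
        "hashtags": t.hashtags,
        # getattr with a default avoids the AttributeError raised by t.cashtags.
        "cashtags": getattr(t, "cashtags", []),
    }


row = tweet_data(Tweet(1314590904884301825, ["amazonprime", "covid19"]))
print(row["cashtags"])  # prints []
```

A default like this only hides the missing data; the row is written, but the cashtags column stays empty until the parser actually populates the attribute.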

Environment Details

Windows 10, Python 3.8.0, PowerShell. Output of pip freeze inside the venv:

aiodns==2.0.0
aiohttp==3.6.2
aiohttp-socks==0.4.1
async-timeout==3.0.1
attrs==20.2.0
beautifulsoup4==4.9.3
cchardet==2.1.6
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
dataclasses==0.6
elasticsearch==7.9.1
fake-useragent==0.1.11
geographiclib==1.50
geopy==2.0.0
googletransx==2.4.2
idna==2.10
multidict==4.7.6
numpy==1.19.2
pandas==1.1.3
pycares==3.1.1
pycparser==2.20
PySocks==1.7.1
python-dateutil==2.8.1
python-socks==1.1.0
pytz==2020.1
requests==2.24.0
schedule==0.6.0
six==1.15.0
soupsieve==2.0.1
twint==2.1.21
urllib3==1.25.10
yarl==1.6.0

SiegfriedWagner commented 3 years ago

A similar error occurs with CSV output, e.g.

twint.exe -s '$aapl' -o out.txt --limit 40 --csv

Partial output

...
CRITICAL:root:twint.output:_output:CSV:Error:'tweet' object has no attribute 'cashtags'
'tweet' object has no attribute 'cashtags' [x] output._output
1314590904884301825 09-10-2020 17:37:50 +0200 <SimpleGroup_Inc> Stocks that are a good play for the last quarter  $AMZN $AAPL  amazon has its own day #AmazonPrime Day Oct 13 &amp; 14 and conveniently Apple will be unveiling Iphone 12 5G phone on the 13th Prime Day  #covid19 helps these 2 stocks

Do you have any contribution guidelines? If so, I can try to address this issue.

SiegfriedWagner commented 3 years ago

I made changes to get cashtags working again (https://github.com/twintproject/twint/compare/master...SiegfriedWagner:cashtags_fix), but I had to comment out the code responsible for providing retweet data in twint/storage/write_meta.py (in my defence, the code responsible for setting the retweet fields on the tweet was already commented out in twint/tweet.py).

Your unit test framework looks quite nebulous (I am familiar with pytest and unittest), and while running the tests I got an error:

[+] Beginning vanilla test in <function Following at 0x00000159DA340AF0>
[+] Beginning custom JSON test in <function Following at 0x00000159DA340AF0>
[+] Beginning JSON test in <function Following at 0x00000159DA340AF0>
[+] Beginning custom CSV test in <function Following at 0x00000159DA340AF0>
[+] Beginning CSV test in <function Following at 0x00000159DA340AF0>
[+] Beginning DB test in <function Following at 0x00000159DA340AF0>
[+] Inserting into Database: test_twint.db
[+] Beginning vanilla test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning custom JSON test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning JSON test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning custom CSV test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning CSV test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning DB test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning vanilla test in <function Search at 0x00000159DA340CA0>
Traceback (most recent call last):
  File "D:\git\twint\twint\output.py", line 23, in _formatDateTime
    return int(datetime.strptime(datetimestamp, "%Y-%m-%d %H:%M:%S").timestamp())
  File "C:\Python38\lib\_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "C:\Python38\lib\_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '06-10-2016 20:53:34' does not match format '%Y-%m-%d %H:%M:%S'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\test.py", line 85, in <module>
    main()
  File ".\test.py", line 75, in main
    test(c, run)
  File ".\test.py", line 10, in test_reg
    run(c)
  File "D:\git\twint\twint\run.py", line 427, in Search
    run(config, callback)
  File "D:\git\twint\twint\run.py", line 319, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "C:\Python38\lib\asyncio\base_events.py", line 608, in run_until_complete
    return future.result()
  File "D:\git\twint\twint\run.py", line 239, in main
    await task
  File "D:\git\twint\twint\run.py", line 268, in run
    await self.tweets()
  File "D:\git\twint\twint\run.py", line 230, in tweets
    await output.Tweets(tweet, self.config, self.conn)
  File "D:\git\twint\twint\output.py", line 175, in Tweets
    await checkData(tweets, config, conn)
  File "D:\git\twint\twint\output.py", line 139, in checkData
    if datecheck(tweet.datestamp + " " + tweet.timestamp, config):
  File "D:\git\twint\twint\output.py", line 49, in datecheck
    d = _formatDateTime(datetimestamp)
  File "D:\git\twint\twint\output.py", line 25, in _formatDateTime
    return int(datetime.strptime(datetimestamp, "%Y-%m-%d").timestamp())
  File "C:\Python38\lib\_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "C:\Python38\lib\_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '06-10-2016 20:53:34' does not match format '%Y-%m-%d'

but it does not seem to be connected with my changes. BTW, it looks like you always test against the live Twitter website, which at first glance looks unnecessary.

Manoj15 commented 3 years ago

@SiegfriedWagner For the above issue, add an extra try-except in "output.py" after line 26, so that the function looks like this:

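def _formatDateTime(datetimestamp):
    # Try the original year-first formats, then fall back to day-first.
    try:
        try:
            return int(datetime.strptime(datetimestamp, "%Y-%m-%d %H:%M:%S").timestamp())
        except ValueError:
            return int(datetime.strptime(datetimestamp, "%Y-%m-%d").timestamp())
    except ValueError:
        # Handles day-first dates such as '06-10-2016 20:53:34' from the traceback above.
        return int(datetime.strptime(datetimestamp, "%d-%m-%Y %H:%M:%S").timestamp())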

This will work! It is an interim solution, but it solves the problem for now.
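
The same fallback idea can also be written as a loop over candidate formats, which avoids nesting one try block per format. This is only a sketch: parse_timestamp and KNOWN_FORMATS are hypothetical names, not part of twint, and the format list is simply the three formats that appear in this thread.

```python
from datetime import datetime

# Candidate formats, tried in order. The day-first format matches the
# value in the traceback ('06-10-2016 20:53:34').
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d-%m-%Y %H:%M:%S")


def parse_timestamp(datetimestamp, formats=KNOWN_FORMATS):
    """Return a Unix timestamp (local time) for the first format that matches."""
    for fmt in formats:
        try:
            return int(datetime.strptime(datetimestamp, fmt).timestamp())
        except ValueError:
            continue  # This format did not match; try the next one.
    raise ValueError(f"time data {datetimestamp!r} matches none of {formats}")
```

Catching only ValueError keeps unrelated failures (for example, passing None instead of a string) visible instead of silently routing them to the last format.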

himanshudabas commented 3 years ago

@SiegfriedWagner I'll take a look at this, maybe later today. A lot of code has changed recently because Twitter deprecated its older endpoints. The new way of scraping is a little different from the old one, so there are bound to be a lot of bugs. Moreover, some features don't work for now; for example, twint -u elonmusk --profile-full is not going to work. I am working on it and hope to put up a patch soon. Thanks for reporting this error.

himanshudabas commented 3 years ago

The script broke on cashtag searches because my priority was to get the library up and working, and I had no idea what cashtags were (I do now, after a Google search), so I simply commented out the parts that required them. The way I scrape data from Twitter doesn't use BeautifulSoup; it gets JSON objects, and none of the JSON objects I looked at had a cashtag field. I saw your fix and did some quick searches on Twitter. Twitter does send cashtags separately in the JSON, so using a regex to search for them seems redundant to me. I'll put up a fix for this today.

Also, some of the fields I have commented out are meant for later implementation. They are required in some portions of the code, but again, getting the library up and running was my priority. I'll get to them eventually.
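
On the point that Twitter sends cashtags separately in the JSON, a rough sketch of reading them from the payload rather than from the text could look like the following. It assumes the tweet object follows the shape of Twitter's documented tweet entities, where cashtags are listed under entities.symbols and the body is in full_text; the payload twint actually receives may differ. extract_cashtags and CASHTAG_RE are hypothetical names, and the regex is an illustrative fallback rather than Twitter's own matching rule.

```python
import re

# Fallback pattern: '$' followed by 1-6 letters (an assumption, not Twitter's rule).
CASHTAG_RE = re.compile(r"\$([A-Za-z]{1,6})\b")


def extract_cashtags(tweet_json):
    """Prefer cashtags from the payload's entities; fall back to a regex on the text."""
    symbols = (tweet_json.get("entities") or {}).get("symbols")
    if symbols is not None:
        return [s["text"] for s in symbols]
    return CASHTAG_RE.findall(tweet_json.get("full_text", ""))


sample = {
    "full_text": "Stocks that are a good play for the last quarter $AMZN $AAPL",
    "entities": {"symbols": [{"text": "AMZN"}, {"text": "AAPL"}]},
}
print(extract_cashtags(sample))  # prints ['AMZN', 'AAPL']
```

Reading the entities first keeps the result consistent with what Twitter itself considers a cashtag; the regex path only runs when the field is absent.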

himanshudabas commented 3 years ago

@Manoj15 yes, for now it can work.

Due to the new implementation of the search functionality this entire thing (which parses the data received from twitter into tweet, user objects) needs an overhaul.

lmeyerov commented 3 years ago

Hit this now too (fixing up pandas storage option, and already independently redid date fix for it) -- is there a pr somewhere for cashtags, or should I redo that too?

lmeyerov commented 3 years ago

Ah I see it -- @SiegfriedWagner your PR seems to disable other fields in writeMeta, maybe separate that out to enable cherry picking etc?

lmeyerov commented 3 years ago

(Working via https://github.com/TheDataRideAlongs/twint/tree/master )

himanshudabas commented 3 years ago

@lmeyerov I fixed the cashtag bug and will push the commit later today.

lmeyerov commented 3 years ago

@himanshudabas great -- here is an interim PR w/ cashtag fix + pandas fix : https://github.com/twintproject/twint/pull/954