Open SiegfriedWagner opened 4 years ago
A similar error occurs with CSV output, e.g.:
twint.exe -s '$aapl' -o out.txt --limit 40 --csv
Partial output
...
CRITICAL:root:twint.output:_output:CSV:Error:'tweet' object has no attribute 'cashtags'
'tweet' object has no attribute 'cashtags' [x] output._output
1314590904884301825 09-10-2020 17:37:50 +0200 <SimpleGroup_Inc> Stocks that are a good play for the last quarter $AMZN $AAPL amazon has its own day #AmazonPrime Day Oct 13 & 14 and conveniently Apple will be unveiling Iphone 12 5G phone on the 13th Prime Day #covid19 helps these 2 stocks
Do you have any contribution guidelines? If so, I can try to address this issue.
I made changes to make cashtags work again: https://github.com/twintproject/twint/compare/master...SiegfriedWagner:cashtags_fix
but I had to comment out the code responsible for providing retweet data in twint/storage/write_meta.py (in my defence, the code responsible for setting the retweet fields in tweet was commented out in twint/tweet.py).
Your unit test framework looks quite nebulous (I am familiar with pytest or unittest), and during the tests I got an error:
[+] Beginning vanilla test in <function Following at 0x00000159DA340AF0>
[+] Beginning custom JSON test in <function Following at 0x00000159DA340AF0>
[+] Beginning JSON test in <function Following at 0x00000159DA340AF0>
[+] Beginning custom CSV test in <function Following at 0x00000159DA340AF0>
[+] Beginning CSV test in <function Following at 0x00000159DA340AF0>
[+] Beginning DB test in <function Following at 0x00000159DA340AF0>
[+] Inserting into Database: test_twint.db
[+] Beginning vanilla test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning custom JSON test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning JSON test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning custom CSV test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning CSV test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning DB test in <function Followers at 0x00000159DA340A60>
[+] Inserting into Database: test_twint.db
[+] Beginning vanilla test in <function Search at 0x00000159DA340CA0>
Traceback (most recent call last):
File "D:\git\twint\twint\output.py", line 23, in _formatDateTime
return int(datetime.strptime(datetimestamp, "%Y-%m-%d %H:%M:%S").timestamp())
File "C:\Python38\lib\_strptime.py", line 568, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "C:\Python38\lib\_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data '06-10-2016 20:53:34' does not match format '%Y-%m-%d %H:%M:%S'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\test.py", line 85, in <module>
main()
File ".\test.py", line 75, in main
test(c, run)
File ".\test.py", line 10, in test_reg
run(c)
File "D:\git\twint\twint\run.py", line 427, in Search
run(config, callback)
File "D:\git\twint\twint\run.py", line 319, in run
get_event_loop().run_until_complete(Twint(config).main(callback))
File "C:\Python38\lib\asyncio\base_events.py", line 608, in run_until_complete
return future.result()
File "D:\git\twint\twint\run.py", line 239, in main
await task
File "D:\git\twint\twint\run.py", line 268, in run
await self.tweets()
File "D:\git\twint\twint\run.py", line 230, in tweets
await output.Tweets(tweet, self.config, self.conn)
File "D:\git\twint\twint\output.py", line 175, in Tweets
await checkData(tweets, config, conn)
File "D:\git\twint\twint\output.py", line 139, in checkData
if datecheck(tweet.datestamp + " " + tweet.timestamp, config):
File "D:\git\twint\twint\output.py", line 49, in datecheck
d = _formatDateTime(datetimestamp)
File "D:\git\twint\twint\output.py", line 25, in _formatDateTime
return int(datetime.strptime(datetimestamp, "%Y-%m-%d").timestamp())
File "C:\Python38\lib\_strptime.py", line 568, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "C:\Python38\lib\_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data '06-10-2016 20:53:34' does not match format '%Y-%m-%d'
but it does not seem to be connected with my changes. BTW, it looks like you always test against the Twitter website, which at first glance looks unnecessary.
@SiegfriedWagner For the above issue, add an extra try-except in "output.py" after line 26, so that the function looks like this:
    def _formatDateTime(datetimestamp):
        try:
            try:
                return int(datetime.strptime(datetimestamp, "%Y-%m-%d %H:%M:%S").timestamp())
            except ValueError:
                return int(datetime.strptime(datetimestamp, "%Y-%m-%d").timestamp())
        except ValueError:
            return int(datetime.strptime(datetimestamp, "%d-%m-%Y %H:%M:%S").timestamp())
This should work. It is an interim solution, but it solves the problem for now.
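The nested try/except above can also be written as a loop over candidate formats. This is only a minimal sketch, assuming the three formats seen in this thread are the only ones that occur; `parse_timestamp` and `CANDIDATE_FORMATS` are hypothetical names and not part of twint:

```python
from datetime import datetime

# Formats that appear in this thread. The list itself is an assumption,
# not something taken from twint's code base.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d-%m-%Y %H:%M:%S",
)


def parse_timestamp(datetimestamp):
    """Return a Unix timestamp using the first format that matches."""
    for fmt in CANDIDATE_FORMATS:
        try:
            # Naive datetimes are interpreted in the local time zone.
            return int(datetime.strptime(datetimestamp, fmt).timestamp())
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"unrecognised date format: {datetimestamp!r}")
```

Adding another format then only means adding one entry to the tuple, and an unknown format still fails loudly instead of being silently swallowed.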
@SiegfriedWagner
I'll take a look at this maybe later today.
Recently a lot of code has changed due to Twitter's deprecation of older endpoints. The newer way of scraping is a little different from the older one, so there are bound to be a lot of bugs. Moreover, some features will not work as of now; for example, twint -u elonmusk --profile-full will not work. But I am working on it and hopefully will put up a patch soon.
Thanks for reporting this error.
The reason the script broke when searching for cashtags is that my priority was to get the library up and working, and I had no idea what cashtags were (I do now, after a Google search), so I simply commented out the parts that required cashtags. The way I scrape data from Twitter doesn't use BeautifulSoup; it gets a JSON object, and none of the JSON objects had the cashtag field. I saw your fix and did some quick searches on Twitter. Twitter does send these cashtags in the JSON separately, so using a regex to search for them seems redundant to me. I'll put up a fix for this today. Also, some of the fields I have commented out are for later implementations. They are required in some portions of the code, but again, getting the library up and running was my priority. Eventually I'll get to them.
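To illustrate reading cashtags from the JSON payload instead of matching them with a regex, here is a minimal sketch. It assumes the tweet object carries an `entities.symbols` list whose items have a `text` key, as in Twitter's v1.1 tweet JSON; the payload twint actually receives may be shaped differently, and `extract_cashtags` is a hypothetical helper, not twint code:

```python
def extract_cashtags(tweet_json):
    """Collect cashtag symbols from a tweet's JSON entities, if present."""
    # Assumed shape: {"entities": {"symbols": [{"text": "AAPL"}, ...]}}
    symbols = tweet_json.get("entities", {}).get("symbols", [])
    return [symbol["text"] for symbol in symbols if "text" in symbol]


# Hand-written sample payload for illustration, not real API output.
sample = {
    "full_text": "Good plays for the last quarter: $AMZN $AAPL",
    "entities": {"symbols": [{"text": "AMZN"}, {"text": "AAPL"}]},
}
print(extract_cashtags(sample))
```

Returning an empty list when the field is missing would let the output code keep a `cashtags` attribute on every tweet, which is what the CSV and JSON writers in the traceback expect.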
@Manoj15 yes, for now it can work.
Due to the new implementation of the search functionality, this entire thing (which parses the data received from Twitter into tweet and user objects) needs an overhaul.
Hit this now too (while fixing up the pandas storage option; I had already independently redone the date fix for it). Is there a PR somewhere for cashtags, or should I redo that too?
Ah, I see it. @SiegfriedWagner your PR seems to disable other fields in writeMeta; maybe separate that out to enable cherry-picking etc.?
(Working via https://github.com/TheDataRideAlongs/twint/tree/master )
@lmeyerov I fixed the cashtag bug and will push the commit later today.
@himanshudabas great -- here is an interim PR with the cashtag fix + pandas fix: https://github.com/twintproject/twint/pull/954
Command Ran
Powershell command
twint.exe -s '$aapl' -o out.txt --limit 40 --json
Description of Issue
Output:
Environment Details
Windows 10, Python 3.8.0, powershell, venv pip freeze