taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License
2.39k stars, 579 forks

Use seconds in date queries, add json dump #258

Open TrueCarry opened 4 years ago

TrueCarry commented 4 years ago

Hello guys. Sorry, I'm new to Python, so some things may be done badly. I've tried to follow the original code as closely as I can.

Twitter supports date queries using since_time and until_time, so I've updated the package a little to use them. Now you can run a query like twitterscraper "love OR day" --limit 1000 -bd "2020-02-01 21:40:50" -ed "2020-02-01 21:40:55".
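As a rough illustration of the idea, here is a minimal sketch of how a second-level range could be turned into a since_time/until_time query. It assumes the operators take Unix epoch seconds and that the entered times are meant as UTC; the helper names to_epoch_seconds and build_time_query are hypothetical and are not taken from the PR.

```python
from datetime import datetime, timezone

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_epoch_seconds(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' and return Unix epoch seconds.

    Assumption: the input is interpreted as UTC.
    """
    parsed = datetime.strptime(text, DATETIME_FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def build_time_query(query, begin, end):
    """Append since_time/until_time operators to a search query (hypothetical helper)."""
    return "{} since_time:{} until_time:{}".format(
        query, to_epoch_seconds(begin), to_epoch_seconds(end)
    )


if __name__ == "__main__":
    print(build_time_query("love OR day", "2020-02-01 21:40:50", "2020-02-01 21:40:55"))
```

If the times should be read in the user's local timezone instead, the replace(tzinfo=...) step would need to change accordingly.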

I've also added a -dj / --dump-json flag that dumps the output directly as JSON, so I can use this package as a CLI utility from another program.
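For readers unfamiliar with the pattern, a JSON dump to standard output might look roughly like the sketch below. The Tweet class here is a hypothetical two-field stand-in, not the package's real tweet object, and dump_json is an illustrative name rather than the function added in this PR.

```python
import json
import sys
from dataclasses import asdict, dataclass


@dataclass
class Tweet:
    """Hypothetical stand-in for a scraped tweet; the real object has more fields."""
    username: str
    text: str


def dump_json(tweets, stream=sys.stdout):
    """Write all tweets as one JSON array so a calling program can parse stdout."""
    json.dump([asdict(tweet) for tweet in tweets], stream, ensure_ascii=False)
    stream.write("\n")


if __name__ == "__main__":
    dump_json([Tweet("alice", "love this day")])
```

Writing a single JSON document and nothing else to stdout keeps the output easy to consume from another process.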

taspinar commented 4 years ago

Hi @TrueCarry, thank you for your contribution to twitterscraper. I think most people scrape tweets over a longer date range and hence have no need to specify the range to the second. But it could be useful for someone who is interested in a very specific time range.

However, I would suggest that you do not change the existing structure but make an addition, so that it remains backward compatible. People should still be able to specify dates in the "%Y-%m-%d" format, but should also be able to specify date times in the "%Y-%m-%d %H:%M:%S" format.

So the valid_date function needs to be able to return values for both formats.
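One way this could look is sketched below, assuming valid_date is used as an argparse type function. Returning a date for the short format and a datetime for the long one is a design choice made for this sketch, not something the thread prescribes.

```python
import argparse
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def valid_date(text):
    """Accept both formats: return a date for DATE_FORMAT, a datetime for DATETIME_FORMAT."""
    try:
        # Old behaviour: a plain calendar date.
        return datetime.strptime(text, DATE_FORMAT).date()
    except ValueError:
        pass
    try:
        # New behaviour: a timestamp precise to the second.
        return datetime.strptime(text, DATETIME_FORMAT)
    except ValueError:
        raise argparse.ArgumentTypeError("Not a valid date: {!r}".format(text))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-bd", "--begindate", type=valid_date)
    args = parser.parse_args(["-bd", "2020-02-01 21:40:50"])
    print(repr(args.begindate))
```

Because the two return types differ, later code can tell which format the user entered without re-parsing the string.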

It is good that you still check in query.py, with if(no_secs < 0):, that the end date time is later than the begin date time, but the rest of query.py should use no_days instead of no_secs:

If the entered begindate/enddate have the "%Y-%m-%d" format, use the old method for building up queries, '{} since:{} until:{}'.format(query, since, until); if they have the "%Y-%m-%d %H:%M:%S" format, use your method.
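The branching suggested here could be sketched as follows. This is only an illustration: build_query is a hypothetical helper, it assumes the bounds have already been parsed into date or datetime objects, it assumes since_time/until_time take UTC epoch seconds, and it leaves out how query.py splits a day range into several sub-queries.

```python
from datetime import date, datetime, timezone


def build_query(query, begin, end):
    """Choose the query style from the type of the parsed bounds (hypothetical helper)."""
    # datetime is a subclass of date, so test for datetime first.
    if isinstance(begin, datetime) and isinstance(end, datetime):
        no_secs = (end - begin).total_seconds()
        if no_secs < 0:
            raise ValueError("end must not be earlier than begin")
        # Assumption: times are UTC and the operators take epoch seconds.
        since = int(begin.replace(tzinfo=timezone.utc).timestamp())
        until = int(end.replace(tzinfo=timezone.utc).timestamp())
        return "{} since_time:{} until_time:{}".format(query, since, until)
    if isinstance(begin, datetime) or isinstance(end, datetime):
        raise ValueError("begin and end must use the same format")
    no_days = (end - begin).days
    if no_days < 0:
        raise ValueError("end must not be earlier than begin")
    # Old, backward-compatible format: str(date) renders as YYYY-MM-DD.
    return "{} since:{} until:{}".format(query, begin, end)


if __name__ == "__main__":
    print(build_query("love", date(2020, 2, 1), date(2020, 2, 3)))
    print(build_query("love", datetime(2020, 2, 1, 21, 40, 50), datetime(2020, 2, 1, 21, 40, 55)))
```

Keeping the day-based branch untouched means existing invocations with plain dates produce the same queries as before.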