Closed: ghost closed this issue 5 years ago
First, thank you so much for taking the time to write a properly documented issue.
I tried twint --retweets -u BouloGiletJaune and got 176 tweets, still fewer than expected.
While this might sound like Twint is doing something wrong, the point is that Twitter stops Twint before reaching the beginning of the timeline. This can be verified via browser, as shown in this screenshot.
The last tweet shown there is also the last tweet returned by Twint (at least in my experience).
Profile_full = True
This option takes a lot of time by design, so it should be used only if the account is shadow banned (which means that you can't find its tweets via the search bar).
There is an argument-checking error (in the Twint code) that I'm going to fix really quickly. Please consider that the All option might require a lot of time, since it returns tweets sent by or to the user, plus tweets that mention the user.
https://github.com/twintproject/twint/blob/ad27650fbc0bf8c3f2c78449088a5ede7239f53a/twint/url.py#L100-L101
If you use Twint as a module, everything will work as expected.
There are limitations with --retweets and --profile-full imposed by Twitter, limitations that we can't handle or work around.
There's an argument-checking error in the Twint code, which only affects you if you use Twint via the CLI.
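For readers who want to try the module route mentioned above, here is a minimal sketch. It only uses the names that appear in this thread (Profile() and Profile_full) plus twint.Config and the Username attribute; the helper names configure and fetch_full_profile are hypothetical, and attribute names may differ between Twint versions.

```python
def configure(config, username, profile_full=True):
    """Set the options discussed in this thread on a Twint-style config object.

    `config` is expected to be a twint.Config() instance; any object that
    accepts attribute assignment works, which keeps this helper easy to check.
    """
    config.Username = username
    # Slow path: the thread recommends it only for shadow-banned accounts.
    config.Profile_full = profile_full
    return config


def fetch_full_profile(username):
    """Run a profile scrape through Twint's module API instead of the CLI."""
    # Imported lazily so this file still loads where twint is not installed.
    import twint

    config = configure(twint.Config(), username)
    twint.run.Profile(config)


# Example call (performs network requests, so it is left commented out):
# fetch_full_profile("BouloGiletJaune")
```

Going through the module avoids the CLI argument check described above, but it does not lift the limits that Twitter itself imposes.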
@pielco11 thank you for taking the time to formulate a prompt and structured response.
While this might sound like Twint is doing something wrong, the point is that Twitter stops Twint before reaching the beginning of the timeline. This can be verified via browser.
In the browser, the last tweet returned is indeed the same one that twint grabs:
1069924606788685824 2018-12-04 12:01:19 BST <BouloGiletJaune> Un moratoire et quelques mesurettes...
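To compare the last tweet across tools, it can help to split an output line like the one above into fields. The sketch below assumes the layout "id date time timezone <username> text", which is inferred only from this single sample line; the names LINE_RE, TweetLine and parse_line are hypothetical and not part of Twint.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the one sample line quoted in this thread:
# <id> <date> <time> <timezone> <username in angle brackets> <text>
LINE_RE = re.compile(
    r"^(?P<id>\d+) (?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<tz>\S+) <(?P<username>[^>]+)> (?P<text>.*)$"
)


class TweetLine(NamedTuple):
    id: int
    date: str
    time: str
    tz: str
    username: str
    text: str


def parse_line(line: str) -> Optional[TweetLine]:
    """Return the parsed fields, or None if the line does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return TweetLine(
        int(m["id"]), m["date"], m["time"], m["tz"], m["username"], m["text"]
    )
```

The numeric id is the useful part here: it identifies the same tweet in the browser, in twint output, and in API results.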
In your response you're hinting that Twitter stops before reaching the beginning of the timeline. Maybe there are more tweets before that one? So I've checked using the python-twitter library, which taps into the Twitter API directly (but is limited to the last 3,200 tweets), and I got 274 tweets. So indeed it's like you wrote: Twitter is limiting scraping.
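The comparison above can be made a bit more systematic. This is a small sketch, not part of Twint or python-twitter: coverage and missing_ids are hypothetical helpers, and the tweet ids are assumed to have been collected beforehand from each tool.

```python
from typing import Iterable, List


def coverage(returned: int, expected: int) -> float:
    """Fraction of the expected tweets that a run actually returned."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return returned / expected


def missing_ids(scraped: Iterable[int], reference: Iterable[int]) -> List[int]:
    """Ids present in the reference source but absent from the scrape, sorted."""
    return sorted(set(reference) - set(scraped))


# Counts reported in this thread, measured against the 278 tweets on the profile.
retweets_run = coverage(176, 278)      # twint --retweets
profile_full_run = coverage(272, 278)  # Profile() with Profile_full=True
api_run = coverage(274, 278)           # python-twitter
```

Feeding missing_ids the ids from a twint run and from the API shows exactly which tweets were skipped, which tells you whether the gap sits at the old end of the timeline.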
Shorter version (tl;dr)
twint returns far fewer tweets than the number displayed on a user's Twitter page. Example with @BouloGiletJaune:
Longer version
Description
On Fedora 29 Linux, using the latest twint 1.2.3 with Python 3.7 in a brand new virtual environment, when I run this command it returns only about half the tweets available:
However, this particular Twitter user has posted 278+ tweets: https://twitter.com/BouloGiletJaune?lang=en
Missing tweets occur with all the Twitter accounts I could try.
Using Profile_full=True
With the Profile() function and Profile_full=True, twint returns more tweets (272 tweets) yet not the right amount (278), but it's slow as hell:
Profile_full is not a solution
Missing 6 tweets doesn't seem like a big issue, but with an account that has many more tweets (35k), twint (with --profile-full) still misses about 2,000 tweets, not to mention the hours it takes to complete. So it's definitely not a viable workaround.
The --all option doesn't seem to work
To be noted: the --all command line option is supposed to return *all* tweets associated with a user, but it doesn't seem to work:
Your help is much appreciated
So, can you please let me know what I'm doing wrong or if you spot a problem? Maybe it's some known limitation?
Thank you.
Technical details
Installation
I've installed twint using this command from within the python3 virtual environment:
Bug signature
The number of tweets returned is too small:
pip version
python version
Fedora's stock version of Python is used. As with all virtualenv, binaries are automatically copied in the virtual environment.
pip packages installed
Only twint has a local path because it's been installed using git (see above) which is the recommended way to install the latest version.
twint package details
The machine
The O/S
SELinux