twintproject / twint

An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
MIT License

Twint not fetching beyond 22 August 2021 #1266

Open ahmed991 opened 3 years ago

ahmed991 commented 3 years ago

Issue Template

Please use this template!

Initial Check

No similar issue found

Make sure you've checked the following:

Command Ran

import twint
import nest_asyncio
nest_asyncio.apply()
config = twint.Config()
config.Search = "#gis"
config.Limit = 10000
config.Hide_output = True
config.Until = '2016-12-07'
config.Since = '2021-08-01'
config.Store_object = True
twint.run.Search(config)
# now you will have some tweets
tweets_as_objects = twint.output.tweets_list

Please provide the exact command ran including the username/search/code so I may reproduce the issue.

Description of Issue

Please use as much detail as possible.

t.co/i4nXyn9TC5
1429340378696716291 2021-08-22 12:11:16 +0500 Opportunities for Geographic Information Systems Technician in Pacific, MO #Pacific #GIS #GISTech Apply →: https://t.co/jHgPrMjV9r https://t.co/HVEmFTlJYJ
1429326875789127680 2021-08-22 11:17:37 +0500 #GIS Representation of #Covid_19 scenario for #India for 22th August 2021,prepared by @CSIR_NEERI Total Vaccination till date 58,14,89,377 (+52,23,612) Active Cases in last 24 hrs - 30,545 #CoronaVirusUpdates @PMOIndia #coronavirus #StayHome #COVID19nsw #CovidVic #CovidVaccine https://t.co/sxZwbKnQ1C
1429310797809868800 2021-08-22 10:13:44 +0500 Any recommendations for free online courses for learning Python? #Python #GIS
1429296533405642752 2021-08-22 09:17:03 +0500 #OpenSource Web- #GIS Development Roadmap https://t.co/B9b2B6cXBL #APIs #SoftwareDevelopment #TechJunkieNews https://t.co/wvCa7DU4Yv
1429283432337526785 2021-08-22 08:24:59 +0500 @wormmaps If they would just stay up on latest #GIS technology trends, it would help a lot, and wouldn’t cost nearly as much!
1429281827164917760 2021-08-22 08:18:37 +0500 Y ahora sí que se generaban todas las etiquetas aunque tuviera valores nulos :) #QGIS #GIS 🗺️ https://t.co/ELrGTvup7F
[!] No more data! Scraping will stop now.
found 0 deleted tweets in this search.

As we can see, it stops at 22 August 2021.

Environment Details

Using Windows, Running this in Anaconda Jupyter Notebook

tassog commented 3 years ago

Same here. Since it's a scraper, I'm used to it not getting a lot of old tweets, but today I'm not getting any tweets older than the last seven days. When I try to use the since/until options, it only gets a few tweets from the same day. I'm wondering if Twint started collecting through the REST API, which has a limit of the last seven days.

i-decrypt commented 3 years ago

Same here. twint is collecting fewer than 100 tweets.

wtroisey commented 3 years ago

I'm finding the same problem

Meenu-Jain commented 3 years ago

I am also having the same issue today

brianwarehime commented 3 years ago

+1

minibug1021 commented 3 years ago

I can get tweets prior to Aug 22, but only 1-2 pages of results, and occasionally (~60%) it will return no tweets.

tassog commented 3 years ago

It seems that when the search query has only a few tweets, it can overcome the date limit.

mariodias commented 3 years ago

I am having the same issue today :(

hcanalesmx commented 3 years ago

Same issue :(

jformaldehydem commented 3 years ago

I'm having the same problem, except not just when looking for specific dates. The number of tweets I get is inconsistent and sometimes zero. I have implemented the changes committed in #684 but that has not resolved the problem. I'm not very proficient with python but it seems that these changes are still pointing to the exception unconditionally when the data returned is zero. Is there a way to change this?

razi9126 commented 3 years ago

same issue

WENNA-HUB commented 3 years ago

Same issue :(

dumix21 commented 3 years ago

There is a workaround, but it is limited to 20 tweets, at least for me. It retrieves tweets older than 22 August, but you have to set a small interval between 'c.Since' and 'c.Until'.

e.g.:
c.Since = '2021-03-21'
c.Until = '2021-03-22'

Be aware that even this fails sometimes. If you set 'c.Pandas' to True, you can check whether your dataframe is empty and, if so, run the configuration again (twint.run.Search(c)).
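A minimal sketch of that workaround, assuming one-day windows and a small fixed number of retries (both arbitrary choices). `daily_windows`, `search_with_retry` and `twint_fetch` are hypothetical helper names, not part of twint. Instead of the pandas dataframe, this sketch collects results through `Store_object` and `twint.output.tweets_list`, as in the original report, and clears that list before each run on the assumption that it accumulates across searches.

```python
from datetime import date, timedelta


def daily_windows(since, until):
    """Yield (since, until) string pairs covering [since, until) one day at a time."""
    day = date.fromisoformat(since)
    end = date.fromisoformat(until)
    while day < end:
        nxt = day + timedelta(days=1)
        yield day.isoformat(), nxt.isoformat()
        day = nxt


def search_with_retry(fetch, since, until, max_attempts=3):
    """Call fetch(since, until) until it returns a non-empty result or attempts run out."""
    result = fetch(since, until)
    for _ in range(max_attempts - 1):
        if len(result) > 0:
            break
        result = fetch(since, until)
    return result


def twint_fetch(since, until, query="#gis"):
    """Run one twint search for a single window and return the tweets it stored."""
    import twint  # imported lazily so the helpers above work without twint installed

    twint.output.tweets_list.clear()  # assumption: the list is shared between runs
    c = twint.Config()
    c.Search = query
    c.Since = since
    c.Until = until
    c.Store_object = True
    c.Hide_output = True
    twint.run.Search(c)
    return list(twint.output.tweets_list)


# Example use (needs twint and network access):
# for since, until in daily_windows("2021-03-21", "2021-03-24"):
#     tweets = search_with_retry(twint_fetch, since, until)
#     print(since, len(tweets))
```

Passing the fetch function in as an argument keeps the retry logic independent of twint, so it can be checked with a stand-in function before running real searches.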

klojohn commented 3 years ago

Ok guys. Just uncomment line 92 in the url.py file:

#('query_source', 'typed_query'),

change to

('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.

klojohn commented 3 years ago

Is there a way to delete previous comments? It's a bit messy.

Here again:

Just uncomment (remove the '#') line 92 in the url.py file:

('query_source', 'typed_query'),

This solution works for PC (Linux). It does not seem to work on Raspberry Pi and I have no idea why.
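For anyone who prefers not to edit the file by hand, here is a hedged sketch of the same change done on the file's text. It searches for the commented-out tuple instead of relying on line 92, since the line number may differ between versions. `uncomment_query_source` is a hypothetical helper, and the sample string is a stand-in, not the real contents of url.py.

```python
import re

# Matches a commented-out ('query_source', 'typed_query') tuple and keeps its indentation.
_COMMENTED = re.compile(
    r"^([ \t]*)#[ \t]*(\(\s*'query_source'\s*,\s*'typed_query'\s*\)\s*,?.*)$",
    re.MULTILINE,
)


def uncomment_query_source(source):
    """Return (new_source, number_of_lines_changed)."""
    return _COMMENTED.subn(r"\1\2", source)


# Stand-in text for illustration only.
sample = "params = [\n    # ('query_source', 'typed_query'),\n]\n"
fixed, changed = uncomment_query_source(sample)
print(changed)
print(fixed)

# Applying it to the installed file (the path is an assumption; locate yours first):
# from pathlib import Path
# path = Path("/path/to/site-packages/twint/url.py")
# new_text, changed = uncomment_query_source(path.read_text())
# if changed:
#     path.write_text(new_text)
```

If the function reports zero changed lines, the tuple was either already uncommented or is written differently in that version of the file.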

i-decrypt commented 3 years ago

Working for me. Thanks to @klojohn

ahmed991 commented 3 years ago

Thanks for the solution, @klojohn. But it does not seem to be working on Windows.

NadiaMusafarudin commented 3 years ago

I'm having the same issue, not able to scrape the data using since and until.

razi9126 commented 3 years ago

Is the given solution not working for anyone else either? I'm on Linux.

sc442 commented 3 years ago

@klojohn Great solution! Initially this is working for me on Mac OS.

Slyth3 commented 3 years ago

Hi all, this seems to be an issue around specific dates and/or tweets, but I can't confirm, as the process stops at random points on each run.

If I note the date where it stopped previously and then rerun the process with --until (or c.Until), the process will continue and stop at another "bad tweet".

For the solution provided (uncommenting line 92), I tested in a few environments:

aarorauark commented 3 years ago

Hey guys, I tried uncommenting line 92 in url.py but still had no success. I tried on Jupyter and still received only a handful of tweets, and all of them were dated 2010-12-04.

JWLMSN commented 3 years ago

@aarorauark How did you install twint? If you used pip, type "pip3 show twint" into the command line and follow the path shown under "Location". There you'll find a folder named twint and the url.py which you have to modify inside that folder.
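The same lookup can be done from Python, which also helps in notebooks where the command line is less convenient. This is a sketch using the standard library's `importlib.util.find_spec`; `locate_module_file` is a hypothetical helper name, and the runnable line uses the standard `json` package as a stand-in so it works without twint installed.

```python
import importlib.util
from pathlib import Path


def locate_module_file(package, filename):
    """Return the path of `filename` inside an installed package, or None if absent."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    candidate = Path(spec.origin).parent / filename
    return candidate if candidate.exists() else None


# For twint this would be: print(locate_module_file("twint", "url.py"))
print(locate_module_file("json", "decoder.py"))
```

Running it in the same environment (or notebook kernel) that runs twint matters, because several Python installations on one machine can each hold their own copy of the package.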

aarorauark commented 3 years ago

Thank you @JWLMSN for getting back to me. I installed with both git and pip as described in the link (https://github.com/twintproject/twint) and tried twint but faced a similar issue. Could you run in the CLI (twint -s "American Airlines" --since "2010-01-02" --until "2010-12-06" -o "Test_file.csv" --csv) or run in Jupyter the commands mentioned in my earlier post (the Jupyter snapshot has the commands) and let me know if you are able to fetch all the tweets for the range? I have opened another issue in which twint returns no more than 20 tweets, all from the same day, and not even the full set for that day (https://github.com/twintproject/twint/issues/1276).

JWLMSN commented 3 years ago

@aarorauark I just tried a run with the parameters you mentioned and the query returns way more data beyond 2010-12-04, although I aborted the script because that would be a lot of data to pull for testing purposes. My last couple of responses were

10414711921180672 2010-12-02 20:27:15 +0200 <farecomparedeal> Sales for winter/spring from @VirginAmerica @AmericanAir & more. It's Airfare Deals Round-Up Time  http://bit.ly/e90Ukl
10411604629790720 2010-12-02 20:14:54 +0200 <asperkourt> Asper Kourt will be flying first class on American Airlines for the next 3 months . . . k, that's not quite true,...  http://fb.me/MPtFbjZ4

but my guess is it would have run all the way until the specified end date. So it's pretty safe to say your specific query is not the problem. Must be something else.

aarorauark commented 3 years ago

Thank you @JWLMSN for your time. From what I have noticed, Twint actually starts 2 days before the until date you specify. I collected lots of data back in March this year, in pretty big files, but somehow it is broken now. Could you please share the file? Ideally it would not take more than 10 minutes, and with a time range of just a couple of months it should take only about 5 minutes. I just want to see whether (1) you are getting more than 40-odd tweets and (2) you are able to capture most of the dates, because what I am seeing is that if you do not specify "until", then for less famous companies or less viral search strings twint fetches data for only the past 15 days.

You can simply run it for just one month of any year, for any company with a large amount of user-generated content on Twitter, say "Facebook" or "Amazon". I just want to check the two points I mentioned.

Again, I highly appreciate your time on this.
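Those two checks (how many tweets came back, and which dates are covered) can be sketched as a small helper that works on the date column of any result file. `coverage` is a hypothetical name, and the dates are assumed to be `YYYY-MM-DD` strings, as in the output shown earlier in this thread.

```python
from collections import Counter
from datetime import date, timedelta


def coverage(tweet_dates, since, until):
    """Return (tweets per day, days in [since, until) that have no tweets)."""
    per_day = Counter(tweet_dates)
    day, end = date.fromisoformat(since), date.fromisoformat(until)
    missing = []
    while day < end:
        if per_day[day.isoformat()] == 0:
            missing.append(day.isoformat())
        day += timedelta(days=1)
    return per_day, missing


per_day, missing = coverage(
    ["2010-12-04", "2010-12-04", "2010-12-02"], "2010-12-01", "2010-12-05"
)
print(sum(per_day.values()), "tweets;", len(missing), "days without any")
```

A long list of missing days for a busy search term would point to the scrape stopping early, not to a genuinely quiet period.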

DavidPerea commented 2 years ago

I am also having the same problem. I work with Command Prompt (CMD), where I indicate my command: twint -u gofundme

but it only allows me to extract the tweets until September 15th. How can I solve that?

Slyth3 commented 2 years ago

Hey @DavidPerea. I'm not sure of the actual solution, but what you can do (which I do) is try the initial scrape, then run the command line again with --until "2021-09-14 23:55:00".

See if that works, or try changing the time to a few hours earlier.
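A rough sketch of how the next --until value could be derived from what has already been collected: take the oldest timestamp seen so far and step back a little. `next_until` is a hypothetical helper, and the one-hour step is an arbitrary assumption; stepping back may skip tweets posted inside that gap.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def next_until(timestamps, step_back=timedelta(hours=1)):
    """Return an --until value slightly before the oldest tweet collected so far."""
    # Only the first 19 characters are parsed, so a trailing "+0500" offset is ignored.
    oldest = min(datetime.strptime(ts[:19], FMT) for ts in timestamps)
    return (oldest - step_back).strftime(FMT)


print(next_until(["2021-08-22 12:11:16 +0500", "2021-08-22 08:18:37 +0500"]))
```

The returned string can be passed to --until on the command line (in quotes) or assigned to c.Until for the next run.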

DavidPerea commented 2 years ago

Hi @Slyth3 I have tried testing various dates but when I try to extract tweets after mid-September it tells me the following:

[!] No more data! Scraping will stop now. found 0 deleted tweets in this search.

That happens even though there are more earlier tweets. Why does this happen?

senanabs commented 2 years ago

Yep. Having the same issue.

xingos123 commented 2 years ago

Working for me. Thanks very much to @klojohn

agnescameron commented 2 years ago

@klojohn's solution works for me on Mac, thank you!

theCreativitist commented 2 years ago

This worked for me on Windows! Thanks!

bensilver95 commented 2 years ago

I went to try @klojohn 's solution, but that line had already been uncommented in my version of Twint. And I'm still experiencing an issue. I'm on Linux. Did anyone else see that in their version it was already uncommented?

DavidPerea commented 2 years ago

I work with Command Prompt (CMD), where I indicate, for example, my command: twint -u gofundme

How can I apply the solution you indicate?

Mega-Barrel commented 2 years ago

Not working for me; I am using Windows.

7k50 commented 2 years ago

Your solution worked for me as well.

I am running twint version 2.1.21 on Python 3.9.7, which is the latest version available via pip.

Now I am wondering: is there a planned fix for this in the main release? I guess nothing has happened with this issue yet since twint hasn't been updated on GitHub in a while.

Is there an actively maintained fork of twint somewhere (which preferably includes this fix)? If twint is no longer actively maintained, is there any alternative software we should be aware of?

FYI: I'm running these instructions:

import twint

username = "username"  # placeholder account name

c = twint.Config()
# Represented command: twint -u USERNAME --images -o USERNAME.csv --csv
c.Username = username
c.Images = True
c.Store_csv = True
c.Output = "%s.csv" % username

twint.run.Search(c)

hpiedcoq commented 2 years ago

For an equivalent project, try snscrape:

https://github.com/JustAnotherArchivist/snscrape

christineeeeee commented 2 years ago

Still having the same issue on Linux even after trying this solution. In my case, twint now only returns ~90 tweets about "apple" and "$aapl" for one date.

MidasHendrik commented 2 years ago

Where do I find/open this url.py file, @klojohn? I'm working in Google Colab and used this for installation:

!git clone --depth=1 https://github.com/twintproject/twint.git
!cd /content/twint && pip3 install . -r requirements.txt
!pip3 uninstall aiohttp
!pip3 install aiohttp==3.7.0
import twint
import nest_asyncio
nest_asyncio.apply()

Abdelrahmanrezk commented 2 years ago

Thanks a lot, that solved it for me.

eminekahveci commented 2 years ago

> Hey guys, I tried uncommenting line 92 in url.py but still had no success. I tried on Jupyter and still received only a handful of tweets, and all of them were dated 2010-12-04.

Hello, like you, I want to collect tweets with certain hashtags in Jupyter Notebook, but when I run the same commands there, I get an error. Did you use Anaconda version 3.6? I wonder if that's why mine doesn't work. I would be glad if you could give some information.

eminekahveci commented 2 years ago

Hello, like you, I want to collect tweets with certain hashtags in Jupyter Notebook, but when I run the same commands there, I get an error. Did you use Anaconda version 3.6? I wonder if that's why mine doesn't work. I would be glad if you could give some information.

DenseLance commented 2 years ago

This solution worked for me, and I'm using Python IDLE on Windows. Thanks @klojohn!

2spoopy4me commented 2 years ago

Fix not working for me, py 3.9.7 on mac

eamon-keane commented 2 years ago

> @aarorauark How did you install twint? If you used pip, type "pip3 show twint" into the command line and follow the path shown under "Location". There you'll find a folder named twint and the url.py which you have to modify inside that folder.

This worked for me, thank you. Running on Windows, installed with pip.

eminekahveci commented 2 years ago

@klojohn Sir, I managed to receive tweets with code similar to what you described, but it only returns data for one week. I think the url.py file has been changed; it wasn't exactly what you described. To be removed

DavidPerea commented 2 years ago

How would it be for Windows? Have you got it? I've been trying things for months, uninstalling and installing and I don't know what else to do.

DenseLance commented 2 years ago

@DavidPerea My twint version is 2.1.21. It works fine for me on Windows after using the fix posted by @klojohn. It shows all or most of the tweets that I wanted to see.

DavidPerea commented 2 years ago

Now it works great with the solution you have indicated. It is wonderful!