minamotorin opened 2 years ago
The points where it stops seem to always be the same. As a workaround, you can continue scraping by using --until.
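A minimal sketch of that workaround, assuming a hypothetical `scrape(query, until)` callable that runs one twint search and returns (tweet_id, created_at) pairs, newest first. It is a stand-in for however you invoke twint (CLI or Python module), not a twint API. Each time the scraper stops, it restarts with `until` set to the oldest tweet seen so far, and it gives up once a run yields nothing new.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable, Tuple

# A tweet reduced to what the resume loop needs: (tweet id, creation time).
Tweet = Tuple[int, datetime]


def scrape_with_resume(
    scrape: Callable[[str, datetime], Iterable[Tweet]],
    query: str,
    until: datetime,
    max_rounds: int = 100,
) -> Dict[int, datetime]:
    """Re-run a scraper that stops early, moving `until` back each round.

    `scrape` is a hypothetical stand-in for one twint run: it should return
    tweets older than `until` (newest first) and may stop at any point.
    """
    seen: Dict[int, datetime] = {}
    for _ in range(max_rounds):
        batch = list(scrape(query, until))
        # Drop tweets already collected in an earlier (overlapping) run.
        new = [(tid, ts) for tid, ts in batch if tid not in seen]
        if not new:
            break  # nothing new: either finished or stuck at the same point
        seen.update(new)
        # Resume from the oldest tweet collected so far.
        until = min(seen.values())
    return seen
```

De-duplicating by ID matters because `--until` may not cut off exactly at the oldest tweet, so consecutive runs can overlap.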
I found that this part of the code in url.py
if "win" in platform:
    return f'\"{date.split()[0]}\"'
sometimes makes the scraper skip some tweets when using --until "%y-%m-%d %H:%M:%S" on Windows: it starts from some hours before the specified time, apparently because only the date part of the value is kept. Removing these lines seems to give better results.
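To see why the time is lost, here is a standalone re-creation of that branch. The function name `format_date`, the assumption that `platform` is `sys.platform`, and the non-Windows branch (converting to a Unix timestamp) are illustrative assumptions, not a copy of url.py. Note also that `"win" in platform` is true on macOS, because `sys.platform` is `"darwin"` there.

```python
import sys
from datetime import datetime


def format_date(date: str, platform: str = sys.platform):
    """Hypothetical re-creation of the platform branch quoted from url.py."""
    if "win" in platform:
        # Keeps only the text before the first space, so
        # "2019-09-24 12:30:00" becomes '"2019-09-24"': the time is dropped.
        return f'"{date.split()[0]}"'
    # Assumed non-Windows path: parse the full timestamp (with or without a
    # time part) and return Unix seconds, so the time is preserved.
    try:
        parsed = datetime.strptime(date, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        parsed = datetime.strptime(date, "%Y-%m-%d")
    return int(parsed.timestamp())


# Windows branch: two different times collapse to the same query value.
print(format_date("2019-09-24 12:30:00", "win32"))  # "2019-09-24"
print(format_date("2019-09-24 23:59:59", "win32"))  # "2019-09-24"
# "darwin" (macOS) contains "win", so it takes the same branch.
print(format_date("2019-09-24 12:30:00", "darwin"))  # "2019-09-24"
```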
@Tortar Your comment has nothing to do with this issue. I opened a new issue (#12), so please discuss it there.
This issue is about Twitter search still having results that are not displayed. --until doesn't matter.
twint -s keyword
will stop partway through when there are a lot of results, even though more results remain.
Okay, no problem. Just to add something to the discussion: the further back in time you go, the fewer results you get before the scraper stops, on the order of a day's worth of results.
The number of results doesn't seem to change; Twint just stops suddenly. An example is below (I'm not sure whether it behaves the same in other environments).
twint -s twint --until 2019-09-24 # Twint shows 20 results and stops
twint -s twint --until 2019-09-22 --limit 10 # Twint shows more results
I meant that (for the keywords I analyzed) the further back in the past the --until date is, the more often the scraper stops.
@Tortar Oh, I didn't know about that behavior. Thanks for reporting it.
I'm having issues with the volume of tweets scraped as well. I am using both Since and Until.
For example, searching for @username returns zero results, but searching in the app returns plenty. Similarly, various keyword searches return far fewer results than expected. From what I've read, the official API can retrieve a lot more data.
It seems it's to do with Twitter not showing all tweets in a browser session. See this comment: https://github.com/JustAnotherArchivist/snscrape/issues/574#issuecomment-1287069321
Does anyone have any workarounds for this? Or is it just how it is?
Hey guys, @minamotorin just sent me here; still the same issue.
@batmanscode did you find any way to fix this, boss?
There's no technical way to fix it, but a workaround is to run more frequent scrapes. For example, if you want to scrape 7 days of tweets, run the scraper separately for each of days 1 through 7, then concatenate the results and remove duplicates.
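A sketch of that workaround, assuming a hypothetical `scrape(since, until)` callable that runs one twint search over a window and returns dicts with an `"id"` key (neither is a twint API). It splits the range into one-day windows, scrapes each, then concatenates and drops duplicates by tweet ID. Presumably the smaller windows help because each run has fewer results, which is when the scraper stops less often.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, List, Tuple


def day_windows(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (since, until) pairs covering [start, end) one day at a time."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def scrape_by_day(
    scrape: Callable[[date, date], List[dict]], start: date, end: date
) -> List[dict]:
    """Run `scrape` once per day, concatenate, and drop duplicate tweet IDs."""
    merged: Dict[int, dict] = {}
    for since, until in day_windows(start, end):
        for tweet in scrape(since, until):
            # setdefault keeps the first copy of a tweet returned by two windows.
            merged.setdefault(tweet["id"], tweet)
    return list(merged.values())
```

The de-duplication step absorbs any overlap at window boundaries, so it is worth keeping even if your twint version treats Since and Until as strictly non-overlapping.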
from: https://github.com/twintproject/twint/issues/462#issue-461236891
I've confirmed the same behavior.