Open pushshift opened 5 years ago
Here is the equivalent command in twint:
twint -s "magic max_id:562621024957366272"
What I suggest is that we make a flag for this that will automatically convert YYYY-MM-DD HH:MM:SS to the correct max_id. This will allow people to target very specific parts of the timeline down to the second.
So something like:
twint -s "magic" --precise_until "2015-02-03 14:37:00"
We could even replace the current until and since with this more precise method. I believe Twitter will accept min_id or max_id but not both; that shouldn't really be an issue, though. This will be a HUGE help in getting around a lot of problems with since and until being so imprecise.
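A minimal sketch of the conversion the proposed flag would perform, assuming the value is given in UTC as YYYY-MM-DD HH:MM:SS. The names precise_until_to_max_id and build_search are hypothetical, not existing twint functions, and the flag itself is only a proposal here:

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch in Unix milliseconds

def precise_until_to_max_id(value: str) -> int:
    """Hypothetical helper: 'YYYY-MM-DD HH:MM:SS' (assumed UTC) -> max_id."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    ms = int(dt.timestamp()) * 1000  # whole seconds -> milliseconds
    return (ms - TWITTER_EPOCH_MS) << 22

def build_search(term: str, until: str) -> str:
    # Hypothetical: append the max_id operator to the search term, as in the twint -s example
    return f"{term} max_id:{precise_until_to_max_id(until)}"
```

For example, build_search("magic", "2015-02-03 14:38:00") returns "magic max_id:562621024957366272", the same max_id used in the twint command above (it marks the end of the 14:37 UTC minute).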
That would be an amazing feature!
Confirmed that using max_id and min_id together doesn't work, but we actually need max_id only in the first request, and then just place min_id in the further requests until the "limit" is reached.
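A sketch of the request plan described above, assuming pages are numbered from 0 and the boundary ids have already been computed. search_query_for_page is a hypothetical helper, not twint's actual url.Search:

```python
from typing import Optional

def search_query_for_page(term: str, page: int,
                          max_id: Optional[int], min_id: Optional[int]) -> str:
    """Hypothetical helper: build the search string for one request.

    max_id and min_id are never sent together: the first request (page 0)
    carries max_id, and every later request carries min_id.
    """
    if page == 0 and max_id is not None:
        return f"{term} max_id:{max_id}"
    if page > 0 and min_id is not None:
        return f"{term} min_id:{min_id}"
    # No applicable boundary: plain search term
    return term
```

The caller would keep requesting pages, incrementing page, until the limit is reached.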
@pielco11
Just to clarify, are these statements correct?
- On the first request, the value of init passed to url.Search will always be -1.
- If the value of init is -1 and config.Since is defined, then max_id needs to be set on the first request.
- If the value of init is not -1 and config.Until is set, then min_id needs to be set based on config.Until, on every request until Limit is reached or min_id is in the feed.
- If the value of init is not -1 and config.Until is not set, then neither min_id nor max_id is required in subsequent requests (i.e. max_position, as defined by init, controls the feed at this point until Limit tweets are returned or no more data is encountered).
- If the value of init is -1 and config.Until is defined, then min_id needs to be set on the first request based on config.Until. On subsequent requests, min_id should also be set based on config.Until. Requests should continue until min_id is in the feed or Limit has been reached.
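To make the questions easier to check, here is a literal encoding of the statements as listed. It is not confirmed twint behavior; the function name and its boolean inputs are hypothetical, and it only reports which id operators a request would carry:

```python
def id_params(init, since_set: bool, until_set: bool) -> set:
    """Literal encoding of the statements above (hypothetical, unconfirmed).

    init == -1 marks the first request, per the first statement.
    """
    params = set()
    first_request = init == -1
    if first_request and since_set:
        params.add("max_id")  # first request with config.Since defined
    if until_set:
        params.add("min_id")  # based on config.Until, first and subsequent requests
    return params
```

Written out this way, init == -1 with both config.Since and config.Until set yields both operators, which is the combination Twitter appears to reject.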
As we all know, Twitter search only accepts a date (not a time) for since and until, which is a huge pain in the ass when trying to resume at a specific point. However, you can pass max_id and min_id to Twitter search. Here is an example: https://twitter.com/search?q=truck%20max_id%3A1145726304651747328&src=typed_query&f=live
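A small sketch for building such a search URL from a term and an id. search_url is a hypothetical helper; the src and f parameters are simply copied from the example URL and may not be required:

```python
from urllib.parse import quote

def search_url(term: str, max_id: int) -> str:
    """Hypothetical helper: Twitter live-search URL with a max_id operator."""
    # safe="" so the space becomes %20 and the colon becomes %3A, as in the example
    q = quote(f"{term} max_id:{max_id}", safe="")
    return f"https://twitter.com/search?q={q}&src=typed_query&f=live"
```

search_url("truck", 1145726304651747328) reproduces the example URL exactly.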
You're probably thinking, "That's great, but those aren't datetimes." Well, the creation time of every tweet with a Snowflake ID is baked into the ID itself! So you can translate a datetime into a Twitter ID and use that ID as a boundary marker to simulate much more precise since and until flags.
Here's the code to convert a Twitter ID to a millisecond epoch:
(tweet_id >> 22) + 1288834974657
This gives the millisecond epoch of when the tweet was created (1288834974657 is Twitter's own epoch, 2010-11-04 01:42:54.657 UTC, in Unix milliseconds).
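The same formula as a Python sketch, assuming a Snowflake-era ID (older, pre-Snowflake IDs do not encode a timestamp this way). The function names are illustrative only:

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04 01:42:54.657 UTC

def tweet_id_to_ms(tweet_id: int) -> int:
    # Bits above the low 22 hold milliseconds since the Twitter epoch
    return (tweet_id >> 22) + TWITTER_EPOCH_MS

def tweet_id_to_datetime(tweet_id: int) -> datetime:
    # Convert Unix milliseconds to an aware UTC datetime
    return datetime.fromtimestamp(tweet_id_to_ms(tweet_id) / 1000, tz=timezone.utc)
```

For instance, tweet_id_to_datetime(562621024957366272) gives 2015-02-03 14:38:00 UTC.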
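Now here's the magical one, going the other way:
(millisecond_epoch - 1288834974657) << 22 = tweet_id
This gives the smallest possible ID for that millisecond, since the lower 22 bits (the machine and sequence numbers) are left at zero.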
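A sketch of the reverse conversion, assuming a timezone-aware datetime (a naive one makes the subtraction raise TypeError). It uses integer timedelta division instead of float timestamps so the millisecond count is exact; the function name is illustrative:

```python
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def datetime_to_tweet_id(dt: datetime) -> int:
    # Integer milliseconds since the Unix epoch (floor division, no float rounding)
    ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    # Shift into the timestamp field; the low 22 bits stay zero
    return (ms - TWITTER_EPOCH_MS) << 22
```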
So let's say we want to get tweets containing the term "magic" from February 3, 2015 at 9:37 am Eastern Standard Time. First, we convert that time to a Unix epoch: 1422974220 seconds for the start of the minute and 1422974280 for the end of the minute (60 seconds later). We multiply both by 1,000 to get milliseconds and use the formula above to get the min_id and max_id boundaries:
min_id = (1422974220000 - 1288834974657) << 22 = 562620773299126272
max_id = (1422974280000 - 1288834974657) << 22 = 562621024957366272
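The worked example above can be reproduced in Python. This sketch assumes a fixed UTC-5 offset for Eastern Standard Time (correct for February, but it does not handle daylight saving time), and minute_bounds is an illustrative name:

```python
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
EST = timezone(timedelta(hours=-5), "EST")  # fixed offset, no DST handling

def ms_to_id(ms: int) -> int:
    return (ms - TWITTER_EPOCH_MS) << 22

def minute_bounds(dt: datetime) -> tuple:
    """Return (min_id, max_id) for the minute starting at dt."""
    start_ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    end_ms = start_ms + 60_000  # 60 seconds later
    return ms_to_id(start_ms), ms_to_id(end_ms)

start = datetime(2015, 2, 3, 9, 37, tzinfo=EST)
print(minute_bounds(start))  # → (562620773299126272, 562621024957366272)
```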
Now let's test this on Twitter:
https://twitter.com/search?q=the%20max_id%3A562621024957366272&src=typed_query&f=live
Twitter seems to have problems when min_id and max_id are used together, but max_id alone does indeed show tweets with "magic" in them starting exactly at 2015-02-03 9:37 am Eastern time.
This should open the door to a lot of really cool possibilities, including more exact timeline targeting for search and resume capabilities, because we can resume at a specific time.