kanihal closed 5 years ago
@kanihal This seems like a very useful addition, but I am a little surprised because I have never seen the 'data-retweet-id' attribute before. Can you give an example of this in practice (on the Twitter website)?
Consider the Twitter page of the Stanford NLP group: https://twitter.com/stanfordnlp
There they have retweeted a tweet with id=1139508286418386944 from Victor Zhong (hllo_wrld). Link: https://twitter.com/hllo_wrld/status/1139508286418386944
On the stanfordnlp page, if you search for hllo_wrld and inspect that retweeted element (currently the 2nd tweet from the top of their timeline), you can see a div with class="tweet ...":
<div class="tweet js-stream-tweet js-actionable-tweet js-profile-popup-actionable dismissible-content original-tweet js-original-tweet tweet-has-context has-cards has-content MemexAdded" data-tweet-id="1139508286418386944" data-item-id="1139508286418386944" data-permalink-path="/hllo_wrld/status/1139508286418386944" data-conversation-id="1139508286418386944" data-tweet-nonce="1139508286418386944-f3556e9a-126c-4200-b398-3271a5c367f4" data-tweet-stat-initialized="true" data-retweet-id="1140033153106509825" data-retweeter="stanfordnlp" data-screen-name="hllo_wrld" data-name="Victor Zhong" data-user-id="257287707" data-you-follow="false" data-follows-you="false" data-you-block="false" data-tagged="hllo_wrld LukeZettlemoyer uwnlp" data-reply-to-users-json="[{"id_str":"257287707","screen_name":"hllo_wrld","name":"Victor Zhong","emojified_name":{"text":"Victor Zhong","emojified_text_as_html":"Victor Zhong"}},{"id_str":"118263124","screen_name":"stanfordnlp","name":"Stanford NLP Group","emojified_name":{"text":"Stanford NLP Group","emojified_text_as_html":"Stanford NLP Group"}},{"id_str":"3741979273","screen_name":"LukeZettlemoyer","name":"Luke Zettlemoyer","emojified_name":{"text":"Luke Zettlemoyer","emojified_text_as_html":"Luke Zettlemoyer"}},{"id_str":"3716745856","screen_name":"uwnlp","name":"UW NLP","emojified_name":{"text":"UW NLP","emojified_text_as_html":"UW NLP"}}]" data-disclosure-type="" data-has-cards="true">
Here, the tweet by hllo_wrld has the id data-tweet-id=1139508286418386944.
The retweet by stanfordnlp of that original tweet has a different id, i.e. data-retweet-id=1140033153106509825. If you open the retweet-id link 1140033153106509825, it redirects you to the original tweet 1139508286418386944.
Use cases:
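A minimal sketch of how the two ids could be read from such an element, in case it helps the discussion. It uses only the standard library's `html.parser` rather than whatever parser twitterscraper uses internally; the class name `RetweetAttrParser`, the helper `parse_tweets`, and the shortened `sample` markup are made up for illustration.

```python
from html.parser import HTMLParser


class RetweetAttrParser(HTMLParser):
    """Collect tweet and retweet ids from <div class="tweet ..."> elements."""

    def __init__(self):
        super().__init__()
        self.tweets = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "tweet" not in classes:
            return
        self.tweets.append({
            "tweet_id": attr.get("data-tweet-id"),
            "screen_name": attr.get("data-screen-name"),
            # None when the element carries no retweet attributes
            "retweet_id": attr.get("data-retweet-id"),
            "retweeter": attr.get("data-retweeter"),
        })


def parse_tweets(html):
    """Hypothetical helper: return one dict per tweet <div> in the markup."""
    parser = RetweetAttrParser()
    parser.feed(html)
    return parser.tweets


# Shortened version of the element quoted above (most attributes omitted)
sample = (
    '<div class="tweet js-stream-tweet" '
    'data-tweet-id="1139508286418386944" '
    'data-retweet-id="1140033153106509825" '
    'data-retweeter="stanfordnlp" '
    'data-screen-name="hllo_wrld"></div>'
)
print(parse_tweets(sample))
```

With this markup, `tweet_id` stays the id of the original tweet while `retweet_id` and `retweeter` describe the retweet on the stanfordnlp timeline.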
@kanihal Thank you for the information. I think this will be a very useful addition to twitterscraper.
The reason I have not merged it yet is that it seems to work only in combination with the --user argument, i.e. when you are scraping tweets from a user profile page.
When you search for tweets in the regular way, the additional information about the retweeter is not provided, so the output will contain many (or only) empty values for these additional fields.
So I am thinking it would be better if these additional retweet values were provided only in combination with the --user argument. But is it better to merge this PR for now and make the changes in a new PR, or to incorporate these changes in this PR? What do you think?
Yes, it makes sense to process retweet-related information only with the --user option.
I can send an additional PR that does this.
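Roughly what I have in mind, as a sketch only: the retweet fields are added to the output record only when the tweets come from a user profile page. The function name `build_record`, the flag `from_user_page`, and the dict keys are placeholders, not existing twitterscraper names.

```python
def build_record(tweet, from_user_page):
    """Return the output record for one parsed tweet.

    `tweet` is assumed to be a dict of already-parsed attributes;
    `from_user_page` stands in for "the --user option was given".
    """
    record = {
        "tweet_id": tweet["tweet_id"],
        "screen_name": tweet["screen_name"],
    }
    if from_user_page:
        # Only profile pages carry retweet information, so regular
        # search output is not padded with empty retweet fields.
        record["retweet_id"] = tweet.get("retweet_id")
        record["retweeter"] = tweet.get("retweeter")
    return record
```

This keeps the output for regular searches unchanged and avoids columns that would be empty for every row.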
<div> element with class name tweet. This might run a tiny bit faster.
Additional commits
has_more_items returned by Twitter to stop the scraping operation on a user profile
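A rough sketch of the stopping logic, assuming each profile-timeline response is a dict with `items_html`, `min_position`, and `has_more_items` keys. `fetch_page` is a hypothetical callable standing in for the actual HTTP request, and `max_pages` is an extra safety limit added here for illustration.

```python
def scrape_profile(fetch_page, max_pages=100):
    """Collect timeline pages until the response says there are no more.

    `fetch_page(position)` is a hypothetical callable that returns the
    decoded JSON response for the page starting at `position`
    (None for the first page).
    """
    pages = []
    position = None
    for _ in range(max_pages):
        response = fetch_page(position)
        pages.append(response["items_html"])
        # Stop as soon as the response reports no further items
        if not response.get("has_more_items"):
            break
        position = response["min_position"]
    return pages
```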