Open nemobis opened 2 years ago
Yeah, Twitter v2 API's response for the example tweet you provided (1512123194785898503) doesn't seem to include the link anywhere (even with all the expansions set):
{
"data":[
{
"conversation_id":"1512123194785898503",
"text":"President @vonderleyen has visited Stockholm to give the green light to Sweden's €3.3 billion recovery and resilience plan.\n\nSweden is a renewable energy pioneer. \n\nRenewables are bound to make up half of the country's energy mix by the end of the decade. ↓\n\n#NextGenerationEU",
"lang":"en",
"entities":{
"mentions":[
{
"start":10,
"end":22,
"username":"vonderleyen",
"id":"1146329871418843136"
}
],
"hashtags":[
{
"start":259,
"end":276,
"tag":"NextGenerationEU"
}
],
"annotations":[
{
"start":35,
"end":43,
"probability":0.9802,
"type":"Place",
"normalized_text":"Stockholm"
},
{
"start":72,
"end":77,
"probability":0.9972,
"type":"Place",
"normalized_text":"Sweden"
},
{
"start":125,
"end":130,
"probability":0.9456,
"type":"Place",
"normalized_text":"Sweden"
}
]
},
"public_metrics":{
"retweet_count":59,
"reply_count":25,
"like_count":197,
"quote_count":2
},
"created_at":"2022-04-07T17:40:38.000Z",
"possibly_sensitive":false,
"id":"1512123194785898503",
"source":"Twitter for Advertisers.",
"author_id":"157981564",
"context_annotations":[
{
"domain":{
"id":"10",
"name":"Person",
"description":"Named people in the world like Nelson Mandela"
},
"entity":{
"id":"1151432219002454016",
"name":"Ursula von der Leyen",
"description":"President of European Commission"
}
},
{
"domain":{
"id":"35",
"name":"Politician",
"description":"Politicians in the world, like Joe Biden"
},
"entity":{
"id":"1151432219002454016",
"name":"Ursula von der Leyen",
"description":"President of European Commission"
}
},
{
"domain":{
"id":"30",
"name":"Entities [Entity Service]",
"description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"
},
"entity":{
"id":"848920371311001600",
"name":"Technology",
"description":"Technology and computing"
}
},
{
"domain":{
"id":"30",
"name":"Entities [Entity Service]",
"description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"
},
"entity":{
"id":"848920371311001600",
"name":"Technology",
"description":"Technology and computing"
}
},
{
"domain":{
"id":"30",
"name":"Entities [Entity Service]",
"description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"
},
"entity":{
"id":"898654185146560512",
"name":"Energy Technology",
"description":"Energy Technology"
}
}
]
}
],
"includes":{
"users":[
{
"id":"157981564",
"name":"European Commission 🇪🇺",
"username":"EU_Commission"
},
{
"id":"1146329871418843136",
"name":"Ursula von der Leyen",
"username":"vonderleyen"
}
],
"tweets":[
],
"media":[
],
"polls":[
]
},
"meta":{
"result_count":1
}
}
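A quick way to double-check this is to scan the payload for link entities. The sketch below assumes the v2 convention where links, when present, appear under entities.urls with an expanded_url field; the helper name extract_urls and the trimmed sample are made up for illustration.

```python
def extract_urls(response):
    """Collect expanded URLs from every tweet in a v2-style response dict."""
    urls = []
    for tweet in response.get("data", []):
        # "urls" is simply absent when the API reports no link entities.
        for item in tweet.get("entities", {}).get("urls", []):
            urls.append(item.get("expanded_url") or item.get("url"))
    return urls


# Trimmed-down version of the response above: mentions and hashtags only.
sample = {
    "data": [
        {
            "id": "1512123194785898503",
            "entities": {
                "mentions": [{"start": 10, "end": 22, "username": "vonderleyen"}],
                "hashtags": [{"start": 259, "end": 276, "tag": "NextGenerationEU"}],
            },
        }
    ]
}

print(extract_urls(sample))  # prints [] because there is no "urls" entity
```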
It looks like the only way to obtain info about the cards is using the Twitter Ads API: https://developer.twitter.com/en/docs/twitter-ads-api/creatives/guides/identifying-cards
And that would require applying for and creating an additional Twitter Ads API application (with a separate token, etc.) 😖
Wow, that's nasty! No wonder nitter is forced to use the "unofficial API" aka web scraping. https://github.com/zedeus/nitter/commit/111927a21cfdebbe3b67d81f3336ae7d342b4f8b
Funnily enough, I'm able to get some card metadata with the endpoints used by guest tokens. So I've made some changes to extract the URL and media from a card: https://github.com/robertoszek/pleroma-bot/commit/e8152114b77e91243ef8d3528561cd6f94165826
You can try it out on 1.1.1rc47:
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.1.1rc47
Keep in mind it will only work when using guest tokens (either by omitting the twitter_token mapping or adding guest: true in your config).
On 04/12/22 22:34, robertoszek wrote:
> Keep in mind it will only work when using guest tokens (either by omitting the twitter_token mapping or adding guest: true in your config).
Will the usual tokens still be used for the rest of the calls? If not I guess I should use this only for the accounts which have this issue.
No, if a user in your config is marked as "guest", the guest token will be used for all the calls associated with that user.
I've been working a bit more on it to get this feature ready for the next stable release:
get pinned tweet if using guest token (https://github.com/robertoszek/pleroma-bot/commit/5b2983291cb0163515d027ffeda54987b438e0e2)
get poll from card if using guest token (https://github.com/robertoszek/pleroma-bot/commit/2ade63b5d8b763030c4d6ebbe180a1b94f42d8f2)
Those commits are included in 1.1.1rc48.
So the current limitations are listed here: https://github.com/robertoszek/pleroma-bot/blob/develop/docs/gettingstarted/beforerunning.md#guest-tokens
The inability to obtain protected tweets makes sense, as that will never work with a guest token.
So the only major difference between using regular Twitter tokens and guest tokens is the limit of 20 tweets per user, and I'm going to try to find a way around it.
I figured out how to force it to paginate using guest tokens: https://github.com/robertoszek/pleroma-bot/commit/57aece6bd5f913c9c898dc12035f90c74c6cb6f8
I've managed to gather more than 4000 tweets for a user using this method; I'm not sure whether it has a hard limit (apart from hitting rate limits).
That commit is included in version 1.1.1rc49.
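For readers unfamiliar with the idea, forced pagination generally boils down to a cursor loop like the sketch below. This is a generic illustration, not the code from the commit: fetch_page, the cursor values, and the max_pages safety cap are all hypothetical.

```python
def fetch_all(fetch_page, max_pages=500):
    """Follow cursors until the endpoint stops returning one.

    fetch_page is any callable taking a cursor (None for the first page)
    and returning (list_of_tweets, next_cursor_or_None).
    """
    tweets, cursor = [], None
    for _ in range(max_pages):
        page, cursor = fetch_page(cursor)
        tweets.extend(page)
        # Stop on an empty page or when no further cursor is offered.
        if not page or not cursor:
            break
    return tweets


# Stand-in for the real endpoint: three pages chained by cursors.
PAGES = {
    None: (["t1", "t2"], "c1"),
    "c1": (["t3", "t4"], "c2"),
    "c2": (["t5"], None),
}


def fake_fetch_page(cursor):
    return PAGES[cursor]


print(fetch_all(fake_fetch_page))
```

The max_pages cap is only there so a misbehaving cursor cannot loop forever; rate limits would still need separate handling.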
On 05/12/22 02:50, robertoszek wrote:
> That commit is included in version 1.1.1rc49.
I might be doing something wrong but it gives me a bunch of
✖ 2022-12-05 12:35:03,356 - pleroma_bot - ERROR - Exception occurred for user, skipping... (cli.py:707)
Traceback (most recent call last):
  File "/home/7/federico/mastodon/bot/lib/python3.9/site-packages/pleroma_bot/cli.py", line 549, in main
    user = User(user_item, config, base_path, posts_ids)
  File "/home/7/federico/mastodon/bot/lib/python3.9/site-packages/pleroma_bot/cli.py", line 278, in __init__
    self._get_twitter_info()
  File "/home/7/federico/mastodon/bot/lib/python3.9/site-packages/pleroma_bot/_twitter.py", line 169, in _get_twitter_info
    self._get_twitter_info_guest()
  File "/home/7/federico/mastodon/bot/lib/python3.9/site-packages/pleroma_bot/_twitter.py", line 149, in _get_twitter_info_guest
    self.pinned_tweet_id = user_twitter["pinned_tweet_ids_str"][0]
IndexError: list index out of range
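The last frame of the traceback suggests that pinned_tweet_ids_str can be an empty list, presumably for accounts with no pinned tweet. A minimal sketch of guarding that lookup follows; the helper name and the sample dicts are hypothetical, and this is not necessarily how pleroma-bot handles it.

```python
def first_pinned_tweet_id(user_twitter):
    """Return the first pinned tweet ID, or None when nothing is pinned."""
    # Treat a missing key, None, and an empty list the same way.
    pinned = user_twitter.get("pinned_tweet_ids_str") or []
    return pinned[0] if pinned else None


print(first_pinned_tweet_id({"pinned_tweet_ids_str": []}))
print(first_pinned_tweet_id({"pinned_tweet_ids_str": ["1512123194785898503"]}))
```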
Hmm...
Does running version 1.1.1rc52 make any difference?
On 05/12/22 12:54, robertoszek wrote:
> Does running version 1.1.1rc52 make any difference?
Will try.
For now I'm getting a bunch of HTTP 403 errors (not for protected accounts) like
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.twitter.com/1.1/statuses/show.json?id=1599735346824183808&include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweet=true&query_source=typed_query&pc=1&spelling_corrections=1&ext=mediaStats%2ChighlightedLabel
Oh wait, that was with the token. The errors seem to have vanished (for now) after commenting out the token in the config.
Something weird is going on... this is a quote tweet https://nitter.lacontrevoie.fr/i/status/1597718716837335040 but it was posted under the quoted account @.***_farage/109462766690805431 (which doesn't seem to have retweeted it).
I also get HTTP 404 errors for tweets which used to exist but no longer do:
On the Wayback Machine that redirects to https://web.archive.org/web/20220117161446/https://twitter.com/AndrzejHalicki/status/1483110270675439620 . So maybe 1483340344339091456 was a RT of 1483110270675439620 and the latter has been deleted.
Weird, 1597718716837335040 seems to only show up on the search API endpoint, doing the same query here:
doesn't seem to include it in the results. You would think that using from:account wouldn't return quotes from other random accounts 😅 (and it only does so on the API endpoint, it looks like).
I've added another pass to filter any tweets that don't originate from the mirrored user, just in case. https://github.com/robertoszek/pleroma-bot/commit/b70327d8c20a0d0239cb86f9ebc20cc8ab1d8efd
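As a rough illustration of such a pass (not the code from the commit), the sketch below drops every tweet whose author_id differs from the mirrored account's ID. It assumes tweets are dicts carrying an author_id field, as in the v2 response earlier in the thread; the function name and the second sample tweet are made up.

```python
def only_from_author(tweets, author_id):
    """Keep only the tweets written by the given author ID."""
    return [tweet for tweet in tweets if tweet.get("author_id") == author_id]


# 157981564 is the EU_Commission author_id from the response above;
# the second entry is an invented stray result from another account.
results = [
    {"id": "1512123194785898503", "author_id": "157981564"},
    {"id": "1", "author_id": "999"},
]

print(only_from_author(results, "157981564"))
```

Comparing IDs rather than usernames avoids false mismatches when an account is renamed.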
Regarding the 404s, I tried replicating them on my end to no avail (a reply to a deleted tweet, a reply to a tweet that quotes a deleted tweet and a retweet of a deleted tweet didn't trigger it for me). I've made some changes to try to handle it nonetheless: https://github.com/robertoszek/pleroma-bot/commit/812e94b86657b3db359ad8a2629ede4c3c489d35
Both commits are included in 1.1.1rc53. Let me know if it still results in unhandled errors on your end.
Oh, and the weird 403s you were getting when providing the token should be fixed in 1.1.1rc54.
There was a parameter that resulted in "403 Client is not authorized to perform this action" even when using an Elevated Twitter token (but it was fine with guest tokens):
https://github.com/robertoszek/pleroma-bot/commit/a0e01b81a13bc997a328b85db2d37f6ae6de839f
Everything is going well so far with 1.1.1rc54 (using a token, not guest tokens): it's all very fast and the process gets stuck very rarely (I don't even notice it) so the mirror is never too far behind.
The one error I see in the last day or so, apart from errors on my side, is
1 requests.exceptions.HTTPError: 503 Server Error: Service
the tweet seems fine: https://nitter.cz/MounirSatouri/status/1600535989239574530
Oh, I forgot to mention I added some retries for cases when an HTTP 503 is returned by Twitter's API:
https://github.com/robertoszek/pleroma-bot/commit/3ffe5f345cf7ed725ac806bd536ee4772ad9c569
It was included in the latest stable release, v1.2.0.
There's not much else we can do other than retry a few times. Twitter's API usually starts returning 503 when their servers are overloaded or over capacity at the time of the request.
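For illustration only, a retry of this kind can be sketched as below. This is not the code from the commit: the function name, the attempt count and the exponential backoff are assumptions, and fetch stands in for whatever performs the actual HTTP request.

```python
import time


def retry_on_503(fetch, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() again, with exponential backoff, while it reports HTTP 503.

    fetch is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 503:
            return status, body
        if attempt < max_attempts:
            # Wait base_delay, then twice that, and so on, before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
    # Still 503 after the last attempt: hand it back to the caller.
    return status, body


# Stand-in for a flaky endpoint: two 503 responses, then success.
responses = iter([(503, ""), (503, ""), (200, "ok")])
delays = []  # record the waits instead of actually sleeping
result = retry_on_503(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the sketch testable without real waiting; in normal use the default time.sleep applies.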
Some fancy accounts seem to be using some Twitter feature which pleroma-bot doesn't support yet.
This is typically spotted in tweets which follow the trend of containing a mere "↓" as a warning that the main content of the update is actually somewhere else, like this: https://respublicae.eu/@EU_Commission/108092396818818757 https://nitter.eu/EU_Commission/status/1512123194785898503 which is just a link to https://ec.europa.eu/commission/presscorner/detail/en/statement_22_2331 . These tweets look like any other tweet whose main URL has been "eaten" by Twitter and shown only as an attached "card", but they seem to be different.
Others are more complicated, like https://respublicae.eu/@EU_Commission/108103776666586079 https://nitter.eu/EU_Commission/status/1512777762909655043 which contains a "broadcast": https://nitter.eu/i/broadcasts/1BRJjnyZoZdJw . I guess there isn't much to do about these, other than documenting it somewhere so that people can make informed decisions about the nitter and signature configs.