Closed ToadKing closed 1 year ago
Thanks so much for this @ToadKing !
I would love even hacky support for firefish/calckey!
I must be honest, though, that I'm not sure I like relying on failures and fallbacks: I already got a few comments from owners of servers that run unsupported software (e.g. WordPress with ActivityPub) that FediFetcher users are hammering their servers for no good reason. Not sure they'd look too kindly upon us hammering their servers twice as much (once to try Mastodon, once to try Firefish) 😬
I will try this branch on my instance though, to see how it goes, because I'd still love Firefish support!
I've already discovered a shortcoming in the changes: the `notes/children` API only returns the immediate child notes of the parent, not any deeper threads. The web interface seems to make an explicit call for every subsequent note to check for deeper threads, which my changes currently do not do. There will obviously have to be a limit on this, but it's something that could be done, especially since we also get a count of replies to a note, so we only need to check for deeper threads when we know they're there.
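A minimal sketch of the depth-limited walk described above, not the code in this PR. It assumes each note is a dict with an `id` and a `repliesCount` field, and that `fetch_children` is a hypothetical callable wrapping one `notes/children` request; both names are illustrative.

```python
from typing import Any, Callable, Dict, List

Note = Dict[str, Any]


def collect_replies(
    note: Note,
    fetch_children: Callable[[str], List[Note]],
    max_depth: int = 3,
) -> List[Note]:
    """Return the replies under `note`, walking at most `max_depth` levels.

    `fetch_children` is a hypothetical stand-in for one notes/children call.
    """
    # Skip the request when the depth budget is spent or the reply count
    # (assumed field name: repliesCount) says there is nothing to fetch.
    if max_depth <= 0 or int(note.get("repliesCount", 0)) == 0:
        return []

    replies: List[Note] = []
    for child in fetch_children(str(note["id"])):
        replies.append(child)
        # Recurse into the child with one level less of depth budget.
        replies.extend(collect_replies(child, fetch_children, max_depth - 1))
    return replies
```

With this shape, a note with no replies costs zero extra requests, and the worst case is bounded by `max_depth` rather than by the size of the thread.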
I've also been working on probing nodeinfo for the server software, for better API detection. It's still a work in progress and I'm not sure whether that should go into this PR or a different one. The work is currently here: https://github.com/ToadKing/FediFetcher/tree/server-detection
That's quite annoying having to query recursively. Might need to limit to a few levels deep indeed.
nice work on the API detection too!
So I finished hooking up the rest of the server software detection stuff and found a way to get comments at depth for FireFish. It doesn't work for Misskey though but I think it's good enough for now.
The one problem is there's no real automatic way to detect the API a piece of software uses so a list in the script will need to be kept up to date. This brings up a question though: Should unknown software default to using the Mastodon API or should we just throw errors when those servers are found? Right now I do the latter but I figure I'd ask what you think is best.
This is amazing! Thanks for your work @ToadKing
I agree with your approach of erroring when we don't recognise the server software. It gives us a chance to add to the list, and in all likelihood, it won't support either API in that case.
As this is a big PR I'll have a closer look at this later, but it looks really good so far!
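For illustration, the "keep a list and error on unknown software" approach discussed above could look roughly like this. This is a sketch, not FediFetcher's actual code: the `API_BY_SOFTWARE` table, its entries, and `UnsupportedSoftwareError` are assumptions. It only relies on a nodeinfo document exposing the software name under `software.name`.

```python
from typing import Any, Dict

# Hypothetical lookup table: nodeinfo software name -> API family to use.
# The entries are illustrative and would need to be kept up to date by hand.
API_BY_SOFTWARE: Dict[str, str] = {
    "mastodon": "mastodon",
    "lemmy": "lemmy",
    "misskey": "misskey",
    "calckey": "misskey",
    "firefish": "misskey",
}


class UnsupportedSoftwareError(Exception):
    """Raised when a server reports software that is not in the table."""


def api_for_nodeinfo(nodeinfo: Dict[str, Any]) -> str:
    """Pick an API family from a parsed nodeinfo document, or raise."""
    name = str(nodeinfo.get("software", {}).get("name", "")).lower()
    try:
        return API_BY_SOFTWARE[name]
    except KeyError:
        # No fallback to the Mastodon API: unknown servers get no requests.
        raise UnsupportedSoftwareError(f"unknown server software: {name!r}") from None
```

Raising instead of falling back means an unrecognised server is contacted once for nodeinfo and never again with API calls it cannot answer.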
This is quite interesting: I had not expected the time it took for FediFetcher to run to be quite that much longer: Without these changes it takes about 2-3 minutes to run. On this branch, it takes about 8-11 minutes.
It's not a problem, but an interesting observation.
I wonder if we could cache the instance info on disk, in a future development, to speed it up a little.
Overall really solid work though @ToadKing! I intend to merge this later this week. Thank you!
Wow, that's odd. I would expect the extra lookups to take some extra time, but not that much. However, I did notice some servers (like firefish.social) take a long time to fetch the nodeinfo page and even occasionally time out.
Is it possible to benchmark how much time is spent in `get_server_info` and `get_nodeinfo`? (I'm sure there is but I'm very new to Python.) If it turns out that this actually is creating a bottleneck, it might be worthwhile caching that info, at least with an expiry date. Making sure we actually have the most up-to-date software version isn't strictly necessary right now, as long as servers don't migrate to different software with different APIs, but it might become necessary in the future if we also vary behavior based on software version.
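One way to answer the benchmarking question above is a small decorator that accumulates wall-clock time per function. This is a generic sketch using only the standard library; `timed` and `time_spent` are hypothetical names, and applying it to `get_server_info` and `get_nodeinfo` is left to the reader.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Any, Callable, DefaultDict

# Total seconds spent inside each decorated function, keyed by function name.
time_spent: DefaultDict[str, float] = defaultdict(float)


def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """Accumulate the wall-clock time spent in `func` into `time_spent`."""

    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the elapsed time even if the call raised.
            time_spent[func.__name__] += time.perf_counter() - start

    return wrapper
```

If one decorated function calls another, the inner function's time is counted in both totals, so the numbers should not simply be added together.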
> Is it possible to benchmark how much time is spent in `get_server_info` and `get_nodeinfo`? (I'm sure there is but I'm very new to Python.)
I've just done that:
2 min 35 sec, out of a total runtime of 5 min 41 sec, was spent on `get_server_info`, which probably isn't surprising, given that it requires multiple HTTP calls.
> Making sure we actually have the most up-to-date software version isn't strictly necessary right now as long as servers don't migrate to different software with different API
Imho that's not really something we need to cater for: servers switching software while maintaining the same host name should be a very rare exception, if for no other reason than that I don't think Mastodon itself would handle this very well...
There probably should be some timeline on how long we'd cache this, but I think several weeks would be totally acceptable.
I didn't write the explicit rate limit checks because it looks like the `get` (and my new `post`) functions automatically handle rate limits being hit. In fact, I'm not sure the other Mastodon/Lemmy functions need those checks either because of that. Am I right in thinking that?
Oops. Yes, you are correct! My bad.
As you can see I've implemented server info caching. The cache period is configurable, but defaults to 30 days.
This has brought my processing time down to 1-4 minutes again, which I'm much happier with, and if someone doesn't want to cache it, they can set `--remember-hosts-for-days` to 0.
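As a rough illustration of an expiring server-info cache like the one described above (not the merged implementation, which is not shown in this thread), the sketch below keeps entries in memory with a timestamp. `ServerInfoCache` and its methods are hypothetical names; a real version would also need to persist entries between runs.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional, Tuple


class ServerInfoCache:
    """In-memory cache of per-host server info with a maximum age in days."""

    def __init__(self, max_age_days: int) -> None:
        self.max_age = timedelta(days=max_age_days)
        self._entries: Dict[str, Tuple[Any, datetime]] = {}

    def put(self, host: str, info: Any, now: Optional[datetime] = None) -> None:
        """Store `info` for `host`, stamped with the current time."""
        if now is None:
            now = datetime.now(timezone.utc)
        self._entries[host] = (info, now)

    def get(self, host: str, now: Optional[datetime] = None) -> Optional[Any]:
        """Return cached info for `host`, or None if missing or expired."""
        if now is None:
            now = datetime.now(timezone.utc)
        entry = self._entries.get(host)
        if entry is None:
            return None
        info, stored_at = entry
        if now - stored_at >= self.max_age:
            # Expired: drop it so the caller fetches fresh info.
            del self._entries[host]
            return None
        return info
```

Because the expiry check uses `>=`, a maximum age of 0 days makes every lookup miss, which mirrors the "set it to 0 to disable caching" behaviour mentioned above.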
@ToadKing do you think this is ready to merge now?
Yeah, looks good to me. :+1:
Unfortunately it doesn't look good for me on my CalcKey/FireFish instance, and I'm a bit unhappy that my issue was closed without asking me if it has actually resolved my problem.
This is also after regenerating the access token for my account. Below is my config, followed by the stream of 401 Unauthorized errors.
{
"access-token": "(removed)",
"server": "calckey.club",
"home-timeline-length": 500,
"max-followings": 160,
"from-notifications": 1
}
2023-08-05 16:19:50.738265 UTC: Error adding url https://calckey.club/api/v2/search?q=https://mastodon.social/@MineralCup/109309164407080430&resolve=true&limit=1 to server calckey.club. Status code: 401
2023-08-05 16:19:50.996215 UTC: Error adding url https://calckey.club/api/v2/search?q=https://mastodon.social/@MineralCup/109304845932641792&resolve=true&limit=1 to server calckey.club. Status code: 401
2023-08-05 16:19:51.046937 UTC: Error adding url https://calckey.club/api/v2/search?q=https://mastodon.social/@MineralCup/109304839203772384&resolve=true&limit=1 to server calckey.club. Status code: 401
2023-08-05 16:19:51.047015 UTC: Added 0 posts for user MineralCup@mastodon.social with 3 errors
Copying and pasting a URL into a new tab while logged in yields me this:
{"error":{"message":"Credential required.","code":"CREDENTIAL_REQUIRED","id":"1384574d-a912-4b81-8601-c7b1c4085df1","kind":"client"}}
@MrHamel This PR was for adding the ability to fetch posts from Calckey/Firefish instances, not to run against one. I'm sorry that I accidentally marked this PR as fixing that. I didn't realize that issue was for both cases.
In any case you'll be running up against this, which is very annoying to say the least.
{'Server': 'nginx/1.25.1', 'Date': 'Sat, 05 Aug 2023 17:06:51 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '166', 'Connection': 'keep-alive', 'Vary': 'Origin', 'strict-transport-security': 'max-age=15552000; preload', 'Cache-Control': 'private, max-age=0, must-revalidate'}
{"error":{"message":"Rate limit exceeded. Please try again in 1 minute(s).","code":"RATE_LIMIT_EXCEEDED","id":"d5826d14-3982-4d2e-8011-b9e9f02499ef","kind":"client"}}
2023-08-05 17:06:51.752850 UTC: Error adding url https://calckey.club/api/v2/search?q=https://mastodon.social/@liroyleshed/110596080603875926&resolve=true&limit=1 to server calckey.club. Status code: 401 Unauthorized
@MrHamel can you please open an issue for this, as I’ll totally forget about this otherwise. Thanks.
Issue #72
Fixes #60
This implementation is not very robust and doesn't actually do the work to detect the server type to use (as mentioned in #60); it just relies on fallbacks. Also, in testing it seems like the `url` field for notes on Misskey/Firefish servers isn't actually filled in, so the URL has to be created manually from the server and note ID, at least on the two servers I tested (misskey.io and calckey.social).
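A small sketch of building the note URL manually, as described above. It assumes notes are served at `https://<server>/notes/<id>`, which matches how Misskey-style web interfaces link to notes but is an assumption here, and `url_for_note` is a hypothetical helper name.

```python
from typing import Any, Dict


def url_for_note(server: str, note: Dict[str, Any]) -> str:
    """Return the note's own `url` if present, else build one from its ID.

    Assumes the /notes/<id> path layout; adjust if a server differs.
    """
    # Prefer the server-provided URL when it is filled in and non-empty.
    return note.get("url") or f"https://{server}/notes/{note['id']}"
```

Preferring the provided `url` when it exists keeps the original link for notes that do carry one, and only falls back to the constructed form otherwise.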