nanos / FediFetcher

FediFetcher is a tool for Mastodon that automatically fetches missing replies and posts from other fediverse instances, and adds them to your own Mastodon instance.
https://blog.thms.uk/fedifetcher?utm_source=github
MIT License

Enhancement: Add Fetched Posts to CalcKey/Firefish Instances #72

Closed MrHamel closed 3 days ago

MrHamel commented 11 months ago

Extending off issue #60, it would be nice now that posts can be pulled in from CalcKey/Firefish instances, to also support adding said posts into those same instances. I personally run a 1-2 person CalcKey/Firefish instance and with lack of data, sometimes it can be quite empty.

MrHamel commented 11 months ago

It turns out CalcKey/Firefish has a Mastodon-compatible API. I raised an issue on the project to see if they can make the rate limit bypassable for admins, make the API rate limit adjustable, or at minimum have it report a status code of 429.

https://git.joinfirefish.org/firefish/firefish/-/issues/10628

Teqed commented 11 months ago

I have some preliminary work done for Firefish on this branch that even uses their own API, but whichever way you approach it, rate limits aren't configurable in Firefish at the moment, and the limit for authorized applications isn't generous.

From my limited probing, it seems that Firefish always returns status code 401 for errors, and the actual error code is included with the message, e.g. 'code': 'RATE_LIMIT_EXCEEDED'. Looking at the error generation in that project, it's not apparent why it isn't returning the correct status codes, since that seems to be the intention.
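A minimal sketch of how a client could recognise this case, assuming the error body has the nested shape {'error': {'code': ...}} (an assumption borrowed from the Sharkey error quoted later in this thread, not confirmed for Firefish). The function name is hypothetical and is not part of FediFetcher.

```python
def is_rate_limited(status_code, body):
    """Return True if a response looks like a rate-limit rejection."""
    # The standard signal: HTTP 429 Too Many Requests.
    if status_code == 429:
        return True
    # Fallback for servers that report the real reason only in the body.
    # Assumed shape: {'error': {'code': 'RATE_LIMIT_EXCEEDED', ...}}
    error = body.get("error") if isinstance(body, dict) else None
    return isinstance(error, dict) and error.get("code") == "RATE_LIMIT_EXCEEDED"
```

With a check like this, a 401 that carries RATE_LIMIT_EXCEEDED could be treated the same way as a 429 instead of as an authentication failure.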

What I have been able to accomplish is a quick hack to disable rate limiting entirely, allowing me to place a rate limiter in front of Firefish instead, and access it with FediFetcher without restriction. This works, and pulls in posts as expected.

Only the following is required:

The following would be nice:

Neither of these issues is specific to FediFetcher, and I expect the first bullet point especially to be solved at some point. If rate limit status codes were returned as expected, applications could respect the timeouts and still eventually finish.
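To illustrate what "respect the timeouts and still eventually finish" could look like, here is a rough sketch of a retry loop. It assumes the server sends a Retry-After header in seconds, which may not match what Firefish or Mastodon actually send; fetch is a hypothetical callable returning (status_code, headers, body), and none of these names come from FediFetcher.

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, default_wait=60, sleep=time.sleep):
    """Call fetch() until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point waiting if we will not try again
        # Assumption: Retry-After holds whole seconds. Anything else
        # (missing header, HTTP-date form) falls back to default_wait.
        try:
            wait = int(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The sleep parameter is injectable only so the loop can be exercised without actually waiting.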

Y0ngg4n commented 9 months ago

@Teqed I have dug into the code, and I don't see the issue you described. Maybe they already fixed it, but Firefish is returning a correct 429.

Teqed commented 9 months ago

@Y0ngg4n Confirmed, the develop branch of Firefish is returning the correct status code.

Firefish should be compatible to the extent its Mastodon-compatible API and conservative rate limits allow. A configurable rate limit would still be desirable.

Edit 06:26:10Z: After seeing some more 401 status codes I'm not positive about these results. More testing might be warranted.

likeazir commented 4 months ago

Reviving this issue again - I played with FediFetcher and Sharkey (Misskey Fork with Mastodon API) and FediFetcher seemed to work for a bit but seemed to fail unusually often and then crashed after about 50 imported notes.

@Teqed did you get anywhere on your branch? I'd be glad to help and try to make your stuff work:tm: Also, thanks for picking up the issue :)

Teqed commented 4 months ago

When failing, are you provided an HTTP status code (such as 429 or 401)? When crashing, does Python return an error to stdout/stderr?

I haven't worked with any Misskey forks recently or ever seen Sharkey's code, but the point of failure was that the HTTP status codes for rate limiting were not being handled correctly. The branch I referenced uses the native Misskey API, but FediFetcher's use of the Mastodon-compatible API is sufficient for what we're trying to accomplish. That is to say, as far as we know, no changes are required on FediFetcher's side, which only behaves as directed by the web server it's communicating with.

If your Sharkey instance is implementing native rate limits but returning anything other than 429, FediFetcher is going to fail after reaching the limit and be unable to resume.

My personal solution was to remove Misskey's native rate limiting and use my own.

likeazir commented 4 months ago

Thank you for the reply.
My issue was that I upped max new followings to 200, but Sharkey only supports fetching 100.
Sharkey (and probably others too) seems to return an error with HTTP 200, so it's probably wise to check the response after receiving a 200 code too. (Admittedly this is a very low priority issue.)
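One way to stay under such a cap is to split the requested total into per-request limits. The sketch below only computes those limits; it assumes the server supports some form of pagination to fetch the later pages, which is not shown, and the function name and the default of 100 are illustrative rather than taken from FediFetcher.

```python
def page_limits(total, max_per_request=100):
    """Split a requested total into per-request limits no larger than the cap."""
    if total < 0 or max_per_request < 1:
        raise ValueError("total must be >= 0 and max_per_request >= 1")
    limits = []
    remaining = total
    while remaining > 0:
        step = min(remaining, max_per_request)
        limits.append(step)
        remaining -= step
    return limits
```

For a setting of 200 this yields two requests of 100 each instead of a single request the server rejects.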

error returned by sharkey for reference:

{'error': {'message': 'Invalid param.', 'code': 'INVALID_PARAM', 'id': '3d81ceae-475f-4600-b2a8-2bc116157532', 'kind': 'client', 'info': {'param': '#/properties/limit/maximum', 'reason': 'must be <= 100'}}}

and the crash

Traceback (most recent call last):
  File "/opt/fedifetcher/ShonkFetcher/find_posts.py", line 1422, in <module>
    followings = get_new_followings(arguments.server, user_id, arguments.max_followings, all_known_users)
  File "/opt/fedifetcher/ShonkFetcher/find_posts.py", line 278, in get_new_followings
    new_followings = filter_known_users(following, known_followings)
  File "/opt/fedifetcher/ShonkFetcher/find_posts.py", line 257, in filter_known_users
    return list(filter(
  File "/opt/fedifetcher/ShonkFetcher/find_posts.py", line 258, in <lambda>
    lambda user: user['acct'] not in known_users,
TypeError: string indices must be integers
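The TypeError is consistent with the error object above being passed on as if it were the list of followings: iterating over a dict yields its string keys, so user['acct'] ends up indexing a string. A minimal sketch of a guard follows, assuming the error shape shown above. parse_followings is a hypothetical helper, and this filter_known_users is a simplified stand-in for the function named in the traceback, not the project's actual code.

```python
def parse_followings(payload):
    """Return payload if it is a list of account dicts, else raise ValueError."""
    # Some servers answer HTTP 200 with an error object instead of a list.
    if isinstance(payload, dict) and "error" in payload:
        error = payload["error"]
        message = error.get("message") if isinstance(error, dict) else error
        raise ValueError(f"server returned an error: {message}")
    if not isinstance(payload, list) or not all(isinstance(u, dict) for u in payload):
        raise ValueError("expected a list of account objects")
    return payload


def filter_known_users(users, known_users):
    # Same filtering idea as in the traceback, applied to validated data.
    return [user for user in parse_followings(users) if user["acct"] not in known_users]
```

Failing early with the server's own message ('Invalid param.') points at the real cause more directly than the TypeError does.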

I can check whether there are similar issues elsewhere and create a simple PR that checks whether the server's response makes sense. I have already fixed this for my own deployment.

nanos commented 1 week ago

It does appear that someone has made a fork for Firefish: https://misskey.tiaplate.org/notes/9tcm2pv1f7d4o51m

nuekaze commented 5 days ago

Hello,

I uploaded my code in this commit. No guarantees it works but this is the code I run atm to fetch replies on our instance. https://github.com/nuekaze/FediFetcher/commit/1cbb1c115d88b2f212cd2e5f140cab5acd29ab7a

I made it a long time ago, so I don't quite remember how it was set up. I think the important changes were in add_context_url (line 883), where the URL is changed and POST is used instead of GET, as well as the rate-limit handling on line 1006, which is different in the Firefish API.

nanos commented 3 days ago

Thanks @nuekaze . Really appreciate this.

I don't think this will be easy to implement in FediFetcher itself in such a way that it can reliably be run against both Mastodon and Firefish instances: the maintenance overhead would be too much, especially since I don't run a Firefish instance myself, so I wouldn't be able to test any future changes.

As such, I think it's best to leave this, and if someone wants to fork the project or make their own personal edits to the script, they at least know where to look.

As such I'll be closing this issue now.