nanos / FediFetcher

FediFetcher is a tool for Mastodon that automatically fetches missing replies and posts from other fediverse instances, and adds them to your own Mastodon instance.
https://blog.thms.uk/fedifetcher?utm_source=github
MIT License

feat: lemmy #56

Closed. Teqed closed this 1 year ago

Teqed commented 1 year ago

Parses /comment/ and /post/ URLs for comment IDs, uses getComment to obtain the parent post_id, and then uses getComments to collect the ap_id URLs of all related comments.
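
A minimal sketch of that flow, assuming Lemmy's v3 HTTP API (the helper name and regexes are illustrative, not the PR's exact code):

```python
import re
import requests

def lemmy_comment_context_urls(url):
    """Resolve a Lemmy /comment/ or /post/ URL to the ap_id URLs of all
    comments under the same post (endpoints per Lemmy's v3 HTTP API)."""
    if match := re.match(r"https://(?P<server>[^/]+)/comment/(?P<id>\d+)", url):
        server = match.group("server")
        # getComment: look up the comment to find its parent post_id
        comment = requests.get(
            f"https://{server}/api/v3/comment?id={match.group('id')}", timeout=5
        ).json()
        post_id = comment["comment_view"]["comment"]["post_id"]
    elif match := re.match(r"https://(?P<server>[^/]+)/post/(?P<id>\d+)", url):
        server, post_id = match.group("server"), match.group("id")
    else:
        return []
    # getComments: list the post's comments and collect each ap_id
    comments = requests.get(
        f"https://{server}/api/v3/comment/list?post_id={post_id}", timeout=5
    ).json()
    return [c["comment"]["ap_id"] for c in comments["comments"]]
```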

nanos commented 1 year ago

Thanks for this @Teqed

Really great work, and I really appreciate it!

Can't wait for this to be ready to merge. Let me know if you need help with anything!

Teqed commented 1 year ago

@nanos Thank you for writing FediFetcher! 👍

Working: context of posts seen in the timeline. In progress: backfilling user profiles -- the returned error is Extra data: line 1 column 4 (char 3); I will have to pick this back up later.
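
For reference, that message is the format CPython's json module uses when a response body has trailing content after a valid JSON value, so presumably a response somewhere wasn't pure JSON:

```python
import json

# "{}" parses as a value; the "x" at char 3 (column 4) is the extra data.
json.loads('{} x')  # json.decoder.JSONDecodeError: Extra data: line 1 column 4 (char 3)
```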

get_all_known_context_urls was returning None for the URL until I slightly refactored it in b7ef2be (#56). I am still not sure what was happening here.

nanos commented 1 year ago

get_all_known_context_urls was returning None for the URL until I slightly refactored it in b7ef2be (#56). I am still not sure what was happening here.

Yeah, I must admit that I never truly understood that part 😆 It's something I just inherited from the original author and never bothered to simplify or rewrite.

Teqed commented 1 year ago

Included are a few commits that help prevent FediFetcher from exiting ungracefully when it encounters unexpected types, missing properties, or unusual URLs. This is not a comprehensive robustness pass, but it covers a few spots that were helpful while writing this feature.
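
Illustrative only (not the PR's exact code), the general pattern is to guard lookups and fail soft rather than let one oddly-shaped item end the run:

```python
import logging

def get_post_url(toot):
    """Return the item's URL, or None if the item is oddly shaped."""
    if not isinstance(toot, dict):
        logging.warning("Skipping unexpected type: %s", type(toot))
        return None
    return toot.get("url")  # a missing property yields None, not a KeyError
```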

For future federation features, it should be noted that Pixelfed profiles don't use a subdirectory in their path, ex. https://pixelfed.social/dansup rather than something like https://pixelfed.social/u/dansup. Because the current regex matches a bare path segment, it is likely to match any currently-unmatched subdirectory instead of the user's actual name. A quick fix is to make sure Pixelfed profile matches are attempted last, though I'm sure a more sophisticated regex is possible. I've left a cautionary comment for the time being.
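
For example (patterns illustrative, not the actual regexes), attempting the more specific profile formats first keeps the bare Pixelfed-style path from swallowing other servers' URLs:

```python
import re

PROFILE_PATTERNS = [
    re.compile(r"https://(?P<server>[^/]+)/@(?P<user>[^/]+)$"),   # Mastodon
    re.compile(r"https://(?P<server>[^/]+)/u/(?P<user>[^/]+)$"),  # Lemmy / Kbin
    re.compile(r"https://(?P<server>[^/]+)/(?P<user>[^/]+)$"),    # Pixelfed: bare path, must come last
]

def parse_profile_url(url):
    for pattern in PROFILE_PATTERNS:
        if match := pattern.match(url):
            return match.group("server"), match.group("user")
    return None
```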

For federation with Kbin instances, there is the minor issue that profile URLs look like Lemmy's (ex. https://kbin.social/u/admin) and would have to be parsed separately somehow. However, reading their API documentation does not reveal any way to fetch comments by username: you can search for posts by magazine, but AFAIK user profiles are not available as magazines. This may change, though, as they've said:

This is a very early beta version, and a lot of features are currently broken or in active development, such as federation.

Finally, these commits include the last pieces needed to backfill user profiles, followed communities, and "posts" from Lemmy. Testing has been done via GitHub Actions against my Mastodon v4.1.2+glitch instance, which has content from the relevant instances.

nanos commented 1 year ago

Thanks for your hard work @Teqed!

This is a fairly large PR, so I'm going to go through it with a bit of a fine-toothed comb over the weekend, but it does look solid at first glance.

A quick fix is to make sure Pixelfed profile matches are attempted last

Personally, I think relying on a specific sequence is totally acceptable here.

I did think about using the /.well-known/nodeinfo endpoint to determine the server software, but I'm not sure how widely implemented that is outside of Mastodon either.

Teqed commented 1 year ago

I did think about using the /.well-known/nodeinfo endpoint to determine the server software, but I'm not sure how widely implemented that is outside of Mastodon either.

This is a good idea and you inspired me to do some quick research:

Making a request to https://{server}/.well-known/nodeinfo and then accessing ["links"][0]["href"] on the JSON gets you:

https://{mastodon}/nodeinfo/2.0
https://{lemmy}/nodeinfo/2.0.json
https://{kbin}/nodeinfo/2.0
https://{pixelfed}/api/nodeinfo/2.0.json <-- Note: /api/ subdirectory
https://{pleroma}/nodeinfo/2.0.json
https://{peertube}/nodeinfo/2.0.json

A request for that JSON (2.0 schema here) gets you ["software"]["name"], containing the name of the service. This works for all six of the services listed above.
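
A minimal sketch of that two-step lookup (the function name is mine, not FediFetcher's):

```python
import requests

def get_server_software(server):
    """Follow /.well-known/nodeinfo to the NodeInfo document and return
    the software name, e.g. "mastodon", "lemmy", "kbin"."""
    well_known = requests.get(f"https://{server}/.well-known/nodeinfo", timeout=5).json()
    nodeinfo_url = well_known["links"][0]["href"]
    return requests.get(nodeinfo_url, timeout=5).json()["software"]["name"]
```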

This gives me some more ideas on how to choose API endpoints based on NodeInfo instead of URL parsing. I imagine that if you went this route, it'd be preferable to keep a cache of already-identified APIs so that you don't repeatedly make the same request in the same run. I may submit another PR if I find this worthwhile.
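
For the cache, memoising the lookup for the duration of the run would be enough (again, just a sketch building on the function above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_server_software_cached(server):
    # Repeated calls for the same server hit the cache instead of the network.
    return get_server_software(server)
```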