myConsciousness / atproto-pds-search

This project automatically crawls and visualizes the atproto PDS endpoints indexed in the PLC directory.
MIT License

What are the criteria for a PDS to be listed in the repo's README? #4

Open kjarex opened 1 year ago

kjarex commented 1 year ago

I'm just wondering by which criteria a PDS gets listed in the README. I'm asking because schnitzel-mit-pommes.de is missing from the list, while atscan.net mentions it.

It's not that I need it to be listed; I'm more interested in whether this PDS fails to deliver or respond with some required data, or misbehaves in some other way.

Should it indeed misbehave or lack any necessary information, I would be interested in the details.

Thanks!

myConsciousness commented 1 year ago

Hi @kjarex, I just noticed your issue here!

The list is based on plc.directory-related services, and these are the current targets:

https://github.com/myConsciousness/atproto-pds-search/blob/f9e89e1cc1d60daa5ae47ccfd3fe9dfd788f6c6f/lib/atproto_pds_crawler.dart#L9-L12

kjarex commented 1 year ago

Thanks.

Do you have any idea why schnitzel-mit-pommes.de isn't recognised? It has DIDs on plc.directory, for example did:plc:4kg4ebnn6x3wxnexoxxtuidl.

Just again for clarification: my intention is not to get it on the list (the PDS is not really used currently anyway), but just to rule out any technical misbehaviour of the PDS 😀 The same reason it gets skipped here might well create issues with federation later.

fatihsever commented 11 months ago

The crawler started indexing on July 29 according to the GitHub Actions workflow (see the created_at field). However, your DID was registered to the directory on May 12 according to the PLC API (see the createdAt field).

I think DIDs that were registered before the crawler's start date are not crawled. FYI @myConsciousness
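
The comparison described above can be reproduced with a short script. This is a minimal sketch, not the crawler's actual logic: it assumes the PLC directory serves a DID's audit log as a JSON array at `/{did}/log/audit`, with each entry carrying a `createdAt` timestamp. The helper names and the sample timestamps in the usage note are hypothetical.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

# Assumption: base URL of the public PLC directory.
PLC_DIRECTORY = "https://plc.directory"


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2023-05-12T08:00:00.000Z'."""
    # Replace the trailing 'Z' so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def registration_time(audit_log: list) -> datetime:
    """Return the earliest createdAt found in a DID's audit log."""
    return min(parse_timestamp(entry["createdAt"]) for entry in audit_log)


def registered_before(audit_log: list, crawl_start: datetime) -> bool:
    """True if the DID was first registered before the crawler started."""
    return registration_time(audit_log) < crawl_start


def fetch_audit_log(did: str) -> list:
    """Fetch the audit log for a DID (assumed endpoint: /{did}/log/audit)."""
    with urlopen(f"{PLC_DIRECTORY}/{did}/log/audit") as response:
        return json.load(response)
```

For a live check, `registered_before(fetch_audit_log("did:plc:4kg4ebnn6x3wxnexoxxtuidl"), crawl_start)` would run the same comparison against the real directory, with `crawl_start` set to a timezone-aware `datetime` for whatever start date the workflow history shows.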

mary-ext commented 4 months ago

I'm seeing a whole lot more PDS instances than this repo seems to suggest: https://github.com/mary-ext/atproto-scraping

I don't think the scraper is actually working, like, at all. It doesn't even list the US West-based PDS instances that Bluesky now has (*.us-west.host.bsky.network)
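
A quick way to test this kind of claim is to filter a list of crawled endpoints by a hostname wildcard. The sketch below assumes the endpoints are available as plain URL strings; the actual layout of the repo's data files is not shown in this thread, and `hosts_matching` and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def hosts_matching(endpoints, pattern):
    """Return the hostnames from `endpoints` that match a shell-style wildcard."""
    matches = []
    for endpoint in endpoints:
        # urlparse lowercases the hostname and returns None if there is none.
        host = urlparse(endpoint).hostname
        if host and fnmatchcase(host, pattern):
            matches.append(host)
    return matches


# Hypothetical sample data, not taken from the crawler's output.
sample = [
    "https://bsky.social",
    "https://example.us-west.host.bsky.network",
    "https://example.us-east.host.bsky.network",
]
print(hosts_matching(sample, "*.us-west.host.bsky.network"))
# prints ['example.us-west.host.bsky.network']
```

An empty result for the crawler's own endpoint list would confirm that none of the US West hosts were picked up.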

mary-ext commented 4 months ago

I checked the data/index.json file; it seems to have just stopped crawling entirely.

The jobs are scheduled every hour, and each one takes about an hour to run, which means many jobs have been failing left and right too.

I don't think it's useful to check every hour.