tgxn / lemmy-explorer

Instance and Community Explorer for Lemmy
https://lemmyverse.net/

Bug: Crawler has failed for past 72 hours #182

Closed freamon closed 2 months ago

freamon commented 2 months ago

https://github.com/tgxn/lemmy-explorer/actions/workflows/publish-pages.yaml

I have, in the past, managed to break the crawler by running a Lemmy instance on a URL for long enough for the crawler to discover it, and then running something different on that URL. I saw it repeatedly try to get communities/list, even though the response was a 404 every time.

It's not me this time though. Maybe someone else has done the same as me (or maybe it's something completely different, of course).

tgxn commented 2 months ago

Yeah fair enough. It's designed to re-try a couple times and then stop trying after a while. If you used to have a Lemmy instance it'll get invalidated within 12 hours if it's no longer found.
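To make the retry-then-invalidate behaviour concrete, here is a rough sketch in Python. It is not the crawler's actual code: `InstanceRecord`, `MAX_ATTEMPTS`, and the helper functions are hypothetical names, and the attempt limit of 3 is an assumption standing in for "a couple times". Only the 12-hour window comes from the comment above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_ATTEMPTS = 3                        # assumption: stands in for "a couple times"
INVALIDATE_AFTER = timedelta(hours=12)  # the 12-hour window mentioned above


@dataclass
class InstanceRecord:
    """Hypothetical per-instance crawl state."""
    base_url: str
    first_failure: Optional[datetime] = None
    failed_attempts: int = 0


def record_failure(rec: InstanceRecord, now: datetime) -> None:
    # Remember when the instance first stopped answering like Lemmy.
    if rec.first_failure is None:
        rec.first_failure = now
    rec.failed_attempts += 1


def record_success(rec: InstanceRecord) -> None:
    # A good response clears the failure history.
    rec.first_failure = None
    rec.failed_attempts = 0


def should_retry(rec: InstanceRecord) -> bool:
    # Stop hammering an endpoint that keeps failing (e.g. repeated 404s).
    return rec.failed_attempts < MAX_ATTEMPTS


def should_invalidate(rec: InstanceRecord, now: datetime) -> bool:
    # Purge the instance once it has been missing for the whole window.
    return (
        rec.first_failure is not None
        and now - rec.first_failure >= INVALIDATE_AFTER
    )
```

In this sketch, an instance that starts returning 404s stops being retried after three failures and becomes eligible for purging 12 hours after the first failure.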

The pipelines do occasionally fail, and this is expected. As long as they finish at least once a day, I'm happy 😊

They start working again once that instance has been purged. Unfortunately, the link you sent isn't the specific run that failed. I assume it's this one: https://github.com/tgxn/lemmy-explorer/actions/runs/8837277005/job/24265783395

In the output, we can see: [image]

This happens because the publish step tried to run while there weren't enough communities. The check is a fail-safe, so we don't randomly go and remove 16k communities from the published website.
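As an illustration of that kind of fail-safe, a minimal Python sketch is below. It is not the project's actual publish code: `check_before_publish`, `PublishAborted`, and the `MIN_COMMUNITIES` value of 10,000 are assumptions made up for the example. The idea is just to refuse to publish when the crawl result looks suspiciously small.

```python
from typing import Sequence

MIN_COMMUNITIES = 10_000  # assumption: the real threshold is not stated in this thread


class PublishAborted(RuntimeError):
    """Raised when the crawl output is too small to publish safely."""


def check_before_publish(
    communities: Sequence[object], minimum: int = MIN_COMMUNITIES
) -> int:
    # Fail the pipeline instead of overwriting the site with a partial crawl.
    count = len(communities)
    if count < minimum:
        raise PublishAborted(
            f"only {count} communities found, need at least {minimum}; "
            "refusing to publish"
        )
    return count
```

A failed run is the cheaper outcome here: the previously published data stays up until a later crawl produces a complete result.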