skwerlman closed this 7 years ago
Could you also edit the README file and add it to the list at the end?
nyaa.si supports RSS if you prefix your search with https://nyaa.si/rss. Parsing XML that contains only metadata may be a better bet than parsing the HTML page.
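To illustrate the suggestion above, here is a minimal sketch of pulling item metadata out of an RSS document rather than scraping HTML. It is written in Python purely for illustration and is not the project's importer code. The sample feed, its field names, and the `parse_items` helper are assumptions; the real nyaa.si feed may expose different or additional elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment, used only for illustration.
# The element names follow plain RSS 2.0; the real feed may differ.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example torrent A</title>
      <link>https://example.invalid/download/1.torrent</link>
      <guid>https://example.invalid/view/1</guid>
    </item>
    <item>
      <title>Example torrent B</title>
      <link>https://example.invalid/download/2.torrent</link>
      <guid>https://example.invalid/view/2</guid>
    </item>
  </channel>
</rss>
"""


def parse_items(feed_xml):
    """Return one dict of metadata per <item> element in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing.
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

Because the feed carries only structured metadata, the parser does not depend on the page layout, which is the main appeal over HTML scraping.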
@pahakalle Done.
@wow-sweetlie The issue with using RSS here is that it only lists the 75 most recent torrents, about 1/50th of what it (and most of the other crawlers) currently scrapes.
Hi @skwerlman - thanks for the PR. I'll take a look this weekend and onboard it onto a new branch I've been working on for a while.
All of the importers now scrape only the first pages at most. We're no longer bringing in content wholesale because it caused delays in importing. The idea is that when something is uploaded, Magnetissimo should pick it up almost instantly, instead of waiting for the 100+ pages to finish processing and then starting over to pick up new content.
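The approach described above can be sketched roughly as follows: on each pass, fetch only the newest listing and import entries that have not been seen before. This is an illustrative Python sketch, not Magnetissimo's actual implementation. `import_new`, `fetch_first_page`, `save`, and the `guid` key are hypothetical names chosen for the example.

```python
def import_new(fetch_first_page, seen, save):
    """Import entries from the newest listing that were not seen before.

    fetch_first_page: callable returning a list of dicts with a "guid" key
                      (hypothetical shape, assumed for this sketch).
    seen:             set of guids already imported; updated in place.
    save:             callable invoked once per new entry.
    Returns the number of entries saved on this pass.
    """
    added = 0
    for entry in fetch_first_page():
        key = entry["guid"]
        if key in seen:
            continue  # already imported on an earlier pass
        seen.add(key)
        save(entry)
        added += 1
    return added


if __name__ == "__main__":
    seen, saved = set(), []
    # The first pass sees two entries; the second pass overlaps by one.
    print(import_new(lambda: [{"guid": "a"}, {"guid": "b"}], seen, saved.append))
    print(import_new(lambda: [{"guid": "b"}, {"guid": "c"}], seen, saved.append))
```

Each pass stays short because it touches only one listing, so a new upload is picked up on the next pass instead of after a full crawl. The trade-off is that entries older than the first listing are never imported.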
I think RSS fetching is definitely the best approach here.
PR is here if you want to take a look @skwerlman https://github.com/sergiotapia/magnetissimo/pull/72
I am looking at reimplementing this using RSS, based on the new branch, today. I'll open a PR with the new code when it's done.
This PR adds a crawler for nyaa.si, a clone of the recently deceased nyaa.se.
Site: https://nyaa.si
Source: https://github.com/nyaadevs/nyaa
The dependency on `floki` was changed to version 0.17.2 because of `:nth-child(n)` support in `Floki.find`.