openeduhub / oeh-search-etl

The Backend includes all data for the ETL process (Scrapy, Postgres, Elasticsearch)

Serlo scraper: needs an items filter #10

Open AOHPI opened 4 years ago

AOHPI commented 4 years ago

The current Serlo metadata results contain too much organizational content that is not relevant to any lessons (although the information might be interesting in itself).

Recommended fix: implement a filter based on the keywords, because that's where you can tell organizational items apart from lesson content.

torsten-simon commented 3 years ago

You could handle this yourself in the spider by overriding the shouldImport(self, response) method and returning True or False depending on whether you want to import the specific object. Or did I miss something?
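
A minimal sketch of that idea, under some loud assumptions: BaseSpiderStub is only a stand-in for the project's real spider base class, the blocklist entries are made up (the thread does not name the organizational keywords), and the keywords are assumed to be available in response.meta["keywords"], which may not match how the Serlo spider actually passes them around.

```python
from types import SimpleNamespace

# Hypothetical blocklist: the thread does not name the organizational keywords.
ORGANIZATIONAL_KEYWORDS = {"community", "organisation", "serlo intern"}


def is_organizational(keywords):
    """Return True if any keyword matches the blocklist (case-insensitive)."""
    normalized = {k.strip().lower() for k in keywords}
    return not normalized.isdisjoint(ORGANIZATIONAL_KEYWORDS)


class BaseSpiderStub:
    """Stand-in for the project's spider base class; imports everything by default."""

    def shouldImport(self, response):
        return True


class SerloSpiderSketch(BaseSpiderStub):
    def shouldImport(self, response):
        # Assumption: the item's keywords were stored in response.meta["keywords"].
        keywords = response.meta.get("keywords", [])
        return not is_organizational(keywords)


# Usage with fake responses (only the .meta attribute is needed here).
spider = SerloSpiderSketch()
lesson = SimpleNamespace(meta={"keywords": ["Mathematik", "Bruchrechnung"]})
org_page = SimpleNamespace(meta={"keywords": ["Community", "Mitmachen"]})
print(spider.shouldImport(lesson))    # True
print(spider.shouldImport(org_page))  # False
```

Keeping the keyword check in a separate function makes it easy to test without a running crawl and to swap the blocklist for an allowlist if that turns out to fit the Serlo data better.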

MRuecklCC commented 2 years ago

Can we close this as won't fix / outdated?