prncc / steam-scraper

A pair of spiders for scraping product data and reviews from Steam.
https://intoli.com/blog/steam-scraper/

Exporting reviews by keywords #7

Closed. ArtyCooL closed this issue 6 years ago

ArtyCooL commented 6 years ago

Hi! Thank you for fixing the encoding of Unicode escape characters. You saved me some hassle 💃 I have another question, this time about exporting reviews. Say I need to find reviews that mention certain keywords, for example ones related to the combat system or to translation issues. I have a file with those keywords that I'd like to use, so that my output file contains only the reviews whose text includes those keywords. Is there a way to add a rule in the source code so that it exports only reviews with these keywords?

Some keywords for example: translation, 翻訳, traducción, Übersetzung.

prncc commented 6 years ago

Glad to hear it.

Regarding filtering: if your goal is to do it live (while scraping), then instead of returning the item unconditionally, you could first check that its text contains one of your keywords:

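    # line 44 of review_spider.py:
    item = loader.load_item()
    # `keywords` is assumed to be a list of strings, e.g. read from your keyword file.
    for keyword in keywords:
        if keyword in item['text']:
            return item
    # No keyword matched: the function falls through and returns None.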
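Since the keywords live in a file, here is a rough sketch of how the list could be loaded and matched. The helper names `load_keywords` and `matches_keywords` are made up for illustration and are not part of steam-scraper, and the sketch assumes a UTF-8 file with one keyword per line. It also matches case-insensitively, which the check above does not do.

```python
from pathlib import Path


def load_keywords(path):
    """Read one keyword per line from a UTF-8 file, skipping blank lines.

    Hypothetical helper; the one-keyword-per-line layout is an assumption.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # casefold() is a Unicode-aware lowercase, so "Übersetzung" and
    # "übersetzung" compare equal later on.
    return [line.strip().casefold() for line in lines if line.strip()]


def matches_keywords(text, keywords):
    """Return True if any keyword occurs in the text, ignoring case.

    Expects keywords that were already casefolded by load_keywords().
    """
    haystack = (text or "").casefold()
    return any(keyword in haystack for keyword in keywords)
```

With these in place, the loop in the spider could shrink to something like `if matches_keywords(item.get('text'), keywords): return item`, with `keywords = load_keywords(...)` called once at module level rather than once per review. Plain substring matching will also hit keywords inside longer words, which may or may not be what you want.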