k4cg / nichtparasoup

nichtparasoup is a web-based hackspaces entertainment system. It continuously displays random images from Reddit & Pr0gramm in your web-browser. Image sources are highly customizable.
https://pypi.org/project/nichtparasoup/

IDEA: crawlers are callable modules #221

Open jkowalleck opened 4 years ago

jkowalleck commented 4 years ago

nichtparasoup's image crawlers could be callable as modules, à la python3 -m nichtparasoup.imagecrawler.echo '{"image_uri":"foo"}' 3

this would allow crawling some images without having to write actual Python ...

implementation example for the end of a crawler implementation

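import sys
import json

if __name__ == '__main__':
    # argv[1]: crawler config as a JSON object
    config = json.loads(sys.argv[1])
    # argv[2] (optional): number of crawl runs, defaults to 1 and is never less than 1
    times = max(int(sys.argv[2]) if len(sys.argv) > 2 else 1, 1)
    # MyCrawler stands for the crawler class implemented in this module
    imagecrawler = MyCrawler(**config)
    for _ in range(times):
        json.dump(imagecrawler.crawl(), sys.stdout, indent=None)
        # stop early once the crawler has nothing left to deliver
        if imagecrawler.is_exhausted():
            break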

prerequisites:

jkowalleck commented 4 years ago

putting a __main__ at the bottom of each crawler file/module might be fine ... but this would break modules that implement multiple crawlers, like Instagram, which implements Tag and Profile ...

this needs some thought ... and maybe restructuring ...

restructuring idea - which would need no code change at all - all visible interfaces stay the same

or do something disruptive?

jkowalleck commented 4 years ago

instead of implementing the same __main__ again and again ... this functionality could be done once ... and applied where needed ...

this would make the functionality available without importing sys and json everywhere ...
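a minimal sketch of how such a shared runner could look, so that sys and json are imported in one place only. the names crawl_lines, main and EchoCrawler are hypothetical and not part of nichtparasoup; the only thing assumed about a crawler is that it offers crawl() and is_exhausted(), as in the example above.

```python
import json


class EchoCrawler:
    """Hypothetical stand-in crawler, only here to make the sketch runnable."""

    def __init__(self, image_uri):
        self._image_uri = image_uri

    def crawl(self):
        # Assumption: a crawl returns something JSON-serializable.
        return [self._image_uri]

    def is_exhausted(self):
        return False


def crawl_lines(crawler_factory, config_json, times=1):
    """Yield one JSON string per crawl run, at least one run, stopping when exhausted."""
    crawler = crawler_factory(**json.loads(config_json))
    for _ in range(max(times, 1)):
        yield json.dumps(crawler.crawl())
        if crawler.is_exhausted():
            break


def main(crawler_factory, argv):
    """Shared entry point: argv[1] is the JSON config, argv[2] the optional run count."""
    times = int(argv[2]) if len(argv) > 2 else 1
    for line in crawl_lines(crawler_factory, argv[1], times):
        print(line)


# A crawler module would then only need, at its bottom:
#     if __name__ == '__main__':
#         import sys
#         main(EchoCrawler, sys.argv)
```

a module that implements several crawlers would still have to decide which class to pass to main, so this does not solve the multi-crawler problem by itself.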

jkowalleck commented 4 years ago

alternative: create an extra package nichtparasoup-imagecrawler-cli that has the needed functionality ... it could even work with the autoloader, so plugin image crawlers would work with it right away ...

or maybe have this included as an extra command in the existing CLI ?

jkowalleck commented 4 years ago

guess the first idea is a great one. but it actually needs a change in the internal image crawler structure, with the command calls in mind.

crawlers need to define how they are configured in the CLI. each crawler acts as its own sub-command. (https://click.palletsprojects.com/en/7.x/commands/#custom-multi-commands)

this means it will cause major changes to the application. so a version 3 of nichtparasoup will be issued.

jkowalleck commented 4 years ago

with the switch over to click the following could be a solution, UNTESTED

cli: nichtparasoup imagecrawler run [OPTIONS] [NAME] ...

to get this added dynamically:

just an idea, never tested this ...
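one possible sketch of dynamically added sub-commands, following the custom multi-command pattern from the click docs: a click.Group subclass that overrides list_commands and get_command. the CRAWLERS registry, the "example" crawler and the echoing callback are hypothetical placeholders; a real version would presumably ask the crawler classes (or the autoloader) for their names and options.

```python
import click

# Hypothetical registry: crawler name -> names of its config options.
CRAWLERS = {
    'echo': ['image_uri'],
    'example': ['url'],
}


class ImageCrawlerRun(click.Group):
    """Builds one sub-command per registered crawler at lookup time."""

    def list_commands(self, ctx):
        return sorted(CRAWLERS)

    def get_command(self, ctx, name):
        if name not in CRAWLERS:
            return None  # click reports an unknown command

        def callback(**config):
            # Placeholder: a real callback would create and run the crawler.
            click.echo('{} {}'.format(name, sorted(config.items())))

        params = [
            click.Option(['--' + option.replace('_', '-')], required=True)
            for option in CRAWLERS[name]
        ]
        return click.Command(name, params=params, callback=callback)


@click.group(cls=ImageCrawlerRun)
def run():
    """Run a single image crawler."""
```

with this, something like run echo --image-uri foo would resolve the echo sub-command on the fly, without any crawler being registered on the group up front.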

jkowalleck commented 4 years ago

an idea: have the run command gather the needed args/options and store them in a context, see https://click.palletsprojects.com/en/7.x/commands/#custom-multi-commands
when it comes to invoking the subcommand, just do what you have to do.

option 1: the subcommand's click command may just gather options. the invoker can be overridden, see https://click.palletsprojects.com/en/master/commands/?highlight=subcommands

option 2: pass all run options to the invoked subcommand as context. subcommands are base implementations from BaseImageCrawler that take all options and just loop until the end is reached.
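a rough sketch of the second option, assuming a --times option on run (the option name is made up for illustration): the group stores its options on the click context, and the sub-command picks them up via click.pass_obj. the echo command here is a hypothetical stand-in that only prints, it does not use a real crawler.

```python
import click


@click.group()
@click.option('--times', type=click.IntRange(min=1), default=1, show_default=True,
              help='Number of crawl runs.')
@click.pass_context
def run(ctx, times):
    """Gather the options shared by all crawlers and store them on the context."""
    ctx.obj = {'times': times}


@run.command()
@click.option('--image-uri', required=True)
@click.pass_obj
def echo(obj, image_uri):
    """Hypothetical crawler sub-command: loops as often as `run` was told to."""
    for _ in range(obj['times']):
        click.echo(image_uri)
```

the group options go before the sub-command name, e.g. run --times 2 echo --image-uri foo, which matches the [OPTIONS] [NAME] order proposed above.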