Open jkowalleck opened 4 years ago
putting a __main__ block at the bottom of each crawler file/module might be fine ...
but this would break modules that implement multiple crawlers - like Instagram which implements Tag and Profile ...
this needs some thought ... and maybe restructuring ...
restructuring idea - which would need no code change at all - all visible interfaces stay the same:
nichtparasoup
  imagecrawler
    echo -- implement Echo
    instagram
      base -- define InstagramBase
      tag -- implement InstagramTag (include Base)
      profile -- implement InstagramProfile (include Base)
      __init__ -- import base, tag, profile - and make them public via __all__
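a rough sketch of how that layout could keep the public interface stable. it writes a throwaway package to a temp dir and imports it; the package name demo_instagram and the empty class bodies are placeholders for illustration, not the real nichtparasoup code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in package; "demo_instagram" is not a real module name.
FILES = {
    "base.py": "class InstagramBase:\n    pass\n",
    "tag.py": (
        "from .base import InstagramBase\n\n\n"
        "class InstagramTag(InstagramBase):\n    pass\n"
    ),
    "profile.py": (
        "from .base import InstagramBase\n\n\n"
        "class InstagramProfile(InstagramBase):\n    pass\n"
    ),
    # __init__ re-exports everything, so callers never see the split.
    "__init__.py": (
        "from .base import InstagramBase\n"
        "from .profile import InstagramProfile\n"
        "from .tag import InstagramTag\n\n"
        "__all__ = ['InstagramBase', 'InstagramTag', 'InstagramProfile']\n"
    ),
}

root = Path(tempfile.mkdtemp())
package_dir = root / "demo_instagram"
package_dir.mkdir()
for file_name, source in FILES.items():
    (package_dir / file_name).write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_instagram = importlib.import_module("demo_instagram")

# Callers keep importing the crawlers from the package itself, as before.
from demo_instagram import InstagramProfile, InstagramTag

public_names = sorted(demo_instagram.__all__)
print(public_names)
```

with this split, each of tag and profile could get its own __main__ without colliding, since each module implements exactly one crawler.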
or do something disruptive?
instead of implementing the same __main__ again and again, this functionality could be done once and applied where needed.
this would make the functionality available without importing sys and json everywhere ...
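a minimal sketch of such a "done once" helper. the name run_as_main, the Echo stand-in and the argument layout (JSON config, then an optional crawl count) are assumptions for illustration, not existing nichtparasoup API.

```python
import json
import sys
from typing import Any, List, Optional, Sequence, Type


class Echo:
    """Hypothetical stand-in crawler; the real crawler API may differ."""

    def __init__(self, image_uri: str) -> None:
        self._image_uri = image_uri

    def crawl(self) -> List[str]:
        return [self._image_uri]


def run_as_main(
    crawler_class: Type[Any], argv: Optional[Sequence[str]] = None
) -> List[str]:
    """Run ``crawler_class`` from ``<json-config> [<crawl-count>]`` arguments.

    Falls back to ``sys.argv[1:]`` so the calling module needs no ``sys`` import.
    """
    args = list(sys.argv[1:] if argv is None else argv)
    if not args:
        raise SystemExit("usage: <json-config> [<crawl-count>]")
    crawler = crawler_class(**json.loads(args[0]))
    count = int(args[1]) if len(args) > 1 else 1
    images: List[str] = []
    for _ in range(count):
        images.extend(crawler.crawl())
    print(json.dumps(images))
    return images


# At the end of a crawler module this would shrink to:
#     if __name__ == "__main__":
#         run_as_main(Echo)
images = run_as_main(Echo, ['{"image_uri": "foo"}', "3"])
```

the crawler modules would then only import the helper; sys and json stay in one place.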
alternative:
create an extra package nichtparasoup-imagecrawler-cli that has the needed functionality ...
it could even work with the autoloader, so plugin image crawlers would work with it right away ...
or maybe have this included as an extra command in the existing CLI?
guess this first idea is a great one. but it actually needs a change in the internal image crawler structure, with the command calls in mind.
crawlers need to define how they are configured in the CLI. each crawler acts as its own sub-command. (https://click.palletsprojects.com/en/7.x/commands/#custom-multi-commands)
:100: this means the application will see major changes, so a version 3 of nichtparasoup will be issued.
with the switch over to click, the following could be a solution, UNTESTED
cli: nichtparasoup imagecrawler run [OPTIONS] [NAME] ...
to get this added dynamically: name gets a callback that does the following:
just an idea, never tested this ...
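one way the dynamic lookup might look with click's custom multi-command hooks (list_commands / get_command on a Group subclass). the registry KNOWN_CRAWLERS, the Echo stand-in and the CONFIG / --count parameters are made up for this sketch; in the real application the autoloader would presumably supply the crawler classes.

```python
import json

import click
from click.testing import CliRunner


class Echo:
    """Hypothetical stand-in crawler; the real crawler API may differ."""

    def __init__(self, image_uri):
        self.image_uri = image_uri

    def crawl(self):
        return [self.image_uri]


# Hypothetical registry: crawler NAME -> crawler class.
KNOWN_CRAWLERS = {"echo": Echo}


class ImageCrawlerRun(click.Group):
    """Group whose sub-commands are built on demand from KNOWN_CRAWLERS."""

    def list_commands(self, ctx):
        return sorted(KNOWN_CRAWLERS)

    def get_command(self, ctx, name):
        crawler_class = KNOWN_CRAWLERS.get(name)
        if crawler_class is None:
            return None  # click then reports an unknown command

        def callback(config, count):
            # Build the crawler from its JSON config and crawl `count` times.
            crawler = crawler_class(**json.loads(config))
            for _ in range(count):
                for image in crawler.crawl():
                    click.echo(image)

        return click.Command(
            name,
            callback=callback,
            params=[
                click.Argument(["config"]),
                click.Option(["--count"], type=int, default=1),
            ],
        )


@click.group(cls=ImageCrawlerRun)
def run():
    """Run a single image crawler by NAME."""


result = CliRunner().invoke(run, ["echo", '{"image_uri": "foo"}', "--count", "2"])
print(result.output)
```

here every crawler shares the same parameters; letting each crawler declare its own options would mean building the params list from the crawler class instead.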
an idea:
have the run command gather the needed args/options and store them in a context
see https://click.palletsprojects.com/en/7.x/commands/#custom-multi-commands
when it comes to invoking the subcommand, just do what you have to do.
option1: the subcommand's click part may just gather options. the invoker can be overridden - see https://click.palletsprojects.com/en/master/commands/?highlight=subcommands
option2: pass all run options to the invoked subcommand as context. subcommands are base implementations from BaseImageCrawler that take all options and just run in circles until the end is reached.
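a small sketch of option2, using click's context object to hand the run options down to the subcommand. the --count and --image-uri options and the echo subcommand are assumptions for illustration only.

```python
import click
from click.testing import CliRunner


@click.group()
@click.option("--count", type=int, default=1, help="How often to crawl.")
@click.pass_context
def run(ctx, count):
    """Gather the options shared by all crawlers and keep them in the context."""
    ctx.obj = {"count": count}


@run.command()
@click.option("--image-uri", required=True)
@click.pass_obj
def echo(run_options, image_uri):
    """Hypothetical crawler sub-command; it declares only its own options."""
    # run_options is the dict stored by the parent `run` command.
    for _ in range(run_options["count"]):
        click.echo(image_uri)


result = CliRunner().invoke(run, ["--count", "3", "echo", "--image-uri", "foo"])
print(result.output)
```

the group callback runs before the subcommand, so whatever it stores in ctx.obj is visible to the subcommand via pass_obj.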
nichtparasoup's image crawlers could be called as modules, like this:
python3 -m nichtparasoup.imagecrawler.echo '{"image_uri":"foo"}' 3
this would allow having some images crawled without having to write actual Python ...
implementation example for the end of a crawler implementation
prerequisites:
exhausted detection implemented. see https://github.com/k4cg/nichtparasoup/issues/152#issuecomment-552347435
nichtparasoup.core.ImageCollection and nichtparasoup.core.Image
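a sketch of why the exhausted detection matters for such a module call: the crawl loop has to stop early once a crawler has nothing left. FiniteEcho, is_exhausted and crawl_to_json are assumed names for illustration; plain strings stand in for nichtparasoup.core.Image and ImageCollection.

```python
import json
from typing import List


class FiniteEcho:
    """Hypothetical crawler that hands out a fixed list of URIs, one per crawl."""

    def __init__(self, image_uris: List[str]) -> None:
        self._pending = list(image_uris)

    def is_exhausted(self) -> bool:  # assumed name for the exhausted detection
        return not self._pending

    def crawl(self) -> List[str]:
        return [self._pending.pop(0)] if self._pending else []


def crawl_to_json(config_json: str, max_crawls: int) -> str:
    """Crawl up to ``max_crawls`` times, stopping early when exhausted."""
    crawler = FiniteEcho(**json.loads(config_json))
    images: List[str] = []
    for _ in range(max_crawls):
        if crawler.is_exhausted():
            break
        images.extend(crawler.crawl())
    return json.dumps(images)


# Five crawls are requested, but only two images exist.
output = crawl_to_json('{"image_uris": ["a.jpg", "b.jpg"]}', 5)
print(output)
```

at the end of a crawler module, a function like this would be called from the __main__ block with the two command line arguments.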