pystorm / streamparse

Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.
http://streamparse.readthedocs.io/
Apache License 2.0

Add ability to pass resource/config files #398

Open anuraguniyal opened 7 years ago

anuraguniyal commented 7 years ago

I am using a Python library that is initialized from a config file and loads several other resources at startup. I do not see any way to pass that config and those resources through "sparse submit" by making them part of the jar resources. Bundling them would keep all dependencies in one place, instead of the user pushing such resources separately to every worker.

anuraguniyal commented 7 years ago

Any updates on this? Am I missing something trivial?

codywilbourn commented 7 years ago

The python library is doing something like an /etc/foo-library.conf lookup? In most scenarios I'd say that belongs in the realm of system configuration -- a package with a system conf file should be installed by the package/configuration manager because it's shared between multiple runs of this library.

I don't believe streamparse has any pre-submit hooks to run arbitrary fabric commands.

One option would be to symlink that conf to where the jar is unpacked, but I think that filename may be variable.

Is it possible to override the python library to specify an alternative lookup path to that file? Possibly a global or a constructor argument. You could submit that conf file in your topology and find it relative to the file being executed.

anuraguniyal commented 7 years ago

Yes, the config file path is configurable via the constructor. Currently I go up one folder from the spout's location to find the config: e.g. the spout is in resources/spouts and the config is in resources. I hope that layout will always be supported.
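
For reference, the lookup described here can be sketched roughly as below. This is a minimal illustration, not streamparse API: the helper name `resolve_config` and the file name `library.conf` are made up, and it assumes the unpacked jar keeps the resources/spouts layout.

```python
from pathlib import Path


def resolve_config(component_file, name="library.conf"):
    """Return the path of `name` one directory above the component's module.

    `name` is a hypothetical file name; the layout assumed here is
    resources/spouts/<spout>.py with the config sitting in resources/.
    """
    # parent -> resources/spouts, parent.parent -> resources
    return Path(component_file).resolve().parent.parent / name
```

Inside the spout this would be called as `resolve_config(__file__)`, and the result passed to the library's constructor.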

The main problem is that I usually won't keep the config in src (for security, as it contains auth details); it will come from somewhere else at submit time. We need a way to hook into jar creation so that I can point to my config/resource files.

Ideally, streamparse's config.json should accept a path to such extra resources, and that path should also be configurable from sparse submit.

anuraguniyal commented 7 years ago

The quickest workaround could be a SPARSE_RESOURCES environment variable that points to a file or folder, which would then be copied to _resources in the prepare_topology method.
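
A rough sketch of what that copy step might look like. SPARSE_RESOURCES is only the variable proposed here and does not exist in streamparse today; the helper name `copy_extra_resources` is made up, and how it would be wired into prepare_topology is left open.

```python
import os
import shutil
from pathlib import Path


def copy_extra_resources(dest_dir, env_var="SPARSE_RESOURCES"):
    """Copy the file or folder named by `env_var` into `dest_dir`.

    Returns the copied path, or None when the variable is unset or empty.
    `env_var` is the proposed (not yet existing) variable name.
    """
    source = os.environ.get(env_var)
    if not source:
        return None
    src = Path(source)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / src.name
    if src.is_dir():
        # dirs_exist_ok requires Python 3.8+
        shutil.copytree(src, target, dirs_exist_ok=True)
    else:
        shutil.copy2(src, target)
    return target
```

The idea would be to call something like `copy_extra_resources("_resources")` before the jar is built, so that whatever the variable points at ends up packaged with the topology.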

anuraguniyal commented 7 years ago

Another workaround without changing code:

Add an external_resources dir and add it to .gitignore, symlink the required resources into external_resources, and add external_resources to project.clj.

There could be different symlinks for dev, stg, and prd configs, and the Python code can load the right one based on the environment.
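
On the Python side, picking the config per environment could look something like this. It is only a sketch: the APP_ENV variable, the `<env>.conf` file naming, and the helper name `config_for_env` are assumptions for illustration, and `config_dir` is wherever the symlinked files end up once the jar is unpacked.

```python
import os
from pathlib import Path

# Environment names taken from the dev/stg/prd symlinks mentioned above.
KNOWN_ENVS = ("dev", "stg", "prd")


def config_for_env(config_dir, env_var="APP_ENV", default="dev"):
    """Return the config path for the current environment.

    `env_var` and the "<env>.conf" naming are hypothetical choices.
    """
    env = os.environ.get(env_var, default)
    if env not in KNOWN_ENVS:
        raise ValueError(f"unknown environment: {env!r}")
    return Path(config_dir) / f"{env}.conf"
```

Failing loudly on an unknown environment name avoids silently falling back to the wrong credentials.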