sdave2 opened this issue 3 years ago

I am testing/trying out Fiber from my local machine. I am looking to use Fiber processes to do a side-effect job (put data into databases) and to use Docker as the Fiber backend. For testing, I have Elasticsearch and Postgres running in Docker containers in a Docker network called `test`. I would like to pass the network name as a parameter (just like the Docker image) to the process running the Docker container. I tried it out locally and it works for me; the modification I made is to the `docker_backend.py` file (a sketch of that kind of change follows the questions below). I am not sure how to pass the network in as a parameter. Possibly via `job_spec`?

Questions: Is it OK to use Fiber for side-effect jobs, specifically to use it to insert data into a database? If I have 5 places I want to put the data in (Elasticsearch, Redis streams, Postgres, other places), is it recommended to use 5 Fiber processes to insert data into the respective "databases"?
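A minimal sketch of that kind of `docker_backend.py` change, assuming Fiber's Docker backend launches containers through docker-py; the image name is a placeholder, and `network` is docker-py's standard `containers.run` keyword:

```python
import docker

client = docker.from_env()

# Attach the spawned worker container to an existing Docker network
# so it can reach services (Elasticsearch, Postgres) running there.
container = client.containers.run(
    "fiber-test-image",   # placeholder worker image
    network="test",       # the pre-created Docker network
    detach=True,
)
```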
Hi @sdave2, this is an interesting use case. `job_spec` currently doesn't have a network attribute, but it can be useful. Not all Fiber backends need a network config, so probably the best way is to add an "extras" field to `job_spec` and also feed into that through some config value specified in `config.py` and `popen_fiber_spawn.py`. Feel free to create a PR for this.
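A rough sketch of that direction; apart from `job_spec` and the proposed "extras" field, every name here is hypothetical rather than Fiber's actual API:

```python
# Hypothetical shape of a job spec carrying backend-specific extras.
class JobSpec:
    def __init__(self, image, command=None, extras=None):
        self.image = image
        self.command = command
        # Open-ended bag for backend-specific settings,
        # e.g. {"network": "test"} for the Docker backend.
        self.extras = extras or {}

# In the Docker backend, consume the extra setting when present.
def run_job(client, job_spec):
    network = job_spec.extras.get("network")  # None when unset
    return client.containers.run(
        job_spec.image,
        job_spec.command,
        network=network,   # docker-py accepts network=None
        detach=True,
    )
```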
For your question, it's perfectly fine to use Fiber to do jobs with side effects. The only thing you need to pay attention to is to use the lower-level Process rather than Pool, as Pool has error-handling logic that may mess up your data insertion.
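With Fiber's multiprocessing-style API, that looks like the sketch below; `insert_records` and its payload are placeholders:

```python
from fiber import Process

def insert_records(records):
    # Placeholder: write `records` to the target database here.
    # A bare Process runs this once, with no Pool-level error
    # handling silently re-dispatching the side-effecting work.
    print(f"inserting {len(records)} records")

if __name__ == "__main__":
    p = Process(target=insert_records, args=([{"id": 1}, {"id": 2}],))
    p.start()
    p.join()
```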
> The only thing you need to pay attention to is to use the lower-level Process rather than Pool, as Pool has error-handling logic that may mess up your data insertion.
Right, I am using processes. Also i think with Pool, you map data across a function, where as in my case, i am mapping functions across the dataset.
I will create a PR for passing in a network attribute
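A sketch of that inversion, with one Fiber Process per sink over the same dataset; the sink functions are placeholders:

```python
from fiber import Process

def to_elasticsearch(records):
    print(f"ES: {len(records)} docs")   # placeholder sink

def to_postgres(records):
    print(f"PG: {len(records)} rows")   # placeholder sink

def fan_out(records):
    # Map functions across the dataset: one process per sink,
    # rather than Pool.map's one function across many data items.
    procs = [Process(target=sink, args=(records,))
             for sink in (to_elasticsearch, to_postgres)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    fan_out([{"id": 1}, {"id": 2}])
```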
Also, I took it a little bit further yesterday and started running the main process inside Docker, and have that process spawn Fiber processes, i.e., Docker containers. I want to avoid running anything on my local machine and encapsulate everything within Docker.
The only problem I ran into was volume mapping: I am running the process as root in my container, and when I spawn other Docker containers I find I am mapping `/root:/:rw`. Docker doesn't like a destination ending in `/`, and I also want to avoid mapping `/root`.
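For reference, one way to state the mapping explicitly with docker-py, binding a dedicated work directory instead of the spawning process's home; the paths and image name are placeholders:

```python
import docker

client = docker.from_env()

# Bind a dedicated directory to a concrete destination path,
# avoiding both a "/" destination and exposing /root.
container = client.containers.run(
    "fiber-worker",   # placeholder image
    volumes={"/workdir": {"bind": "/workdir", "mode": "rw"}},
    detach=True,
)
```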
If this is also something you find useful, I can create a PR for this as well once I figure out the volume-mapping issue that I am having.
Feedback is welcome!