uber / fiber

Distributed Computing for AI Made Simple
https://uber.github.io/fiber/
Apache License 2.0

Add network to docker containers #43

Open sdave2 opened 3 years ago

sdave2 commented 3 years ago

I am testing/trying out Fiber from my local machine. I am looking to use Fiber processes to do a side-effect job (put data into databases) and use Docker as the Fiber backend. For testing, I have Elasticsearch and Postgres running in Docker containers in a Docker network called test. I would like to pass the network name as a parameter (just like the Docker image) to the process running the Docker container. I tried it out locally and it works for me. This is the modification I made to the docker_backend.py file:

    container = self.client.containers.run(
        image,
        job_spec.command,
        name=job_spec.name + '-' + str(uuid.uuid4()),
        volumes=volumes,
        cap_add=["SYS_PTRACE"],
        tty=tty,
        stdin_open=stdin_open,
        network="test",  # added line: attach the container to the "test" network
        detach=True,
    )
I am not sure how to pass the network in as a parameter. Possibly via job_spec?
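For illustration, a minimal sketch of what that could look like, assuming a hypothetical optional `network` attribute on `job_spec` (no such attribute exists in Fiber today):

    # Sketch only: `job_spec.network` is a hypothetical attribute.
    run_kwargs = dict(
        name=job_spec.name + '-' + str(uuid.uuid4()),
        volumes=volumes,
        cap_add=["SYS_PTRACE"],
        tty=tty,
        stdin_open=stdin_open,
        detach=True,
    )
    # Only pass `network` to Docker when the spec actually provides one,
    # so the default bridge-network behavior is preserved otherwise.
    if getattr(job_spec, "network", None):
        run_kwargs["network"] = job_spec.network
    container = self.client.containers.run(image, job_spec.command, **run_kwargs)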

Questions:

  1. Is it recommended to use a Fiber process to do side-effect jobs, specifically to use it to insert data into a database? If I have 5 places I want to put the data (Elasticsearch, Redis Streams, Postgres, other places), is it recommended to use 5 Fiber processes to insert data into the respective "databases"?

calio commented 3 years ago

Hi @sdave2, this is an interesting use case. `job_spec` currently doesn't have a network attribute, but it can be useful. Not all Fiber backends need a "network" config, so probably the best way is to add an "extras" field to `job_spec` and also feed into that through some config value specified in `config.py` and `popen_fiber_spawn.py`. Feel free to create a PR for this.
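A rough sketch of that idea; all names here are assumptions rather than existing Fiber API. `job_spec` grows a free-form `extras` dict, and the docker backend forwards only the keys it understands to `containers.run`:

    # Hypothetical: `job_spec.extras` is a free-form dict populated from a
    # config value (e.g. a docker-specific section in config.py).
    extras = getattr(job_spec, "extras", None) or {}

    # Each backend picks out the keys it knows about and ignores the rest,
    # so backends that have no notion of a Docker network are unaffected.
    allowed = {"network", "environment", "labels"}
    docker_kwargs = {k: v for k, v in extras.items() if k in allowed}

    container = self.client.containers.run(
        image,
        job_spec.command,
        name=job_spec.name + '-' + str(uuid.uuid4()),
        volumes=volumes,
        cap_add=["SYS_PTRACE"],
        tty=tty,
        stdin_open=stdin_open,
        detach=True,
        **docker_kwargs,
    )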

For your question, it's perfectly fine to use Fiber to do jobs with side effects. The only thing you need to pay attention to is to use the lower-level Process rather than Pool, as Pool has error-handling logic that may interfere with your data insertion.
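For example, a Process-per-sink pattern might look like the following sketch (the writer functions and sample records are placeholders, not Fiber API):

    from fiber import Process

    records = [{"id": 1, "value": "example"}]  # placeholder dataset

    def to_elasticsearch(batch):
        pass  # placeholder: index `batch` into Elasticsearch

    def to_postgres(batch):
        pass  # placeholder: insert `batch` into Postgres

    if __name__ == "__main__":
        # One Process per destination; unlike Pool, there is no built-in
        # error handling or retry machinery here.
        procs = [Process(target=fn, args=(records,))
                 for fn in (to_elasticsearch, to_postgres)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # As with multiprocessing, a non-zero exitcode signals a failed writer.
        failed = [p.name for p in procs if p.exitcode != 0]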

sdave2 commented 3 years ago

> The only thing you need to pay attention to is to use the lower-level Process rather than Pool, as Pool has error-handling logic that may interfere with your data insertion.

Right, I am using processes. Also, I think with Pool you map data across a function, whereas in my case I am mapping functions across the dataset.

I will create a PR for passing in a network attribute.

Also, I took it a little bit further yesterday and started running the main process inside Docker, and having that process spawn Fiber processes, i.e., Docker containers. I want to avoid running anything on my local machine and encapsulate everything within Docker. The only problem I ran into was volume mapping: I am running the process as root in my container, and if I spawn other Docker containers, I find I am mapping /root:/:rw. Docker doesn't like a destination ending in /, and I also want to avoid mapping root. If this is also something you find useful, I can create a PR for it as well once I figure out the volume-mapping issue I am having. Feedback is welcome!
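For what it's worth, one way the mapping could be guarded, sketched under the assumption that a helper builds the Docker volumes dict from the parent's working directory (a hypothetical helper, not current Fiber code):

    import os

    def build_volumes(cwd):
        """Hypothetical helper for docker-in-docker runs.

        When the parent runs as root inside a container, its home is /root
        and a naive home-based mapping degenerates to /root:/:rw, which
        Docker rejects (the bind destination must not be "/").
        """
        src = os.path.abspath(cwd)
        if src in ("/", "/root"):
            # Remap risky sources to a dedicated workdir instead of "/".
            return {src: {"bind": "/fiber/workdir", "mode": "rw"}}
        # Otherwise mirror the path inside the container.
        return {src: {"bind": src, "mode": "rw"}}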