Open Zipties opened 1 year ago
Hey @Zipties, your title perfectly encapsulates the current setup
Your solution sounds pretty perfect; if you're up for working on this, that would be awesome
I've never worked on a git project before, but wouldn't mind at all. It would help to know if there were any specific reasons things were done the way they are today. For example, could we combine the various .env files into a single file?
Some things are more on the web dev side which I just don't have exposure to. There are user-facing links that use "localhost" (such as the sign-in) which should be reworked. I'm sure docker has some kind of "host" variable that will redirect it to the external IP
Mostly we just hacked things together for the sake of speed. Combining the env files would be ideal; I think I was originally having trouble accessing a file from the parent dir of the Dockerfile, but I don't really remember.
Some links may just require localhost (for example, a user would go to localhost to open the site; the site probably only has access to localhost and not the internal docker networks, but I might be wrong). Shoot me an email (asim@reworkd.ai) or DM me on discord and I can try to help get you set up if you need.
Hello @Zipties, any luck on this issue? Feel free to message any of us if you need a hand with getting it set up!!!
To add to that, seems like this may be causing issues for other people with their server setups (https://github.com/reworkd/AgentGPT/issues/629#issuecomment-1618872674)
Would be awesome to lend a hand :)
My two cents: after about 7 years (yeah I know... it's faaaar away), Nutanix had to completely refactor their entire codebase because people "hacked things together for the sake of speed" from the beginning and nobody ever had the time to clean it up. It cost a hell of a lot of money, resources, and time, and revenue and the backlog were heavily impacted for two years. If reworkd is an ambitious startup with a vision, I would strongly suggest adding this kind of issue to sprints along with the rest. Otherwise you might end up having to do the same later.
I would love to help you get started, @Zipties, and lend a helping hand!!!
I've been working on this issue specifically over the weekend and there are a few things I have thoughts about. First and foremost, we need separate branches for testing/development and stable/production. This project is moving extremely quickly, and pushing every change to main is going to break things. We need a stable branch, not just tags, so that we can keep track of what works and what doesn't, and fix what doesn't while keeping the main project working.
Second, regarding the docker deployment: there are a couple of things I have noticed that seem to fail. One of them is the above-mentioned "tools", where, when deploying via docker on a remote host, tools seems to want to connect to `localhost:8000`.
I've narrowed it down to the environment variables not being parsed correctly somewhere between them being pulled in by `next/src/env/schema.mjs` and `next/src/services/fetch-utils.ts`: changes to `NEXT_PUBLIC_BACKEND_URL` and `NEXT_PUBLIC_VERCEL_URL` in the `.env` file do not seem to be picked up. (It may also be relevant that Next.js inlines `NEXT_PUBLIC_*` variables at build time, so changing them after the image is built has no effect.) My thought is that this is a routing issue, as I was unable to force it with an nginx proxy because the web client still wanted to look for `/api/*` at `localhost:8000`.
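For reference, the nginx attempt looked roughly like this (a sketch only; the upstream service names and ports are assumptions based on the default compose file, and the routes are illustrative):

```nginx
# Hypothetical nginx.conf fragment: funnel all browser traffic through
# one published port so the client never needs to reach localhost:8000.
server {
    listen 80;

    # Everything under /api/ goes to the FastAPI backend ("platform" service)
    location /api/ {
        proxy_pass http://platform:8000;
        proxy_set_header Host $host;
    }

    # Everything else goes to the Next.js frontend ("next" service)
    location / {
        proxy_pass http://next:3000;
        proxy_set_header Host $host;
    }
}
```

Even with this in place the client kept requesting `localhost:8000` directly, which is what points me at the env parsing rather than the proxy itself.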
I will probably create a separate issue and link it here. I thought I was going crazy after I updated this weekend, not realizing that `docker-compose` had been deprecated in favor of `docker compose`, so that held me back for a minute. I may also create a separate issue regarding a solution for getting things running smoothly with docker in a non-bandaid way. I'll keep y'all posted.
Hello @JPDucky A bunch of great ideas and suggestions!!! Please do keep us posted!!!
I've been busy this week, but I've also been doing some research into how we can best solve this, and I wanted to give an update so I'm not silent on it:
One thing I know for sure is that we need an nginx wrapper. I solved a similar problem over at SuperAGI, where the webpack-hmr was causing issues of a similar nature, and deployment in general was suffering from fragmented routing caused by each of the containers thinking it was 'localhost'. One of the things I made sure to do there was remove the port mappings from the individual services in the compose file, except for nginx, as nginx will then sort of act as 'localhost' for all of the services within the compose file due to how docker networking works.
The tl;dr is that because nginx is proxying everything, it can see all the ports on the other services, since it sits on the same network. (Port mappings on a service within a compose file publish the port on the host, not the container; the container already exposes its ports to the docker network it is running on by default.) This means we can use the service names (not the `container_name`, but the first-indent service name) as the hostname for each service, i.e. `http://next` for next instead of `localhost:3000`, `http://platform` for platform instead of `localhost:8000`, etc., for all services involved.
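A sketch of what that compose layout could look like (the nginx image, build paths, and network name are assumptions, not the current file):

```yaml
# docker-compose.yml sketch: only nginx publishes a host port; every other
# service is reachable inside the compose network by its service name.
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"          # the single host-facing entry point
  next:
    build: ./next
    # no "ports:" mapping -> reachable as http://next:3000 on the network
  platform:
    build: ./platform
    # reachable as http://platform:8000
  db:
    image: mysql:8
    # no container_name -> compose prefixes the project name automatically,
    # avoiding clashes with other stacks that also have a "db" service

networks:
  default:
    name: agentgpt       # hypothetical name; keeps us off the default bridge
```

The key point is that removing per-service `ports:` entries only removes host exposure; service-to-service traffic keeps working untouched.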
It also means that we need to remove the `container_name` variable from each service, as 1) it isn't required/necessary and 2) it can cause namespace conflicts on a machine already running many docker containers. For example, we have `db`, and so do many other projects. If those projects are on the default docker `bridge` network (the default network created/used for services that don't explicitly specify a network), that can cause conflicts with other projects, including but not limited to: 1) issues using `docker compose up`/`down` and 2) the incorrect database being accessed and/or connections being rejected. And 3) we should follow best practices and keep everything contained within its own scope as much as possible.
This was something I actually noticed when I was in the `lazydocker` interface (awesome project btw, highly recommend). I was trying to up/down the stack but `db` always seemed to stay up, and then I realized that it was taking out wikijs, and soon after I realized that it was also causing conflicts with mattermost, as it also uses a standard/default port mapping (8000 or 8080, can't remember which).
A few other things I noticed were that the `./setup.sh` script is failing, and that we have redundant volume mappings and COPY operations in the compose file and Dockerfile, respectively.
I haven't had much time to dig into the setup script, but when I made it I built in handling for the case where the user chose docker-compose and already had a port allocated to the default one we use: it would then try again on another port (+1 above the default). We will need to put this in place for the nginx container once it is up. I was unable to get the setup script to work after a `git reset --hard`, but I was able to get it to work on a different machine with a fresh clone, except that the entry for the `serper` api would not accept my key. We should also consider using a different program for the setup script, like `ncurses`, to account for the variability in behavior between different shells.
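The port-retry behavior I'm describing could be sketched roughly like this (the function name and `NGINX_PORT` variable are hypothetical, not from the current script):

```shell
#!/usr/bin/env sh
# Sketch: probe upward from a default port until a free one is found,
# so the nginx container doesn't collide with an existing service.
find_free_port() {
  port="$1"
  # Try ports default, default+1, ... up to default+20
  while [ "$port" -le $(( $1 + 20 )) ]; do
    if ! nc -z 127.0.0.1 "$port" 2>/dev/null; then
      echo "$port"
      return 0
    fi
    port=$(( port + 1 ))
  done
  return 1
}

NGINX_PORT="$(find_free_port 80)"   # hypothetical variable consumed by compose
echo "nginx will listen on port $NGINX_PORT"
```

This is shell-agnostic POSIX sh on purpose, given the cross-shell breakage mentioned above.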
As for the volume mappings, under service `next` we have both `./next/.env:/next/.env` and `./next:/next`, which is redundant: passing in just the `next` directory should also pass all its contents, including the .env files. I haven't gone back through the commits to see when it was added, so I don't know if it was put there for a specific reason, but my hunch is that we can get rid of it. As for the other two volumes, `/next/node_modules` and `/next/.next`, my understanding is that they are there as a cache for development, but if that's the case we should have them commented out by default (with a disclaimer), as they would only be necessary when adding the `--build` flag to `docker compose up`, which is only needed when making changes to the source files (anything in ./next, ./platform, ./db, etc.), because when docker builds its images it caches them by default so it doesn't have to build them again on the next run. So for non-development instances, `docker compose up --build` is only necessary on the first run and after an update (like `git pull`); all subsequent launches can be done with just `docker compose up` so nothing needs to rebuild.
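Concretely, the `next` service's volumes could look something like this (a sketch under the assumption above that the caches are dev-only):

```yaml
# Sketch of the "next" service volumes: keep the source bind mount,
# drop the redundant .env mapping, and make the build caches opt-in.
services:
  next:
    volumes:
      - ./next:/next            # already includes ./next/.env
      # Development-only caches: uncomment when iterating with --build
      # - /next/node_modules
      # - /next/.next
```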
Regarding the `Dockerfile`: the `entrypoint.sh` script it calls copies in the .env file, which should already be available if we keep it in the folder we use for our volume; we should avoid copying anything unnecessarily.
In the `entrypoint.sh` script we call `wait-for-db.sh`, which takes the hardcoded arguments `db` and `3307`; if the `db` service has been renamed, `next` never sees that `db` is actually up and thus never connects to it. We should implement some logic for picking up the service name and port in case the user changes them. But now that I think about it, that may not be necessary with the nginx container. I'm just trying to get all my thoughts out before I lose them.
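The hardcoded arguments could be replaced with overridable environment variables, roughly like this (`DB_HOST` and `DB_PORT` are hypothetical names, and the call is guarded so the sketch stands on its own):

```shell
#!/usr/bin/env sh
# Sketch for entrypoint.sh: let the compose file override the db service
# name and port instead of hardcoding "db" and "3307".
DB_HOST="${DB_HOST:-db}"      # falls back to the current service name
DB_PORT="${DB_PORT:-3307}"    # falls back to the current port

echo "waiting for ${DB_HOST}:${DB_PORT}"
# Call the existing wait script only if it's present alongside this sketch
if [ -x ./wait-for-db.sh ]; then
  ./wait-for-db.sh "$DB_HOST" "$DB_PORT"
fi
```

The compose file would then set `DB_HOST`/`DB_PORT` in one place, and a renamed service just works.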
Regarding our git workflow: I suggest that we implement Git Flow, with a stable master branch and a development branch, where all PRs are made against development and, once tested, get merged to master. Alternatively, a dev/test/master workflow where we make sure PRs work on `dev`, test that they work with everything else on `test`, and anything that passes on `test` gets pushed to `master`. We should discuss which solution would be best for us; using one of these will definitely help us avoid all the spaghetti code, catch errors quicker, and implement fixes better. Here is a good Medium article discussing some of the workflows.
I've got to run for just a few, I'll be back in a couple of hours and I'll create separate issues for each of these so we can get to work on them.
Here are my changes for local startup on v1.0.0: they add startup scripts for db, next, and platform, plus fix the agent_db host address (just replace it with 0.0.0.0):
https://github.com/yhyu13/AgentGPT/commit/09e1e0447b1a5619e3a333668c1bb251d1007db3
mentioned in this thread https://github.com/reworkd/AgentGPT/issues/762#issuecomment-1836476545
I hope this saves you some time and makes it less of a PITA.
Please check that this feature request hasn't been suggested before.
Feature description
With numerous .env files, the compose file, hard-coded ports in multiple scripts, and "localhost" links throughout, this project is a ball of spaghetti to get running on a remote system. To make matters worse, it uses common ports such as 3000, 8000, and 8080, and changing them requires crawling through the project. Ports that do not need to be accessed from the front end should not be exposed, which will improve security and maintainability; the rest of the ports should be kept inside the docker network.
Solution
Use docker DNS instead of "localhost" and variables instead of hard-coded ports, e.g. `next:$port`. Load all .env files from the same volume and use variables throughout the env files to make changing ports easier.
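A sketch of what that could look like in the compose file (`PLATFORM_PORT` is an illustrative variable name; `NEXT_PUBLIC_BACKEND_URL` is the existing one):

```yaml
# Sketch: one shared .env at the repo root, ports defined once as
# variables, and docker DNS service names instead of localhost.
services:
  next:
    env_file: .env            # single env file for every service
    environment:
      # point the frontend at the backend by service name, not localhost;
      # the port comes from one variable with a fallback default
      NEXT_PUBLIC_BACKEND_URL: http://platform:${PLATFORM_PORT:-8000}
  platform:
    env_file: .env
```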
Alternatives
No response
Additional Context
No response
Acknowledgements