Open BurningSquid opened 1 year ago
https://github.com/teacherc/spheri-app/tree/main/lib
No dependency files should be in the git repo; add them to the .gitignore. Committing them can cause issues with git as well as with deployment. As a general rule, only files that need to be source controlled by you should be in the git repo.
Edit: this is also going along with rule 2 at the top
@BurningSquid Thank you for your feedback! I'll take the files out ASAP. I am learning Docker as we speak. I will spend time on a few tutorials and try to use Docker to do my next deploy.
Docker is one service that lets you containerize your code, which you can essentially think of as system abstraction. Just like we use functions to abstract logic in code, or module architecture to abstract Python libraries, we can use Docker to abstract a particular application such that the system-level dependencies are purpose-built for the application. In short, Docker wraps a very small operating-system environment around your application, which provides the necessary system configuration. This means that when you deploy you don't need to worry about the target resource (e.g. server, cluster, whatever) having Python installed, environment variables configured, or anything like that. All it needs is the ability to run a container.
To accomplish this, Docker starts with a Dockerfile, which typically sits in the root directory of your project. This file describes all of the operations that need to be done before your code can be run. In your case the steps would involve the stuff you have in your readme (plus probably some more). Also, in case this wasn't clear, the operating system which the container "uses" is typically a Linux distribution.
1) install Python (or pull from a hosted Docker image on Docker Hub)
2) install pip and venv
3) copy over all code files
4) install Python dependencies from requirements.txt
This is simplified but you get the gist. There may be additional Linux libraries that are needed (curl is a typical one I have in my Dockerfiles).
OK, now that you have a Dockerfile you will need to build it. This is when Docker goes through all of the instructions and executes them. The end result is a Docker image which, when run, becomes a container.
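To make the Dockerfile and build steps above a bit more concrete, here is a rough Python sketch. It holds a minimal Dockerfile following the four steps as a string and assembles the `docker build` and `docker run` commands. The base image tag, the `app.py` entry point, port 5000, and the `spheri-app` image name are all assumptions for illustration, not taken from the project.

```python
from pathlib import Path

# Hypothetical Dockerfile following the four steps above. The base image tag
# and the app.py entry point are assumptions, not taken from the project.
DOCKERFILE = """\
# 1) + 2) start from an official Python image, which already ships pip and venv
FROM python:3.11-slim
WORKDIR /app
# 3) copy the code into the image
COPY . .
# 4) install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "app.py"]
"""


def write_dockerfile(project_root: str) -> Path:
    """Write the sketch Dockerfile into the project root and return its path."""
    path = Path(project_root) / "Dockerfile"
    path.write_text(DOCKERFILE)
    return path


def build_command(image: str, tag: str = "latest", context: str = ".") -> list[str]:
    """Command that builds an image from the Dockerfile in `context`."""
    return ["docker", "build", "-t", f"{image}:{tag}", context]


def run_command(image: str, tag: str = "latest", port: int = 5000) -> list[str]:
    """Command that starts a container from the image and publishes one port."""
    return ["docker", "run", "--rm", "-p", f"{port}:{port}", f"{image}:{tag}"]
```

With Docker installed, something like `subprocess.run(build_command("spheri-app"), check=True)` followed by `subprocess.run(run_command("spheri-app"), check=True)` would build the image and start a container from it. In practice the Dockerfile would just be a file committed to the repo; it is kept as a string here only so the sketch is self-contained.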
Have you ever been frustrated with the amount of setup that Python dev environments require? Have you ever experienced inconsistencies with dev environment setups when other people try to access your project? These kinds of things get solved with dev containers.
The reason I like dev containers is that 90% of the work they require you will already have to do to get your project set up with Docker anyway. A dev container just tacks on another file and bam, you have a dev environment that is equivalent for any person accessing your code. Read through the dev container docs, but generally it just adds scaffolding that allows VS Code to connect to the container and open your code inside it. Pretty amazing for development and absolutely critical for modern dev teams.
With Docker under control, the next thing I typically do is figure out the deployment steps I want and how to automate them (i.e. Continuous Integration/Continuous Deployment). There are services that make this super easy depending on your use case, and some that are more complex but allow for more customization. I personally have used Azure Pipelines the most, but have done some GitHub workflow stuff as well. Every time you stand up a pipeline you learn something new, and when done correctly it increases the efficiency and quality of deployment substantially.
The final piece I wanted to bring up is Kubernetes, which, to put it simply, is hardware-level abstraction. I highly recommend learning about how k8s (the common abbreviation) works under the hood and how you can take advantage of it. Ultimately I see Kubernetes as the lifeblood of cloud architecture and deployment, and certainly applicable to anyone trying to break into the industry. This is a bit of a teaser for now as I think I've given you enough to chew on for a bit.
I'm working through this tutorial now - https://www.freecodecamp.org/news/how-to-dockerize-a-flask-app/
@BurningSquid Re: Pipelines, I might use Circle CI. https://circleci.com/integrations/gcp/
I've heard good things about Circle, especially for projects this size. Every service has its tradeoffs! You might compare it to GitHub workflows and see what the tradeoffs are, if you want.
I've read a few articles about the tradeoffs. Since I'm trying to find a job in the near-ish future, I think Circle CI will be helpful (it's listed a lot in job descriptions).
@BurningSquid I'm reading through GitHub Actions documentation and one point is still confusing to me. Do I need to put my Dockerfile in the GitHub repo?
Yes, Dockerfiles should always go in the git repo.
@BurningSquid Thanks!
First off: Great job! This is a great project and you should be proud of it
My background is in ML engineering, which for me has included anything from more data engineering-oriented stuff to pipelines, ML deployment architecture, etc. It's probably common for most ML engineers to dip their toes into a lot of buckets. I say this just to make it clear that I have experience in many areas, but there are others who are probably experts in specific subsets of that and may offer different advice.
My approach to deployment follows a few general guidelines that I think are applicable to start with (in no particular order):
1) I want as much configuration for the deployment of the app to be in the repo as possible. Did you run a CLI command to deploy? Put it in a script or VS Code task. This includes pipeline configuration, Dockerfiles, etc.
2) I want the final deployed app to have only what is needed to run on the particular endpoint and nothing more.
3) I want the deployment process to have no manual steps that depend on developer effort. This means a pipeline that is triggered by development activity (pull requests).
4) I want deployment to only land if the changes are stable (i.e. build checks pass).
5) I want changes to be easily reversible in case there is an issue with them.
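As a rough illustration of guidelines 1 and 4, here is a minimal Python sketch of a deploy script that lives in the repo and only deploys if the checks pass. The `CHECKS` and `DEPLOY` commands are placeholders I made up; swap in whatever test and deploy commands the project actually uses.

```python
import subprocess
import sys


def run_steps(steps: list[list[str]]) -> bool:
    """Run each command in order; stop at the first one that fails."""
    for cmd in steps:
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("step failed, aborting")
            return False
    return True


# Placeholder commands (assumptions): replace with the real test and deploy steps.
CHECKS = [[sys.executable, "-c", "print('tests would run here')"]]
DEPLOY = [[sys.executable, "-c", "print('deploy would run here')"]]


def main() -> int:
    # Deploy steps only run when every check has succeeded.
    ok = run_steps(CHECKS) and run_steps(DEPLOY)
    return 0 if ok else 1
```

Ending the script with `sys.exit(main())` gives a non-zero exit code on failure, which is what most CI services look at to decide whether a pipeline step passed.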
In light of these things, the first thing I will suggest (without having gone through your code in extreme detail) is Docker. I'm sure you have heard of Docker, but please comment on how comfortable you are with containerization as a topic. It is key to modern application architecture and I would be remiss not to highlight it.
I will use the comments in this issue to note things related to deployment as I go. Please feel free to point me in a different direction if anything is unclear or you want different feedback.
Cheers, Squid