lemma-osu / naip-cnn

Modeling forest attributes from aerial imagery using CNNs
0 stars 0 forks source link

Update recommended developer workflow to use remote repository #22

Closed aazuspan closed 9 months ago

aazuspan commented 9 months ago

This (finally!) closes microsoft/vscode#8 by modifying the Dev Container config and recommended usage. After a lot of experimentation and research, it appears that Dev Containers are not currently capable of correctly syncing file events between a Windows repository and a Docker image in WSL. This leads to a number of problems when developing a local repository on Windows, like source control not auto-refreshing, changes made by formatters not being reflected in the editor, and experiment trackers failing to update.

To avoid those issues, the new "recommended" workflow is to build the Dev Container image from a cloned remote repository rather than the local copy. On the first run, this will mean using the Dev Containers: Clone Repository In Container Volume command in VS Code instead of navigating to the local repository and using Dev Containers: Reopen in Container. Once that image is built, you should be able to re-open it from the Remote Explorer tab.

image

Data and models that are too large to store in version control are still kept on the local filesystem and are mounted into the image, allowing them to be read and written from the image. The image configuration expects those files to be located at ~/naip-cnn. If that folder doesn't exist, it will be created. To migrate from the previous system without losing data, the data and models folders can be moved from the local repository folder into that folder. Any code not in the remote repository would need to be manually moved over.

@grovduck No rush, but if you get a chance to test this out and make sure the instructions make sense and work on your end, I would appreciate it! Sorry for the hassle of having to switch over - I'm hoping this is a long-term solution!

grovduck commented 9 months ago

@aazuspan, I will definitely work my way through what you've done here ... just a heads up that it might take me a couple of days to get to it.

aazuspan commented 9 months ago

Sounds good, thanks Matt!

grovduck commented 9 months ago

@aazuspan, starting to work through your instructions in CONTRIBUTING.md. Do I need to remove any current containers (e.g. from the local repo) before I try this new workflow? When I give the URL of the repo, do I need to specify the branch (remote-repo) for this test? I got an error first time through on setting up the container to the remote repo and wondered if it was related to either of these issues?

EDIT: Ah, I think it's the latter. I linked VSCode to my Github account and once I specified the naip-cnn repo, it asked me which branch to use. Obviously not something you want to change in CONTRIBUTING.md as you'll want contributors to use main more than likely. Once I've made this change, it works as expected. I also exited VSCode and reopened the container using Remote Explorer as spelled out in your instructions.

A couple of other quick questions for the container rookie:

grovduck commented 9 months ago

One other question just occurred to me. When you connect to the remote repo and you start making code changes, your "local" repo is now in the container, correct? So you aren't ever working with a local (Windows-based) clone anymore? It looks like any untracked files will remain in that local container, even when you quit VSCode. Is there a danger of unintentionally deleting a container and losing these local untracked changes? I guess it would be the same danger as removing a local repo.

aazuspan commented 9 months ago

Thanks so much for going through this and testing things out!

Do I need to remove any current containers (e.g. from the local repo) before I try this new workflow?

If your local repo was located at ~/naip-cnn, the data and models folders would be mounted into the new container, which might be unexpected, but I don't think any existing containers should cause problems.

When I give the URL of the repo, do I need to specify the branch (remote-repo) for this test?

Glad you got this figured out! Cloning from main might have caused issues downstream since the mount locations changed, but I think the container should have still built successfully since it has a devcontainer.json. When I developed this, I'm pretty sure I cloned from main, adjusted the Docker config in the container, rebuilt, and made a new branch from there. Probably not worth worrying about since you got it working, but if you knew offhand what the error message was, I'd be curious.

You say that this folder is created at ~/naip-cnn. I see that now and a simple copy/paste from File Explorer worked.

Yes, that's what I did. Does having to keep the data and models at ~/naip-cnn cause any problems for you? I don't love that solution, but I couldn't think of a good way to allow users to choose the mount point dynamically, since it's set up while building the container. If it seems problematic to you, I can try to find an alternative, like a local config file or environment variable that would override the mount point.

Because this container points to a specific branch, will this process need to be repeated to work with a different branch or if the current branch is deleted?

Once the initial container is built, my understanding is that you should be able to re-use that container regardless of what happens to the branch. Just like if you cloned a project locally from a specific branch, you can checkout other branches, pull new remote branches, etc. I believe the only meaningful result of cloning from the remote-repo branch is that the container was built with the devcontainer.json and Dockerfile located in that branch. If there was a new branch that changed the Docker config, you should be able to pull that, check it out, and run Dev Containers: Rebuild Container to get in sync with that. That's my impression at least!

I didn't go through the other notebooks as I assumed they would work as before, but happy to do that if it's helpful

Running the inference would confirm that the data and models were both mounted and accessed correctly, so that was a good test. No need to run anything else, but thanks!

When you connect to the remote repo and you start making code changes, your "local" repo is now in the container, correct? So you aren't ever working with a local (Windows-based) clone anymore? It looks like any untracked files will remain in that local container, even when you quit VSCode. Is there a danger of unintentionally deleting a container and losing these local untracked changes?

That's all correct, and it's something that makes me a little nervous as well. As you mentioned, I don't think there's any more risk than if they were files on the local system, but the fact that I can't go look at them in a folder makes them feel less real... 😬 With more experience I'll probably become confident that the changes aren't going to just disappear, but for now I might push more WIP branches just to be on the safe side.

grovduck commented 9 months ago

Glad you got this figured out! Cloning from main might have caused issues downstream since the mount locations changed, but I think the container should have still built successfully since it has a devcontainer.json. When I developed this, I'm pretty sure I cloned from main, adjusted the Docker config in the container, rebuilt, and made a new branch from there. Probably not worth worrying about since you got it working, but if you knew offhand what the error message was, I'd be curious.

Hey @aazuspan, sorry so long to return to this. I wanted to recreate this, so I removed my working dev containers and started from the beginning. Here are steps to recreate:

  1. Ran Dev Containers: Clone Repository in Container Volume... from the command palette
  2. I had connected VSCode and Github to talk to one another, so I get this message "Clone a repository from Github in a Container Volume"
  3. Select lemma-osu/naip-cnn (git address is https://github.com/lemma-osu/naip-cnn.git)
  4. This prompts for the branch name, so I select main
  5. After about 10 seconds, I get an error log.

I don't think there is anything personally identifiable in the log, but just to be cautious, I'll send it via email.

aazuspan commented 9 months ago

Thanks for the error report! This is the culprit error from your log:

docker: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /workspaces/naip-cnn.

That's caused by this line in the main branch that was responsible for keeping the local repo in sync with the Docker repo:

https://github.com/lemma-osu/naip-cnn/blob/76d41dd8facccb535c3b8b44bee45d579c5134b0/.devcontainer/devcontainer.json#L11

Because there is no localWorkspaceFolder when you build a container from a remote repo, there was nothing to mount, and Docker panicked. That mount is removed in this branch, which is why you were able to build the container successfully from this branch.

Once this is merged, it will be possible to clone a container directly from main.

grovduck commented 9 months ago

All makes total sense, @aazuspan. Thanks for digging in. Because curious minds want to know, you said that you had initially cloned from main. Was that offending line not in main at that point?

aazuspan commented 9 months ago

Was that offending line not in main at that point?

Good question - it was. I must have gotten the same build error the first time, but I was already in the process of tweaking the devcontainer config and rebuilding, so apparently I didn't pay it much attention :sweat_smile:

grovduck commented 9 months ago

Well, you're not going to believe this, but when I went back to try to clone from remote-repo, I'm now getting the same error message. I can tell in the log that I'm pulling from remote-repo but the SHA doesn't correspond to the latest commit (see line 89 in the log file - I'm assuming that 4cdf3ed is a Github SHA). Same error message as far as I can tell. Maybe I'm doing something wrong?

aazuspan commented 9 months ago

Interesting! From the log it looks like maybe it's using a cached version of the repo - I see some references to COPY . /app that shouldn't be in the updated Dockerfile. Can you try the answer suggested here?

Sorry for the hassle! I'm hoping once this is set up it should be pretty painless.

grovduck commented 9 months ago

Can you try the answer suggested here?

Nice find! Everything works once I deleted the volumes associated with naip-cnn.

I am obviously not Docker-proficient. I think I need to understand the concepts behind Docker containers and volumes a bit more ... Seems like lots of gotchas.

aazuspan commented 9 months ago

Glad that worked! I share your frustration with Docker and Dev Containers. I thought about ditching them to simplify this project a little, but spending a few hours trying to get CUDA Toolkit, cuDNN, and Tensorflow in sync on Windows made Docker build errors seem almost fun in comparison.

The good news is that I haven't had to do anything other than open my container in the Remote Explorer since I got it set up, so it should be smooth sailing from here.

aazuspan commented 9 months ago

@grovduck I'm going to go ahead and merge to keep things moving, but if you have any more Docker issues after things are set up or have any problems with the default mounting point, I'm happy to continue the discussion!

Thanks for trying this out!