ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0
4 stars 12 forks source link

Dockerfile includes extra python packages but unknown what they do #225

Closed Diewertje11 closed 10 months ago

Diewertje11 commented 11 months ago

While re-creating the ci-pipeline to automatically test the notebooks, it was found that the master branch of kso points to commit a306499 of kso_utils, which is branch origin/feat/pyav-backand. The dev branch of kso points to the dev of kso_utils, commit f2ac787. (I believe these commits were made to try to fix the problem of extracting frames from movies that Emil had just before the summer holidays) The requirement file in kso_utils is different in both these commits. This creates the error as can be seen in the image below, when the notebook tests are run in a container based on the requirements in dev (commit f2ac787).

Since the tests did work when the container was build based on commit a306499 and the requirements there, these python packages are added to the dockerfile. (temporarily! we do not want these here, they should be or removed or put in the requirements.) They are not added to the requirements in dev yet, since I do not know why these packages are added and what their function is, and if we want to use them in the end.

So this issue needs to be resolved by finding out what these packages do and if we actually use them. If we do, they should be added to the kso_utils requirements in dev and master/main. If we do not want them, we need to find out why this error from the image occurs and how we can solve it.

Image

jannesgg commented 11 months ago

@Diewertje11 The packages added in a306499 were: av, more-itertools, jupyter, jupyter_contrib_nbextensions and notebook.

The first two were added to fix the issue with accessing frames from videos, and these should be kept. The more-itertools package is required by the code that uses av.

The rest were added to fix the error you encountered above "No module named 'notebook.base'". I discovered this at the end when I was trying to get the tests to run, and suspect that we used to have these packages installed before (either in one of the requirement files, or Jupyter was included in the development but not runtime image that we are using), which means it is missing at the end of the final lines of the Dockerfile, which basically tries to enable a jupyter extension but cannot find Jupyter. I think it is therefore fine for us to include these as requirements in dev, since we get the benefit of a lighter image to use. Have you managed to get the Dockerfile working without installing these additional Jupyter packages? In that case, we should just make sure that the dev version is working nicely, and rebase master with these changes instead, since master was hastily changed to accommodate these fixes.

Diewertje11 commented 10 months ago

This was resolved in commit d275c6aaec2852fd56453409bd4f18e7c797f0f9