magenta / ddsp

DDSP: Differentiable Digital Signal Processing
https://magenta.tensorflow.org/ddsp
Apache License 2.0

pip is repeatedly installing various versions of the same packages #510

Open jin-eld opened 11 months ago

jin-eld commented 11 months ago

Hi,

it seems I have an issue similar to the one described here: https://github.com/magenta/ddsp/issues/376

I am on Fedora release 38 and I ran "pip install --upgrade ddsp" as described in the installation section. At some point pip started printing a warning and began downloading old versions of packages, for example:

INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Downloading tensorflow_datasets-4.5.0-py3-none-any.whl (4.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-4.4.0-py3-none-any.whl (4.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.0/4.0 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-4.3.0-py3-none-any.whl (3.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.9/3.9 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-4.2.0-py3-none-any.whl (3.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.7/3.7 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-4.1.0-py3-none-any.whl (3.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-4.0.1-py3-none-any.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-4.0.0-py3-none-any.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-3.2.1-py3-none-any.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 1.8 MB/s eta 0:00:00
  Downloading tensorflow_datasets-3.2.0-py3-none-any.whl (3.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.4/3.4 MB 1.7 MB/s eta 0:00:00
  Downloading tensorflow_datasets-3.1.0-py3-none-any.whl (3.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 1.7 MB/s eta 0:00:00

The same happens for many other dependencies as well. Any idea how to get past this?

The old issue referred to a specific, older Python version. I currently have Python 3.11.6; is this version supported, or am I supposed to install a specific one?
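As a quick sanity check before digging into pip itself, it can help to confirm which interpreter version pip is resolving against. The 3.8 to 3.10 range below is an assumption based on the TensorFlow 2.x pins of that era, not a documented DDSP constraint:

```python
import sys

def python_version_ok(info=None, lo=(3, 8), hi=(3, 10)):
    """Return True if the interpreter is inside the assumed supported range."""
    if info is None:
        info = sys.version_info
    return lo <= tuple(info[:2]) <= hi

# Show the version pip will resolve packages for.
print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
if not python_version_ok():
    print("This interpreter may be too new or too old for the pinned TF stack,"
          " which can trigger heavy pip backtracking.")
```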

Kind regards, Jin

yk7244 commented 11 months ago

Before running the training code, try this

!pip uninstall -y keras tensorflow tensorflow-probability absl-py astunparse flatbuffers gast google-pasta grpcio h5py keras keras-preprocessing libclang numpy opt-einsum protobuf setuptools six tensorboard tensorflow-io-gcs-filesystem termcolor tf-estimator-nightly typing-extensions wrapt
!pip install --disable-pip-version-check --no-cache-dir tensorflow==2.11.0
!pip install tensorflow-probability==0.15.0
!pip install keras==2.11.0

mightimatti commented 10 months ago

@yk7244 Are you successfully training your model with these versions of the listed packages?

Some associates at university and I are trying to use this lib for a project and are only getting degenerate results, despite training for many hours on datasets which are normalized and, to all appearances, perfect for the task. We tried different sample rates and different parameters, but nothing seems to be remotely usable.

yk7244 commented 10 months ago

Same issue: it only ran in CPU mode, so it took many hours. However, I solved this by adding one more step before running the code: go to Tools > Command palette > Use fallback runtime version (you need to scroll down a bit), then run the code. In summary,

  1. Change to the fallback runtime version.
  2. !pip uninstall -y keras tensorflow tensorflow-probability absl-py astunparse flatbuffers gast google-pasta grpcio h5py keras keras-preprocessing libclang numpy opt-einsum protobuf setuptools six tensorboard tensorflow-io-gcs-filesystem termcolor tf-estimator-nightly typing-extensions wrapt
     !pip install --disable-pip-version-check --no-cache-dir tensorflow==2.11.0
     !pip install tensorflow-probability==0.15.0
     !pip install keras==2.11.0
  3. !pip install --upgrade ddsp
  4. Run the original training shell (the long one).

It seems the problem is due to the CUDA update in the Colab default runtime: the updated CUDA is 12, but the TensorFlow version that DDSP supports uses CUDA 11. So you should use the fallback runtime version to go back to the older one.

This should work.

@mightimatti

mightimatti commented 10 months ago

@yk7244 Thank you, I will give it a try with these specific versions. I wish there were a known-good dataset available to test whether the model I'm training reproduces their results. There are just so many variables in ML training that it's really hard to troubleshoot if you're not even confident your tech stack is performing as expected.

yk7244 commented 10 months ago

@mightimatti Once you run it in GPU mode (check that you set the GPU runtime) it only takes 10-20 minutes. The environment setup for AI and machine learning is really a pain in the ass... every version requirement has to be met. However, in my opinion, that's the beauty of this AI technology, since we can anticipate very little of the result. It's like a surprise :)

yk7244 commented 10 months ago

@mightimatti I modified my comment above with the solution; please check.

jin-eld commented 10 months ago

@yk7244 I switched to a venv setup, so uninstalling anything is pointless in my case; I start with a clean environment. I went back to Python 3.7 and I think I also tuned requirements.txt a bit (will check tonight and report back). In the end I managed to install it in the 3.7 venv.
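For anyone going the same route, a minimal sketch of that clean-venv setup (the directory name is arbitrary; jin-eld used a Python 3.7 interpreter, so substitute e.g. `python3.7 -m venv` if you have that installed):

```shell
# Create a fresh virtual environment so no stale site-packages
# can confuse pip's dependency resolver.
python3 -m venv ddsp-venv

# Activate it; from here on, pip only sees this environment.
. ddsp-venv/bin/activate

# Inside the clean venv you would then run:
#   pip install --upgrade pip
#   pip install --upgrade ddsp
```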

By the way, the Colab did not work for me; it errored out at some point, but I did not follow up on it. I was hoping to download the pretrained models from the Colab space...

I was also wondering whether there are any existing pretrained models to use with ddsp, as I could not find anything to download.

jin-eld commented 10 months ago

> The environment setting for AI, Machine learning is really pain in the ass.... Every version of the requirements should be met... However, in my opinion, that's the beauty of these AI technology since we can expect or anticipate very little of the result. It's like a surprise :)

@yk7244 I totally agree about the PITA point; it's a mess in pretty much every AI/ML project. I think those tons of incompatible Python dependencies add a lot to it, and on top of that, in my case, throw ROCm into the mix: that's some extra fun, especially for projects which were not written with ROCm support in mind.

I think another aspect contributes to it: the AI/ML folks are primarily scientists, and while they are really good at the scientific part, I see a lack of software engineering skills in many projects, which makes the code difficult to understand and maintain. Often code is thrown out to show that a paper works, but it is then not properly picked up by actual software engineers who would sit down and rewrite it into nicely designed software. I hope this will settle at some point, once different parts of the community start to collaborate a bit better; of course, one also has to have the time to actually do this :)

I come from the software side of things and, to be fair, I have not contributed much either at this point, partly because of a lack of time, but also simply because I have not yet found a project to focus on. I keep trying out those many, many exciting things with varying success and moving on to the next, hoping for better results; there are just so many AI projects out there now :)

mightimatti commented 10 months ago

Hi, so I actually came up with another solution which should be somewhat future-proof, assuming you have access to hardware to run this on. I wrote a Dockerfile that pulls an old Colab Docker image, which features the supported runtime, and installs the necessary dependencies. Once you run it, you can navigate to Colab and choose to connect to a local runtime. This is the Dockerfile:

# Old Colab runtime image (CUDA 11 era) matching the TF version DDSP supports.
FROM europe-docker.pkg.dev/colab-images/public/runtime:release-colab_20230921-060057_RC00

# Remove the preinstalled packages that would conflict with the pins below.
RUN pip uninstall -y keras tensorflow tensorflow-probability absl-py astunparse flatbuffers gast google-pasta grpcio h5py keras keras-preprocessing libclang numpy opt-einsum protobuf setuptools six tensorboard tensorflow-io-gcs-filesystem termcolor tf-estimator-nightly typing-extensions wrapt

# Pin the TensorFlow stack known to work with DDSP, then install DDSP itself.
RUN pip install --disable-pip-version-check --no-cache-dir tensorflow==2.11.0
RUN pip install tensorflow-probability==0.15.0
RUN pip install keras==2.11.0
RUN pip install crepe==0.0.12
RUN pip install ddsp[data_preparation]==3.6.0

I saved this to a file called Dockerfile and then ran

docker build -t colab_local - <Dockerfile

This will take a LONG time, as the Docker image is 12 GB.

After this, and assuming you have all your NVIDIA drivers and permissions set up to allow Docker to access your GPU (I hadn't!), you can spin up a Docker container running Colab which utilizes your local GPUs like this:

docker run -p 127.0.0.1:9000:8080 --gpus all colab_local

Within this Colab instance you should be able to run everything as intended by the notebooks. If you're storing your checkpoints etc. to disk, you might need to map a directory in the container to a host directory.
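For the checkpoint mapping, a bind mount on the `docker run` line is enough; the host and container paths below are placeholders, not paths the notebooks require:

```shell
# Persist checkpoints outside the container by bind-mounting a host
# directory (left of the colon) onto a container path (right).
docker run -p 127.0.0.1:9000:8080 --gpus all \
    -v "$HOME/ddsp_checkpoints:/content/checkpoints" \
    colab_local
```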