Closed Xiangshougudu closed 5 months ago
Dear @Xiangshougudu,
Thank you for your interest in DeepOF!
We indeed detected an issue with GPU usage in Windows that we're currently troubleshooting. I'll get back to you as soon as possible within this week with a potential solution (we're about to release a new patched version).
Regarding the implementation of the ELBO minimization in our take on VaDE, I agree the code may be a bit confusing. Let's split the loss into the reconstruction term and the KL divergence between prior and posterior:
The reconstruction loss is computed in deepof/models.py, as part of the train_step() method in the VaDE class:
    # Compute reconstruction loss
    reconstruction_loss = -tf.reduce_mean(reconstructions.log_prob(seq_inputs))
    total_loss += reconstruction_loss
The total_loss object comes from retrieving a set of losses that are computed within the model itself:
    total_loss = sum(self.vade.losses)
In particular, the KL divergence between the multimodal prior and the posterior is computed within the GaussianMixtureLatent class, using tf.keras.layers.Layer.add_loss(). You can find the exact lines here (1251-1286).
Bear in mind that the model does not use the standard implementation of ELBO, but rather a multi-modal version based on VaDE. Please do not hesitate to ask if you have any questions!
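For reference, the split described above can be written out as follows. This is a sketch that assumes the objective follows the general VaDE formulation; the symbols (x for the input sequence, z for the continuous latent, c for the cluster assignment) are notation chosen here, and the exact weighting of the terms in DeepOF may differ.

```latex
\mathcal{L}_{\mathrm{ELBO}}(x)
  = \mathbb{E}_{q(z, c \mid x)}\big[\log p(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q(z, c \mid x) \,\|\, p(z, c)\big)

\mathrm{loss}(x) = -\mathcal{L}_{\mathrm{ELBO}}(x)
  = \underbrace{-\,\mathbb{E}_{q(z, c \mid x)}\big[\log p(x \mid z)\big]}_{\texttt{reconstruction\_loss}}
  + \underbrace{D_{\mathrm{KL}}\big(q(z, c \mid x) \,\|\, p(z, c)\big)}_{\text{added via } \texttt{add\_loss()}}
```

Minimizing this loss is equivalent to maximizing the ELBO: the first term corresponds to the negative log-probability of the inputs under the reconstructions, and the second to the KL term collected from the model's losses.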
Best wishes, and we'll keep you posted with the GPU fix in Windows, Lucas
Hello,
Thank you for the update.
I tried to recreate the conda environment from your updated content using the following steps:
conda create -n deepof python=3.9
pip install -r requirements.txt
When I run the code below, I find that I still can't use the GPU:
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Num GPUs Available: 0
When I tried to execute the deepof_unsupervised_tutorial demo, I received the following error:
2024-04-18 23:40:36.730028: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-18 23:40:36.732187: I tensorflow/core/profiler/lib/profiler_session.cc:101] Profiler session initializing.
2024-04-18 23:40:36.732272: I tensorflow/core/profiler/lib/profiler_session.cc:116] Profiler session started.
2024-04-18 23:40:36.732390: I tensorflow/core/profiler/lib/profiler_session.cc:128] Profiler session tear down.
The initializer GlorotUniform is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
2024-04-18 23:40:51.328344: W tensorflow/core/framework/dataset.cc:769] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
2024-04-18 23:41:19.342734: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at summary_kernels.cc:65 : NOT_FOUND: Failed to create a NewWriteableFile: E:\code\behaviors\deepof-0.6.1\test_single_topview\deepof_tutorial_project_test\Trained_models\fit\deepof_unsupervised_VaDE_recurrent_encodings_input_type=coords_kmeans_loss=0.0_encoding=4_k=10_20240418-234036\train/events.out.tfevents.1713454879.DESKTOP-D42M5T7.3336.0.v2 : The system cannot find the path specified.; No such process
Creating writable file E:\code\behaviors\deepof-0.6.1\test_single_topview\deepof_tutorial_project_test\Trained_models\fit\deepof_unsupervised_VaDE_recurrent_encodings_input_type=coords_kmeans_loss=0.0_encoding=4_k=10_20240418-234036\train/events.out.tfevents.1713454879.DESKTOP-D42M5T7.3336.0.v2
Could not initialize events writer.
Traceback (most recent call last):
File "E:\code\behaviors\deepof-0.6.1\deepof_unsupervised.py", line 60, in
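One possible cause of the NOT_FOUND error above, which nobody in this thread confirmed, is the length of the event file path: it is roughly 270 characters long, while the legacy Windows path limit (MAX_PATH) is 260 characters including the terminating NUL. The sketch below only checks a path string against that limit; the helper name exceeds_max_path is hypothetical and not part of DeepOF or TensorFlow.

```python
# Legacy Win32 limit: 260 characters, counting the terminating NUL.
WINDOWS_MAX_PATH = 260

# Path copied from the log above.
LOG_PATH = (
    r"E:\code\behaviors\deepof-0.6.1\test_single_topview"
    r"\deepof_tutorial_project_test\Trained_models\fit"
    r"\deepof_unsupervised_VaDE_recurrent_encodings_input_type=coords"
    r"_kmeans_loss=0.0_encoding=4_k=10_20240418-234036"
    r"\train/events.out.tfevents.1713454879.DESKTOP-D42M5T7.3336.0.v2"
)


def exceeds_max_path(path: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if the path plus its terminating NUL does not fit in the limit."""
    return len(path) + 1 > limit


print("path length:", len(LOG_PATH))
print("exceeds legacy limit:", exceeds_max_path(LOG_PATH))
```

If the path does turn out to be too long, creating the project in a directory closer to the drive root, or enabling long path support in Windows, would be the usual things to try.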
Dear @Xiangshougudu,
Thank you for the follow-up! We released a new version (0.6.1) to PyPI a few days ago, but we still could not test it with a Windows GPU machine (that's why I hadn't got back to you yet).
However, it would indeed be wonderful if you could try it out! To install it, you should avoid using the requirements.txt file. Instead, you can follow the instructions in our documentation, in one of these ways:
Installing from PyPI with pip:
conda create -n deepof python=3.9
conda activate deepof
pip install deepof
Installing from source with poetry:
conda create -n deepof python=3.9
conda activate deepof
conda install poetry
git clone https://github.com/mlfpm/deepof.git
cd deepof
poetry install
Using the Docker image:
# download the latest available image
docker pull lucasmiranda42/deepof:latest
# run the image in interactive mode, enabling you to open python and import deepof
docker run -it lucasmiranda42/deepof
Please let us know if you succeed with any of them! And we'll update the thread as soon as we manage to test on a Windows GPU.
Best wishes, and thank you very much once again for your interest, Lucas
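After any of the installation routes above, it can help to confirm which package versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages discussed in this thread.
for name in ("deepof", "tensorflow"):
    print(name, installed_version(name) or "not installed")
```

Running this inside the activated deepof environment shows whether the patched release (0.6.1) was picked up and which TensorFlow version was installed alongside it.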
Thanks for your reply, I will try it again.
Dear @Xiangshougudu,
The patch indeed seems to have fixed the issue on Windows. I will close the thread for now, but of course feel free to reopen if you still run into trouble!
Best, Lucas
Dear @lucasmiranda42 ,
Thank you very much for your reply again. Can you successfully use the GPU on Windows? Is that within conda? In fact, TensorFlow still doesn't use the GPU on my Windows machine. When I configured the DeepOF 0.6.1 environment earlier, I found that I needed to manually install tensorflow-gpu. However, the TensorFlow website states that TensorFlow-GPU is already integrated into TensorFlow, but I still tried to install TensorFlow-GPU. It turns out that its dependencies conflict with TensorFlow, so I'm waiting for TensorFlow-GPU to be updated to a version that doesn't conflict. I'll try to configure it again later; please let me know if you have a new way to configure the environment. Thank you again!
Best wishes
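Some background that may explain the behavior described above: TensorFlow 2.10 was the last release with native GPU support on Windows; from 2.11 onwards, GPU use on Windows requires WSL2 or the DirectML plugin, and the standalone tensorflow-gpu package is deprecated. Which TensorFlow version DeepOF 0.6.1 resolves to is not stated in this thread, so the sketch below simply reports what is installed. The helper name gpu_report is hypothetical.

```python
def gpu_report():
    """Collect basic facts about the TensorFlow install; returns defaults if TF is missing."""
    report = {"tensorflow_version": None, "built_with_cuda": None, "gpus": []}
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed (or failed to load); leave the defaults.
        return report
    report["tensorflow_version"] = tf.__version__
    # False here means the installed binary is CPU-only, regardless of drivers.
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


for key, value in gpu_report().items():
    print(f"{key}: {value}")
```

If built_with_cuda is False on native Windows, installing additional GPU packages will not help; the TensorFlow binary itself has no CUDA support in that case.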
Hello, thank you so much for providing such a powerful tool. Following your tips, I created DeepOF's virtual environment using conda on Windows 11.
Unfortunately, when I install the TensorFlow-GPU version, it always tells me that some packages conflict with each other, which prevents TensorFlow from using the GPU. I don't know how to solve it. Can you help me? Alternatively, could you provide the full requirements.txt or environment.yml so I can completely replicate your environment?
On the other hand, you wrote in the paper that DeepOF minimizes ELBO, but I can't find the corresponding loss function in your code. Could you point it out for me?
Best wishes