Questions regarding the code

dan64 commented 1 month ago

Hello,

I have some questions regarding the released code. I decided to put them all in a single topic to avoid creating too many requests. The questions are the following:

1. The installation requires the following dependencies: py-thin-plate-spline and Pytorch-Correlation-extension. I'd like to know whether they are necessary for inference or are used for training only.
2. Option "--benchmark": I noticed that this option enables/disables torch.cuda.amp.autocast(). I don't understand how amp.autocast is related to the benchmark. To get faster inference, is it better to set this option to true or false?
3. Inference: It is not clear to me whether inference must run in "batch mode" (a list of images to be colored must be provided) or can run sequentially (coloring a single image at a time).
4. If the "single image" approach is possible, can the previously colored image be provided as the reference image?
5. I noticed that your project is based on XMem. Does this imply that "batch mode" gives the best coloring results?
6. If "batch mode" is adopted, is it possible to color batches of N images? For example, suppose I want to colorize a clip with 10000 frames. I would start by coloring the first 1000 frames, providing the reference image for the first frame. Then I would provide the next 1000 frames with a new reference frame. The main coloring task is performed by the class InferenceCore. When I provide the next batch of images, is it necessary to call the method InferenceCore.clear_memory(), or is it better to delete and recreate the InferenceCore instance?

Thank you, Dan

Fadexboss commented 1 month ago

I am also a Windows user and I could not go to the next step for days because of the Pytorch-Correlation-extension installation. Can you provide a solution or suggestion for this?

dan64 commented 1 month ago

For me, the installation of Pytorch-Correlation-extension on Windows worked. A PC with an NVIDIA GPU is required. You need to install Anaconda for Windows, open an Anaconda terminal, and follow the instructions.

Fadexboss commented 1 month ago

I am also a Windows user and I could not go to the next step for days because of the Pytorch-Correlation-extension installation. Can you provide a solution or suggestion for this?

For me, the installation of Pytorch-Correlation-extension on Windows worked. A PC with an NVIDIA GPU is required. You need to install Anaconda for Windows, open an Anaconda terminal, and follow the instructions.

So which cuda version did you use?

dan64 commented 1 month ago

@Fadexboss please open a new issue for the topic regarding Pytorch-Correlation-extension installation.

Fadexboss commented 1 month ago

@Fadexboss please open a new issue for the topic regarding Pytorch-Correlation-extension installation.

P.S. see also this topic: ClementPinard/Pytorch-Correlation-extension#74

I tried it on Linux now but I'm getting similar errors. If you tell me your cuda and nvcc versions while you are currently running the project, it will help many people, thank you.

dan64 commented 1 month ago

Since this problem is related to Pytorch-Correlation-extension, you should open an issue on that package's site. These are the steps that I used:

conda create -n colormnet python=3.10
conda activate colormnet
cd colormnet
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
cd ..\py-thin-plate-spline
pip install -e .
cd ..\Pytorch-Correlation-extension
python setup.py install
cd ..\colormnet
pip install -r requirements.txt

As you can see, I installed CUDA 12.1 because this is the version of the CUDA SDK installed on my PC. Since the package compiles from source, you must have the CUDA SDK installed on your PC. There are many sources on the web that explain how to install the CUDA developer tools. To be safe, you should also install NVIDIA cuDNN; my version is 8.9.7.29 (file: cudnn-windows-x86_64-8.9.7.29_cuda12-archive.zip). This is just a library that needs to be installed in the same folder as the CUDA SDK. For example, on my PC all the cuDNN DLLs are stored in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin. After the installation you should end up with the following environment variables defined:

CUDA_MODULE_LOADING=LAZY
CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
CUDA_PATH_V12_1=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
VS140COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\Tools\
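One way to confirm that the environment variables described in this comment are visible before building is sketched below. This is only an illustration: the helper names `missing_cuda_vars` and `cuda_bin_dir` are hypothetical, and treating `CUDA_PATH` as the only required variable is an assumption, not something the package documents.

```python
import os

# Which variables a given build really needs is an assumption;
# adjust this tuple to match your own setup.
REQUIRED_VARS = ("CUDA_PATH",)


def missing_cuda_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def cuda_bin_dir(env):
    """Return the CUDA `bin` folder derived from CUDA_PATH, or None if unset."""
    root = env.get("CUDA_PATH")
    return os.path.join(root, "bin") if root else None


if __name__ == "__main__":
    # Inspect the real environment of the current terminal session.
    print("missing variables:", missing_cuda_vars(os.environ))
    print("expected bin folder:", cuda_bin_dir(os.environ))
```

Running it from the same Anaconda terminal used for the build shows whether the variables are set in that session, which is where the compilation actually looks for them.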

Dan

yyang181 commented 1 month ago

Hi @Fadexboss,

For Both CUDA 11.x and CUDA 12.x Compatibility

Ensure that the CUDA version on your system aligns with the version used to build PyTorch. My local machine utilizes CUDA 11.7, as indicated by the $nvcc -V command, which reports cuda_11.7.r11.7/compiler.31442593_0. Accordingly, I've installed the PyTorch version compatible with CUDA 11.x.

CUDA 11.x Installation

To successfully install the Pytorch-Correlation-extension, it's crucial that the CUDA version detected by $nvcc -V corresponds to the CUDA version used for PyTorch compilation. For instance, if you're using CUDA 11.x, you should install a PyTorch build that supports it, such as torch==2.0.1+cu11.x. The installation command, as detailed in the ColorMNet README, will compile the Pytorch-Correlation-extension from source, tailored to your local CUDA version.

CUDA 12.x Installation

For those using CUDA 12.x, you can refer to the Colab demo as a guide on how to install the Pytorch-Correlation-extension with CUDA 12.x support.
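The version check described in this comment can be sketched in a few lines. The helpers `nvcc_release` and `same_cuda_major` are hypothetical names, and comparing only the major version is an assumption based on the "CUDA 11.x" wording above, not a rule stated by either project. The PyTorch side of the comparison is available as `torch.version.cuda` (it is `None` on CPU-only builds).

```python
import re


def nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc -V`, or None."""
    match = re.search(r"cuda_(\d+)\.(\d+)", nvcc_output)
    if match is None:
        # Fall back to the "release X.Y" form that also appears in the output.
        match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def same_cuda_major(nvcc_output, torch_cuda):
    """True if nvcc and the PyTorch build report the same CUDA major version."""
    nvcc = nvcc_release(nvcc_output)
    if nvcc is None or not torch_cuda:
        return False
    return nvcc.split(".")[0] == torch_cuda.split(".")[0]


# Build string quoted in the comment above.
sample = "cuda_11.7.r11.7/compiler.31442593_0"
print(nvcc_release(sample))             # 11.7
print(same_cuda_major(sample, "11.8"))  # True
print(same_cuda_major(sample, "12.1"))  # False
```

In practice the first argument would be the captured output of `nvcc -V` and the second would be `torch.version.cuda`.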

yyang181 commented 1 month ago

Hi @dan64,

Thank you for your interest in our paper. Here are the clarifications you requested:

1. Pytorch-Correlation-extension Requirement: Both training and inference require the Pytorch-Correlation-extension. It is designed to efficiently implement the Spatial Correlation Sampler layer for the LA module as proposed in our paper. For more details, please refer to attention.py line 764. On the other hand, py-thin-plate-spline is not necessary for your setup, as it was used for augmenting the training dataset in XMem, which is not used in our ColorMNet.
2. Disabling torch.cuda.amp.autocast(): We have chosen to disable torch.cuda.amp.autocast() to ensure consistent FPS calculations across different GPU hardware. The impact of torch.cuda.amp.autocast() on inference speed is not uniform and is influenced by factors such as hardware support, operation types, implementation nuances, and model complexity. While it may enhance inference speed, this is not always the case and is highly dependent on the specific hardware and model in use.
3. Exemplar-based Method: ColorMNet operates on an exemplar-based approach, requiring an exemplar image to be provided along with either a single input image or a list of input images. Yes, ColorMNet is capable of coloring a single image, but it is essential to supply the corresponding exemplar. Please note, for inference, the batch size is fixed at 1, and we process input images one by one, with each previously colorized image being stored in a memory bank to guide subsequent colorizations, as explained in our paper.
4. Refer to the explanation in point 3 for further details.
5. Batch Mode: I suppose that neither ColorMNet nor XMem currently supports a "batch mode" for processing. The reason behind this is the necessity to construct a memory bank for each individual video clip. Processing multiple clips in a batch simultaneously could compromise the accuracy and integrity of the information maintained within the memory bank. The sequential processing approach ensures that each clip's memory bank remains consistent and reliable, which is crucial for the quality of the colorization output.
6. InferenceCore: While I have not personally tested it, I believe that using InferenceCore.clear_memory() should work.
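The chunked strategy discussed in the thread (process frames one by one, then reset the memory bank before each new chunk with its own exemplar) might look roughly like the sketch below. `StubCore` is a hypothetical stand-in written only for this illustration: it is not the real InferenceCore, its `step` method and arguments are assumptions, and only the method name `clear_memory` comes from the thread.

```python
class StubCore:
    """Hypothetical stand-in for InferenceCore; NOT the real ColorMNet API."""

    def __init__(self):
        self.memory = []  # frames seen since the last reset

    def clear_memory(self):
        self.memory = []

    def step(self, frame, exemplar=None):
        # A real model would colorize here; the stub only records the call
        # and reports whether an exemplar or the memory bank guided it.
        self.memory.append(frame)
        return (frame, exemplar if exemplar is not None else "memory")


def colorize_in_chunks(core, frames, exemplars, chunk_size):
    """Process frames sequentially, resetting memory at each chunk boundary.

    `exemplars[i]` is the reference image for chunk number i.
    """
    results = []
    for start in range(0, len(frames), chunk_size):
        core.clear_memory()  # start each chunk with an empty memory bank
        exemplar = exemplars[start // chunk_size]
        for offset, frame in enumerate(frames[start:start + chunk_size]):
            # Only the first frame of a chunk receives the exemplar directly.
            results.append(core.step(frame, exemplar if offset == 0 else None))
    return results


core = StubCore()
out = colorize_in_chunks(core, list(range(5)), ["ref_a", "ref_b", "ref_c"], 2)
print(out)
```

Reusing one object and calling `clear_memory()` keeps the loaded model in place between chunks; whether that fully resets the real class is, as noted above, untested.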

Please let me know if you have further questions.

dan64 commented 1 month ago

Hi @yyang181

Thank you for the clarification. I still have some questions regarding the model and its implementation:

1. Memory: In the example, max_long_term_elements is set to 10000. What happens when the number of frames exceeds this number? Given that a 100-minute movie at 25 fps contains 150000 frames, this limit will be hit 15 times. Will ColorMNet be able to manage this load?
2. FirstFrameIsNotExemplar: I noticed that setting this parameter to True always gives the best results, in the sense that the frame is always properly colored whether or not the first frame is the exemplar, and the speed is almost the same. So what is the advantage of setting it to False?
3. Human Features: I would expect some features, such as human skin color, to be already learned without the need for an exemplar, but it seems that this is not always the case. Could you clarify this issue?
4. Optimal frame size: Is there an optimal frame size that improves the coloring quality? I noticed that ResNet50 was trained using frames of size 224x224. Is there any advantage in using bigger frames (apart from the output resolution, which can easily be upscaled even when using 224x224 frames)?
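The frame-count arithmetic in the memory question can be reproduced as follows. The function names are hypothetical, and the comparison treats the limit as a number of frames, as the question does; how the model actually counts long-term memory elements is not confirmed in this thread.

```python
def total_frames(minutes, fps):
    """Number of frames in a clip of the given length and frame rate."""
    return minutes * 60 * fps


def times_limit_reached(frames, limit):
    """How many full multiples of `limit` fit into `frames`."""
    return frames // limit


frames = total_frames(100, 25)
print(frames)                               # 150000
print(times_limit_reached(frames, 10_000))  # 15
```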

Thank you for your time and consideration, Dan
