yuhengliu02 / pyramid-discrete-diffusion

Official implementation of paper "Pyramid Diffusion for Fine 3D Large Scene Generation" (ECCV 2024 Oral)
https://yuheng.ink/project-page/pyramid-discrete-diffusion/
MIT License

Hello, how can I perform conditional generation? #2

Closed fxlong closed 1 month ago

fxlong commented 2 months ago

Hello, how can I perform conditional generation?

yuhengliu02 commented 2 months ago

Hi,

To perform conditional generation, you can modify the configuration file I've provided, located in the configs folder. Since conditional generation uses data from the validation set as the condition, we focus only on generation during the $S_1$ to $S_2$ and $S_2$ to $S_3$ stages.

For example, suppose you want the condition to come from the 64×64×8 data in the validation set, you want the model to generate 256×256×16 scenes based on it, and you also want the full-resolution ground truth of each scene for comparison. In that case, navigate to configs/infer_s_2_to_s_3.yaml and change the following parameters:

resume: True
resume_path: '' # Set the path to infer_s_2_to_s_3.tar
generation_num: 100 # Set the number of generations; I suggest setting it to a large number, like 100, to see more diverse scenes.
infer_data_source: 'dataset' # Will read data from the validation set
infer_data_path: './data/CarlaSC_quantized_256_256_16/Cartesian/Val' # Set the path for Val or Test
quantized_infer_data_path: './data/CarlaSC_quantized_64_64_8/Cartesian/Val' # Set the path for Val or Test

Afterward, run the following command:

python launch.py -c configs/infer_s_2_to_s_3.yaml -n conditional_generation

Once inference is complete, you can find the generated results in the Generated/conditional_generation/GeneratedFusion folder. The conditions provided for this stage will be in the PrevSceneContextFusion folder, and the corresponding scenes from the validation set will be in the GroundTruthFusion folder.
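If you want to inspect the outputs programmatically, here is a minimal sketch. It assumes the scenes are stored as NumPy voxel label grids and that matching files share a name across the three folders; the file name scene_000.npy, the .npy format, and the use of label 0 for empty space are all assumptions, so adjust them to whatever your run actually produces:

import numpy as np

# Folder layout follows the description above; file names and format are assumptions.
gen = np.load('Generated/conditional_generation/GeneratedFusion/scene_000.npy')         # generated 256x256x16 labels
cond = np.load('Generated/conditional_generation/PrevSceneContextFusion/scene_000.npy')  # 64x64x8 condition
gt = np.load('Generated/conditional_generation/GroundTruthFusion/scene_000.npy')         # full-resolution ground truth

# Per-voxel agreement with the ground truth on occupied voxels (label 0 assumed empty).
occupied = gt != 0
accuracy = (gen[occupied] == gt[occupied]).mean()
print(f'semantic agreement on occupied voxels: {accuracy:.3f}')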

If you have any further questions, feel free to ask!

fxlong commented 2 months ago

Hi, thank you very much for your feedback. Following your suggestions, I have successfully implemented conditional generation on the CarlaSC dataset. Could you guide me on how to proceed with conditional generation on the SemanticKITTI dataset? Is it necessary to switch to a pre-trained model specifically tailored for this dataset? And should I be concerned about mapping the semantic labels accordingly?

yuhengliu02 commented 2 months ago

Congratulations on completing conditional generation on CarlaSC! At the moment, we haven't released the code for training and inference on the SemanticKITTI dataset. However, we plan to open-source the relevant code and models soon. Thank you for your interest in our work!

fxlong commented 2 months ago

Hello, thank you for your feedback. I'm wondering whether conditional generation on the SemanticKITTI dataset uses a single point cloud scan as the condition, or whether it follows the approach used in CarlaSC, where dense point clouds are segmented.

fxlong commented 2 months ago

Hi, I have another question. As I understand it, the model cannot generate anything from scratch; it only refines coarse voxels, and it does not predict semantics for places that do not exist in the voxel space. Is my understanding correct?

yuhengliu02 commented 1 month ago

Sorry for my late reply...

> Hello, thank you for your feedback. I'm wondering whether conditional generation on the SemanticKITTI dataset uses a single point cloud scan as the condition, or whether it follows the approach used in CarlaSC, where dense point clouds are segmented.

We follow the approach used in CarlaSC.

> Hi, I have another question. As I understand it, the model cannot generate anything from scratch; it only refines coarse voxels, and it does not predict semantics for places that do not exist in the voxel space. Is my understanding correct?

There might be a slight misunderstanding about our approach. In the first stage ($S_1$), our model can generate random scenes from scratch, while the models in the later stages focus on refining the details of these randomly generated scenes. Additionally, our model predicts the semantic label that each voxel in the scene should carry.
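To make the pipeline concrete, here is a minimal sketch of the coarse-to-fine loop described above. The function names are hypothetical placeholders, not the actual API of this repository, and the intermediate resolution is an assumption:

def generate_scene(sample_s1, refine_s1_to_s2, refine_s2_to_s3):
    # Stage S1: unconditional diffusion samples a coarse scene from pure noise.
    coarse = sample_s1()             # e.g. a 64x64x8 semantic label grid
    # Stage S1 -> S2: a conditional model refines the coarse scene (resolution assumed).
    mid = refine_s1_to_s2(coarse)
    # Stage S2 -> S3: a second conditional model refines to full 256x256x16 resolution.
    fine = refine_s2_to_s3(mid)
    return fine                      # every voxel carries a predicted semantic label

Conditional generation, as set up earlier in this thread, simply replaces the output of the first stage with a scene taken from the validation set and runs only the refinement stages.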