waltergerych / advanced_gen_modeling_mqp_2022_2023

1 stars 1 forks source link

Code refactoring - questions for Dillon #1

Closed sirutBuasai closed 1 year ago

sirutBuasai commented 1 year ago

We're not using variables num_divs and show_heatmap here, safe to remove them? https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L154


Same here, we're not using features, plot, num_divs, and show_heatmap. Do you think it's safe to remove those variables? I intuitively think we can remove plot, num_divs, and show_heatmap but the features variable is a bit iffy to me. What did we use features for and why do we not use them anymore? https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L242


Same here, safe to remove plot, num_divs, and show_heatmap? https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L315


These variables are unused across multiple functions in this file. Any reason we declare them or are they safe to remove? https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L172-L174 https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L263-L267 https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L336-L338


This utils function is used in both the reverse_diffusion function and the reverse_tabular_diffusion function. Is the reverse_diffusion function still being used and should it be kept? If so, I will separate utils.noise_estimation_loss into utils.diffusion_noise_estimation and utils.tabular_diffusion_noise_estimation. If not, can we remove the reverse_diffusion function altogether? https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L202


reverse_categorical_diffusion seems abandoned because we are now using tabular diffusion with both discrete and continuous features. Should we try to fix this function back to its functional state or should we just remove it? Might be a good idea to fix it back to a functional state just in case but if you think we can remove it I'll remove it in the next commit. Also, if we plan to remove all the purely categorical stuff, we should probably go through the utils.py file to remove the utility function for that too which is why I think restoring reverse_categorical_diffusion back to a functional state might be easier.

Errors in reverse_categorical_diffusion: The input_size argument is missing for the ConditionalMultinomialModel class. https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L274

The x_0_continuous argument is missing for the utils.multinomial_diffusion_noise_estimation function. https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L294

The continuous argument is missing for the utils.get_discrete_model_output function. https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L306

dmccarthy11 commented 1 year ago

We're not using variables num_divs and show_heatmap here, safe to remove them?

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L154

Yes

Same here, we're not using features, plot, num_divs, and show_heatmap. Do you think it's safe to remove those variables? I intuitively think we can remove plot, num_divs, and show_heatmap but the features variable is a bit iffy to me. What did we use features for and why do we not use them anymore?

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L242

Yes all four are safe. Originally "features" was going to be used because the parser that comes with the ExtraSensory dataset returns the data and a list of features as strings, and I had planned on using that list.

Same here, safe to remove plot, num_divs, and show_heatmap?

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L315

Yes

These variables are unused across multiple functions in this file. Any reason we declare them or are they safe to remove?

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L172-L174

Nope, not really any reason, that was part of refactoring the original code. Safe to remove

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L263-L267

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L336-L338

This utils function is used in both the reverse_diffusion function and the reverse_tabular_diffusion function. Is the reverse_diffusion function still being used and should it be kept? If so, I will separate utils.noise_estimation_loss into utils.diffusion_noise_estimation and utils.tabular_diffusion_noise_estimation. If not, can we remove the reverse_diffusion function altogether?

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L202

reverse_categorical_diffusion seems abandoned because we are now using tabular diffusion with both discrete and continuous features. Should we try to fix this function back to its functional state or should we just remove it? Might be a good idea to fix it back to a functional state just in case but if you think we can remove it I'll remove it in the next commit. Also, if we plan to remove all the purely categorical stuff, we should probably go through the utils.py file to remove the utility function for that too which is why I think restoring reverse_categorical_diffusion back to a functional state might be easier.

Errors in reverse_categorical_diffusion: The input_size argument is missing for the ConditionalMultinomialModel class.

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L274

The x_0_continuous argument is missing for the utils.multinomial_diffusion_noise_estimation function.

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L294

The continuous argument is missing for the utils.get_discrete_model_output function.

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/a6eafa7087f389c52ff73232954e987a6f248801/diffusion_models/diffusion.py#L306

These last two I think can be answered the same way. This may require some testing and is up to some of your choice, but originally I thought that transitioning to tabular meant modifying the code and model, and that would require the diffusion to always be tabular. Given what Walter said, however, that we can simply comment out the loss of continuous and use only discrete data for a full categorical diffusion, and comment out discrete and use only continuous loss for continuous diffusion, I believe we can keep just the reverse_tabular_diffusion.

My thinking was to keep the prior code to have the working and have the option of using just this, but noting this change it might be fine just to keep the tabular. This would require some minor commenting out, just changing

https://github.com/waltergerych/advanced_gen_modeling_mqp_2022_2023/blob/7741e94ecfee1ba51958d092d396c642f0f2dcd1/diffusion_models/diffusion.py#L371-L373

to

# multinomial_loss = utils.multinomial_diffusion_noise_estimation(model, batch_x_discrete, batch_x_continuous, diffusion, k, feature_indices) continuous_loss = utils.noise_estimation_loss(model, batch_x_continuous, batch_x_discrete, feature_indices, k, alphas_bar_sqrt, one_minus_alphas_bar_sqrt, num_steps) loss = continuous_loss

So I think the options are:

sirutBuasai commented 1 year ago

Replies noted. Changes will be pushed in the upcoming commit.