hoffmannjordan / Encoding-Decoding-3D-Crystals

"Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures", Jordan Hoffmann, Louis Maestrati, Yoshihide Sawada, Jian Tang, Jean Michel Sellier, Yoshua Bengio
https://arxiv.org/abs/1909.00949
MIT License

How to handle sparse species matrix? #2

Open fbjfbj opened 3 years ago

fbjfbj commented 3 years ago

Hi Dr. Hoffmann,

After reading your paper, I assume that the species matrix is going to be sparse (mostly 0, meaning no atom: a 30x30x30 grid typically holding ~10^2 atoms). I'm wondering how you handled this sparse segmentation problem, because I did not see anything like a weighted BCE loss in your code. Is the Attention UNet able to handle this imbalanced segmentation of the species matrix automatically? Thank you very much.
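For readers unfamiliar with the term, a weighted BCE loss of the kind mentioned above could look like the following minimal NumPy sketch. It is not taken from the repository: the function name `weighted_bce` and the default weight (the ratio of empty to occupied voxels) are assumptions made for illustration.

```python
import numpy as np

def weighted_bce(probs, targets, pos_weight=None, eps=1e-7):
    """Mean binary cross-entropy with the occupied (atom) class up-weighted."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    if pos_weight is None:
        # Assumed heuristic: ratio of empty voxels to occupied voxels.
        n_pos = targets.sum()
        pos_weight = (targets.size - n_pos) / max(n_pos, 1.0)
    loss = -(pos_weight * targets * np.log(probs)
             + (1.0 - targets) * np.log(1.0 - probs))
    return loss.mean()

# Toy 30x30x30 occupancy grid with 100 occupied voxels.
rng = np.random.default_rng(0)
targets = np.zeros((30, 30, 30))
idx = rng.choice(targets.size, size=100, replace=False)
targets.flat[idx] = 1.0

# A degenerate predictor that says "almost certainly empty" everywhere.
all_empty = np.full(targets.shape, 0.01)
plain = weighted_bce(all_empty, targets, pos_weight=1.0)  # unweighted BCE
weighted = weighted_bce(all_empty, targets)               # pos_weight = 269
```

With the plain loss, the all-empty prediction scores well because empty voxels dominate the mean; with the weighted loss, the 100 missed atoms dominate instead.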

hoffmannjordan commented 3 years ago

What is learned by the VAE is a weighted density matrix. This is used by the 3D U-Net, which does have a BCE loss here. You are certainly right that a more clever loss function may help the segmentation. :)

The sparse segmentation is set up so that a few voxels around each nucleus are labeled as that nucleus, which helps reduce the sparsity issue but certainly does not fully mitigate it.
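A minimal sketch of that labeling idea, assuming fractional coordinates, a cubic neighborhood of `radius` voxels, and periodic wrapping at the cell boundary. The function name `label_species` and all of its parameters are hypothetical, and the repository may choose the neighborhood differently.

```python
import numpy as np

def label_species(frac_coords, species, grid=30, radius=1):
    """Return an integer grid: 0 means empty, k > 0 is a species id."""
    labels = np.zeros((grid, grid, grid), dtype=np.int64)
    offsets = np.arange(-radius, radius + 1)
    dx, dy, dz = np.meshgrid(offsets, offsets, offsets, indexing="ij")
    for pos, kind in zip(frac_coords, species):
        # Voxel that contains the nucleus.
        center = np.floor(np.asarray(pos) * grid).astype(int)
        # Wrap indices so the block respects periodic boundaries (assumed).
        ix = (center[0] + dx) % grid
        iy = (center[1] + dy) % grid
        iz = (center[2] + dz) % grid
        labels[ix, iy, iz] = kind
    return labels

# Two atoms of different species in a 30x30x30 cell.
coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
labels = label_species(coords, species=[1, 2])
occupied_fraction = np.count_nonzero(labels) / labels.size
```

With `radius=1` each atom marks 27 voxels instead of 1, so the positive class grows by a large factor but still covers only a small fraction of the grid.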

fbjfbj commented 3 years ago

@hoffmannjordan Thank you very much for your reply.

Yes, the weighted density matrix can definitely help. But looking at Generate.py, I guess it can be computationally slow to generate those density matrices (30 x 30 x 30 x natoms for each matrix). Is this the case in your actual implementation?
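To make the cost concrete, here is a rough sketch of a vectorized density computation. It assumes Gaussian smearing in fractional coordinates with a minimum-image wrap; this is an illustration, not a description of what Generate.py does, and `density_grid`, `weights`, and `sigma` are hypothetical names.

```python
import numpy as np

def density_grid(frac_coords, weights, grid=30, sigma=0.05):
    """Sum of weighted Gaussians centered on each atom, sampled at voxel centers."""
    axis = (np.arange(grid) + 0.5) / grid
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    voxels = np.stack([gx, gy, gz], axis=-1)        # (grid, grid, grid, 3)
    atoms = np.asarray(frac_coords, dtype=float)    # (natoms, 3)
    # Broadcast to (grid, grid, grid, natoms, 3): this intermediate
    # is what dominates memory and time.
    diff = voxels[..., None, :] - atoms
    diff -= np.round(diff)                          # minimum-image wrap (assumed)
    dist2 = (diff ** 2).sum(axis=-1)                # (grid, grid, grid, natoms)
    return (np.asarray(weights) * np.exp(-dist2 / (2 * sigma ** 2))).sum(axis=-1)

density = density_grid([(0.25, 0.25, 0.25), (0.75, 0.75, 0.75)],
                       weights=[1.0, 2.0])
peak = np.unravel_index(density.argmax(), density.shape)
```

For 100 atoms the broadcast intermediate holds 30 x 30 x 30 x 100 x 3 float64 values, roughly 65 MB, so one structure is manageable but a large dataset adds up.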

In addition, have you considered using the species matrix (which I assume is faster to generate than the density matrix) as the input and output of the CrystalVAE, with a classification loss plus a KL loss as the loss function, so that crystal structures can be generated directly without a second Unet? Would this cause any problems?
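The loss proposed here could be sketched as below: a per-voxel softmax cross-entropy over species classes plus the usual KL term for a diagonal Gaussian latent. This is a hypothetical NumPy illustration, not code from the repository; `species_vae_loss`, `beta`, and the array shapes are assumptions.

```python
import numpy as np

def species_vae_loss(logits, labels, mu, logvar, beta=1.0):
    """Cross-entropy over species classes plus KL(q(z|x) || N(0, I)).

    logits: (n_classes, D, H, W) decoder outputs; class 0 means "no atom".
    labels: (D, H, W) integer species ids.
    mu, logvar: (latent_dim,) parameters of the approximate posterior.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # Log-probability of the true class at every voxel.
    picked = np.take_along_axis(log_probs, labels[None], axis=0)
    recon = -picked.mean()
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))
    return recon + beta * kl, recon, kl

# Uninformative decoder (all logits zero) and a posterior equal to the prior.
logits = np.zeros((4, 30, 30, 30))
labels = np.zeros((30, 30, 30), dtype=np.int64)
total, recon, kl = species_vae_loss(logits, labels, np.zeros(16), np.zeros(16))
```

Because the mean in `recon` runs over all voxels, the empty class dominates it, so the imbalance raised at the top of this thread would still need handling, for example with per-class weights.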

Sorry about all the questions and thank you very much.