Open fedeotto opened 6 months ago
I think I get your concerns. For this, we followed the OpenCatalyst project, where for simple models the data is not calculated on the fly. If you check the model files for e.g. SchNet and CGCNN, we didn't make them check whether the data needs to be calculated on the fly. We do have default settings in the `AtomsToPeriodicGraphs` function (radius 6, max neigh 50).
Thanks for the answer. To summarize, what I have noticed is that `data.neighbors` is not created during data processing in `AtomsToPeriodicGraphs`. This will always lead to `otf_graph = True` in the `generate_graph` method:

    def generate_graph(
        self,
        data,
        cutoff=None,
        max_neighbors=None,
        use_pbc=None,
        otf_graph=None,
    ):
        cutoff = cutoff or self.cutoff
        max_neighbors = max_neighbors or self.max_neighbors
        use_pbc = use_pbc or self.use_pbc
        otf_graph = otf_graph or self.otf_graph

        if not otf_graph:
            try:
                edge_index = data.edge_index

                if use_pbc:
                    cell_offsets = data.cell_offsets
                    neighbors = data.neighbors
                    empty_image = neighbors == 0
            except AttributeError:
                logging.warning(
                    "Turning otf_graph=True as required attributes not present in data object"
                )
                otf_graph = True
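The fallback in the snippet above can be reproduced without the library. The following is a minimal sketch, assuming a data object that stores `edge_index` and `cell_offsets` but no `neighbors`. `needs_otf_graph` is a hypothetical helper that mirrors only the `try`/`except` check, not the actual method, and `SimpleNamespace` stands in for a PyG `Data` object.

```python
import logging
from types import SimpleNamespace


def needs_otf_graph(data, use_pbc=True, otf_graph=False):
    """Hypothetical helper mirroring only the attribute check quoted above."""
    if not otf_graph:
        try:
            _ = data.edge_index
            if use_pbc:
                _ = data.cell_offsets
                _ = data.neighbors  # raises AttributeError if never stored
        except AttributeError:
            logging.warning(
                "Turning otf_graph=True as required attributes not present in data object"
            )
            otf_graph = True
    return otf_graph


# Stand-in for a preprocessed sample: the graph is stored, but `neighbors` is not.
without_neighbors = SimpleNamespace(
    edge_index=[[0, 1], [1, 0]],
    cell_offsets=[[0, 0, 0], [0, 0, 0]],
)
# The same sample with a per-graph edge count added.
with_neighbors = SimpleNamespace(
    edge_index=[[0, 1], [1, 0]],
    cell_offsets=[[0, 0, 0], [0, 0, 0]],
    neighbors=[2],
)

fallback_without = needs_otf_graph(without_neighbors)
fallback_with = needs_otf_graph(with_neighbors)
```

In this sketch only the sample without `neighbors` triggers the fallback, which matches the behaviour described in the summary: the stored graph is discarded even though `edge_index` and `cell_offsets` are present.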
Thanks for the summary! Now I understand your question. I double-checked and you are right: `neighbors` is not created in the `AtomsToPeriodicGraphs` function. Interestingly, we followed how OpenCatalystProject implemented this, and it seems they don't create it either. Have you played with the OCP code? I think one simple solution is to add the `neighbors` field to the data when it is created with `AtomsToPeriodicGraphs`. Have you fixed it?
@yuanqidu thanks for your reply. As you mentioned, I think it should be enough to add a `data.neighbors = torch.tensor([data.edge_index.shape[1]], dtype=torch.long)` attribute when creating the PyG `Data` object in the `convert()` method of `AtomsToPeriodicGraphs`.
This is indeed the approach I found in the CDVAE repository, where `data.neighbors` is named `data.num_bonds`: https://github.com/txie-93/cdvae/blob/f857f598d6f6cca5dc1ea0582d228f12dcc2c2ea/cdvae/pl_data/dataset.py#L66
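To illustrate why a per-graph edge count is useful, here is a plain-Python sketch. It assumes that `neighbors` is used to expand per-graph quantities (such as the lattice cell) into per-edge quantities once several graphs are batched together. `count_edges` and `expand_per_edge` are hypothetical helpers, lists stand in for tensors, and the expansion stands in for what `torch.repeat_interleave` would do on real tensors.

```python
def count_edges(edge_index):
    """Number of edges in a 2 x E edge list (the value suggested for `neighbors`)."""
    return len(edge_index[0])


def expand_per_edge(per_graph_values, neighbors):
    """Repeat each graph's value once per edge of that graph."""
    expanded = []
    for value, n_edges in zip(per_graph_values, neighbors):
        expanded.extend([value] * n_edges)
    return expanded


# Two toy graphs with 3 and 2 edges respectively.
edge_index_a = [[0, 1, 2], [1, 2, 0]]
edge_index_b = [[0, 1], [1, 0]]

# After batching, the edge lists are concatenated; the per-graph counts
# record how many of those edges belong to each graph.
neighbors = [count_edges(edge_index_a), count_edges(edge_index_b)]

# Per-graph labels stand in for per-graph lattice cells.
cell_per_edge = expand_per_edge(["cell_a", "cell_b"], neighbors)
```

Storing the count as a one-element tensor per sample, as in the line suggested above, means that concatenating samples into a batch yields exactly this kind of per-graph list.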
Hi, I'm not sure if I'm doing anything wrong, but I have noticed that `otf_graph` always gets set to `True` when I try to train a simple `cgcnn` model. I believe this is caused by `data.neighbors` not being created at the preprocessing stage (using `AtomsToPeriodicGraphs`). Does this mean that all initial attributes in the latter (like `max_neigh`) are systematically ignored? Also, I'm trying to understand the difference between the `get_pbc_distances` in this repository (which uses `data.neighbors`, missing at the data preprocessing stage) and the original one in the CDVAE implementation, which uses the `num_bonds` attribute instead (see below; I adapted it slightly to fit the context):