microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License
2k stars, 324 forks

Own dataset problem while dataset.map() #169

Closed Bai1026 closed 11 months ago

Bai1026 commented 11 months ago

I am trying to train a graph classifier on my own dataset: https://huggingface.co/datasets/VincentPai/for-graphormer-v2
I want to feed the data in triplet by triplet (source - relation - destination), so I have 4.9M rows with 3 nodes each. My hope was that Graphormer could still learn the whole graph this way. The edge_index looks like [[3324, 5], [5, 6699]], which means an edge from the source with label 3324 to the destination with label 6699 through relation 5. Can Graphormer get information about the whole graph of my dataset from this?
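One way to read the triplet idea above is to merge all rows into a single graph, with the relation carried as an edge attribute instead of a middle node. The sketch below is only an illustration under assumptions: `triplets_to_graph` is a hypothetical helper, the second triplet is made-up sample data, and the output keys (`edge_index`, `edge_attr`, `node_feat`, `num_nodes`) follow the field names used in the Hugging Face Graphormer examples rather than anything confirmed in this thread. Node labels are re-indexed to 0..N-1 on the assumption that edge indices have to stay below `num_nodes`.

```python
from typing import Dict, List, Tuple


def triplets_to_graph(triplets: List[Tuple[int, int, int]]) -> Dict[str, object]:
    """Merge (source, relation, destination) rows into one graph dict.

    Hypothetical helper; the output keys are assumed, not confirmed.
    """
    node_ids: Dict[int, int] = {}  # original label -> local index

    def local(label: int) -> int:
        # Assign consecutive indices in order of first appearance.
        if label not in node_ids:
            node_ids[label] = len(node_ids)
        return node_ids[label]

    sources: List[int] = []
    targets: List[int] = []
    edge_attr: List[List[int]] = []
    for src, rel, dst in triplets:
        sources.append(local(src))
        targets.append(local(dst))
        edge_attr.append([rel])  # relation id as a one-dimensional edge feature

    # Original labels become node features, ordered by local index.
    ordered = sorted(node_ids.items(), key=lambda kv: kv[1])
    node_feat = [[label] for label, _ in ordered]
    return {
        "edge_index": [sources, targets],  # two parallel lists
        "edge_attr": edge_attr,
        "node_feat": node_feat,
        "num_nodes": len(node_ids),
    }


# The first triplet is taken from the example above; the second is invented.
graph = triplets_to_graph([(3324, 5, 6699), (6699, 7, 12)])
print(graph["edge_index"])  # [[0, 1], [1, 2]]
print(graph["num_nodes"])   # 3
```

A graph-classification label (the `y` field in those examples) would still need to be attached, and a single graph with millions of edges may be too large for Graphormer's dense pairwise preprocessing, so splitting into subgraphs may be needed.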

However, it turns out that I can't feed in data in the format of the dataset I uploaded when I use this function:

from datasets import load_dataset
from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator
dataset = load_dataset("VincentPai/for-graphormer-v2")
dataset_processed = dataset.map(preprocess_item, batched=False)
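Because `dataset.map(preprocess_item, batched=False)` processes one row at a time, each row presumably has to be a complete graph on its own. A quick check like the following sketch can surface rows that would not fit before calling `map`. `check_graph_row` is a hypothetical helper, the expected keys are assumed from the Hugging Face Graphormer examples, and the sample row reuses the `edge_index` from this post with placeholder features.

```python
from typing import Any, Dict, List


def check_graph_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one dataset row (hypothetical helper)."""
    problems: List[str] = []
    for key in ("edge_index", "edge_attr", "node_feat", "num_nodes"):
        if key not in row:
            problems.append(f"missing field: {key}")
    if problems:
        return problems

    edge_index = row["edge_index"]
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        problems.append("edge_index should be two parallel lists: [sources, targets]")
        return problems

    num_nodes = row["num_nodes"]
    # Assumption: node indices are local to the row and start at 0.
    if any(i < 0 or i >= num_nodes for side in edge_index for i in side):
        problems.append("edge_index contains a node index outside 0..num_nodes-1")
    if len(row["edge_attr"]) != len(edge_index[0]):
        problems.append("edge_attr needs one entry per edge")
    if len(row["node_feat"]) != num_nodes:
        problems.append("node_feat needs one entry per node")
    return problems


# Placeholder row: edge_index from the post, dummy features, 3 nodes.
row = {
    "edge_index": [[3324, 5], [5, 6699]],
    "edge_attr": [[0], [0]],
    "node_feat": [[0], [0], [0]],
    "num_nodes": 3,
}
print(check_graph_row(row))
```

If global labels such as 3324 are used directly as indices in a 3-node row, a check like this flags them, which may be one reason the per-row preprocessing fails.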

Is there any way to make this idea work? I am trying to do MITRE ATT&CK technique prediction, so I think I really need Graphormer for this.