huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

TokenGT #21079

clefourrier opened this issue 1 year ago

clefourrier commented 1 year ago

Model description

Adding the TokenGT graph transformer model with @Raman-Kumar (see Graphormer issue)

@Raman-Kumar I'll create a PR at the end of the week with what I had already ported of TokenGT, to give you a starting point! You'll need to read this first, to get an idea of the steps we follow when integrating a model. The first step will then be checking the code against a checkpoint, so you need to look for one and download it, in order to compare results with the original implementation. Does that work for you?

Open source status

Provide useful links for the implementation

No response

Raman-Kumar commented 1 year ago

@clefourrier For sure, that will work.

Raman-Kumar commented 1 year ago

Thanks for assigning me, @clefourrier 😊 I am still examining and experimenting more...

clefourrier commented 1 year ago

Ping me if you need help! :smile:

Raman-Kumar commented 1 year ago

😢 I gave up on figuring it out by myself at my level - I was not familiar with the transformer architecture, collators, etc., or other models like BERT. Now I have studied them, as well as the TokenGT model's theoretical aspects.

I have downloaded the checkpoint folder from the Drive link in the original repo.

Now I have to run both the PR with the checkpoint and the original repo.

Can you share the script you used for Graphormer? @clefourrier

clefourrier commented 1 year ago

OK, so you will need to do something similar to this:

import argparse
import os, sys
from pathlib import Path

import torch
from torch import nn
from torch.hub import load_state_dict_from_url

# Here, you need to import the transformers version of the TokenGT code (from the PR) 
from transformers import (
    AutoModel,
    GraphormerConfig,
    GraphormerForGraphClassification,
    GraphormerModel,
    # GraphormerCollator
)
from transformers.utils import logging
from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator 

# Here, you need to import the original TokenGT code instead of Graphormer
sys.path.append("path to Graphormer/")
import graphormer
import graphormer.tasks.graph_prediction
import graphormer.models.graphormer
from graphormer.evaluate.evaluate import convert_namespace_to_omegaconf, tasks, options
from fairseq import utils
from fairseq.logging import progress_bar
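
# For TokenGT, the equivalent imports would likely look something like this
# (left commented out here as an illustration; adjust to the actual layout of the TokenGT repo):
# sys.path.append("path to tokengt/")
# import tokengt
# import tokengt.tasks.graph_prediction
# import tokengt.models.tokengt
# from tokengt.evaluate.evaluate import convert_namespace_to_omegaconf, tasks, options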

# You will likely have to change some of these depending on the error messages you get when loading the checkpoint to transformers format
rename_keys = [
    ("encoder.lm_output_learned_bias", "classifier.lm_output_learned_bias"),
    ("encoder.embed_out.weight", "classifier.classifier.weight"),
    #("encoder.embed_out.weight", "classifier.embed_out.weight"),
    #("encoder.embed_out.bias", "classifier.embed_out.bias"),
]

def remove_ignore_keys_(state_dict):
    ignore_keys = [
        "encoder.version",
        "decoder.version",
        "encoder.masked_lm_pooler.bias",  # to check
        "encoder.masked_lm_pooler.weight",  # to check
        "_float_tensor",
    ]
    for k in ignore_keys:
        state_dict.pop(k, None)

def rename_key(dct, old, new):
    val = dct.pop(old)
    dct[new] = val

def make_linear_from_emb(emb):
    vocab_size, emb_size = emb.weight.shape
    lin_layer = nn.Linear(vocab_size, emb_size, bias=False)
    lin_layer.weight.data = emb.weight.data
    return lin_layer

# In this section, you need to replace calls to Graphormer by calls to TokenGT models. 
# Graphormer model gets replaced by the original TokenGT model
# Transformers model gets replaced by the format in Transformers 
@torch.no_grad()
def convert_graphormer_checkpoint(
    args, checkpoint_name, pytorch_dump_folder_path
):
    pytorch_dump_folder_path = f"{pytorch_dump_folder_path}/{checkpoint_name}" 
    cfg = convert_namespace_to_omegaconf(args)
    task = tasks.setup_task(cfg.task)

    # Graphormer model
    graphormer_model = task.build_model(cfg.model)
    graphormer_state = torch.load(checkpoint_name)["model"]
    graphormer_model.load_state_dict(graphormer_state, strict=True, model_cfg=cfg.model)
    graphormer_model.upgrade_state_dict(graphormer_model.state_dict())

    # Transformers model
    config = GraphormerConfig(
        num_labels=1,
        share_input_output_embed=False,
        num_layers=12,
        embedding_dim=768,
        ffn_embedding_dim=768,
        num_attention_heads=32,
        dropout=0.0,
        attention_dropout=0.1,
        activation_dropout=0.1,
        encoder_normalize_before=True,
        pre_layernorm=False,
        apply_graphormer_init=True,
        activation_fn="gelu",
        no_token_positional_embeddings=False,
    )
    transformers_model = GraphormerForGraphClassification(config)

    # We copy the state dictionary from the original model to our format 
    state_dict = graphormer_model.state_dict()
    remove_ignore_keys_(state_dict)
    for src, dest in rename_keys:
        rename_key(state_dict, src, dest)
    transformers_model.load_state_dict(state_dict)

    # Check results
    graphormer_model.eval()
    transformers_model.eval()

    split = args.split
    task.load_dataset(split)
    batch_iterator = task.get_batch_iterator(
        dataset=task.dataset(split),
        max_tokens=cfg.dataset.max_tokens_valid,
        max_sentences=2, #cfg.dataset.batch_size_valid,
        max_positions=utils.resolve_max_positions(
            task.max_positions(),
            graphormer_model.max_positions(),
        ),
        ignore_invalid_inputs=cfg.dataset.skip_invalid_size_inputs_valid_test,
        required_batch_size_multiple=cfg.dataset.required_batch_size_multiple,
        seed=cfg.common.seed,
        num_workers=cfg.dataset.num_workers,
        epoch=0,
        data_buffer_size=cfg.dataset.data_buffer_size,
        disable_iterator_cache=False,
    )
    itr = batch_iterator.next_epoch_itr(
        shuffle=False, set_dataset_epoch=False
    )
    progress = progress_bar.progress_bar(
        itr,
        log_format=cfg.common.log_format,
        log_interval=cfg.common.log_interval,
        default_log_format=("tqdm" if not cfg.common.no_progress_bar else "simple")
    )

    # Inference
    collator = GraphormerDataCollator() #on_the_fly_processing=True)
    ys_graphormer = []
    ys_transformers = []
    with torch.no_grad():
        for i, sample in enumerate(progress):
            y_graphormer = graphormer_model(**sample["net_input"])[:, 0, :].reshape(-1)
            ys_graphormer.extend(y_graphormer.detach())
            #print(sample["net_input"]["batched_data"])
            transformer_sample = sample["net_input"]["batched_data"] # data is already collated - collator(sample["net_input"]["batched_data"])
            transformer_sample.pop("idx")
            transformer_sample["labels"] = transformer_sample.pop("y")
            transformer_sample["node_input"] = transformer_sample.pop("x")
            torch.set_printoptions(profile="full")
            y_transformer = transformers_model(**transformer_sample)["logits"] #[:, 0, :].reshape(-1)
            ys_transformers.extend(y_transformer.detach())

    ys_graphormer = torch.stack(ys_graphormer)
    ys_transformers = torch.stack(ys_transformers).squeeze(-1)

    assert ys_graphormer.shape == ys_transformers.shape
    assert (ys_graphormer == ys_transformers).all().item()

    print("All good :)")

    Path(pytorch_dump_folder_path).mkdir(exist_ok=True)
    transformers_model.save_pretrained(pytorch_dump_folder_path)
    transformers_model.push_to_hub(checkpoint_name, use_auth_token="replace by your token")

if __name__ == "__main__":
    parser = options.get_training_parser()
    # Required parameters
    parser.add_argument(
        "--checkpoint_name",
        type=str,
        help="name of a model to load",  # path to a model.pt on local filesystem."
    )
    parser.add_argument(
        "--pytorch_dump_folder_path",
        default=None,
        type=str,
        help="Path to the output PyTorch model.",
    )

    parser.add_argument(
        "--split",
        type=str,
    )
    parser.add_argument(
        "--metric",
        type=str,
    )

    args = options.parse_args_and_arch(parser, modify_parser=None)
    print(args)

    #args = parser.parse_args()
    convert_graphormer_checkpoint(
        args,
        args.checkpoint_name,
        args.pytorch_dump_folder_path,
    )
Raman-Kumar commented 1 year ago

I am new to deep learning and I am using a MacBook Air M1. While running the command pip install -e ".[dev]" for the transformers repo, it shows an error for TensorFlow, so I am using pip install -e ".[dev-torch]" instead, which works fine.

What argument list do you supply when running the above script for Graphormer? @clefourrier

clefourrier commented 1 year ago

Hi @Raman-Kumar! I don't think the tensorflow error is very important atm, don't worry :smile:

Here is my argument list: --checkpoint_name Name_of_the_checkpoint_you_downloaded_for_tokenGT --pytorch_dump_folder_path tmp --user-dir "Directory where you cloned the code from the TokenGT repository" --num-workers 16 --ddp-backend=legacy_ddp --dataset-name MUTAG_0 --user-data-dir "custom_datasets" --task graph_prediction --criterion l1_loss --arch graphormer_base --num-classes 1 --batch-size 64 --pretrained-model-name pcqm4mv1_graphormer_base --load-pretrained-model-output-layer --split valid --seed 1

From ddp-backend on, you will need to adapt the parameters to launch one of the available datasets in TokenGT, or you could add a custom_datasets loader in tokengt/data/predict_custom.

For the latter, I think there is a sample script, but if not you can take inspiration from this, which loads MUTAG from the Hub into TokenGT:

from datasets import load_dataset

from tokengt.data import register_dataset
from tokengt.data.pyg_datasets.pyg_dataset import TokenGTPYGDataset

import torch
from torch_geometric.data import Data, Dataset, InMemoryDataset

import numpy as np

class TmpDataset(InMemoryDataset):
    def __init__(self, root, data_list):
        self.data_list = data_list
        super().__init__(root, None, None, None)

    @property
    def raw_file_names(self):
        return []

    @property
    def processed_file_names(self):
        return ["data.pt"]

    def len(self):
        return len(self.data_list)

    def get(self, idx):
        data = self.data_list[idx]
        return data

def create_customized_dataset(dataset_name, ix_xval):
    graphs_dataset = load_dataset(f"graphs-datasets/{dataset_name}")
    graphs_dataset = graphs_dataset.shuffle(0)

    key = "full" if "full" in graphs_dataset.keys() else "train"

    graphs_list = [
        Data(
            **{
                "edge_index": torch.tensor(graph["edge_index"], dtype=torch.long),
                "y": torch.tensor(graph["y"], dtype=torch.long),
                "num_nodes": graph["num_nodes"],
                #"x": torch.ones(graph["num_nodes"], 1, dtype=torch.long), # same embedding for all
                #"edge_attr": torch.ones(len(graph["edge_index"][0]), 1, dtype=torch.long), # same embedding for all
                "x": torch.tensor(graph["node_feat"], dtype=torch.long) if "node_feat" in graph.keys() else torch.ones(graph["num_nodes"], 1, dtype=torch.long), # same embedding for all
                "edge_attr": torch.tensor(graph["edge_attr"], dtype=torch.long) if "edge_attr" in graph.keys() else torch.ones(len(graph["edge_index"][0]), 1, dtype=torch.long), # same embedding for all
            }
        )
        for graph in graphs_dataset[key]
    ]

    len_dataset = len(graphs_dataset[key])
    len_xval_batch = int(len_dataset / 10)
    cur_val_range_int = list(range(ix_xval * len_xval_batch, (ix_xval + 1) * len_xval_batch))
    cur_val_range = np.array(cur_val_range_int, dtype=np.int64)
    cur_train_range = np.array(
        [ix for ix in range(len_dataset) if ix not in cur_val_range_int], dtype=np.int64
    )

    dataset = TmpDataset("", graphs_list)

    return {
        "dataset": TokenGTPYGDataset(
            dataset=dataset,
            seed=0,
            train_idx=torch.tensor([0]), #cur_train_range),
            valid_idx=torch.tensor(cur_val_range),
            test_idx=torch.tensor(cur_val_range),
        ), 
        "source": "pyg",
        "train_idx":torch.tensor(cur_train_range),
        "valid_idx":torch.tensor(cur_val_range),
        "test_idx":torch.tensor(cur_val_range),
        }

@register_dataset("MUTAG_0")
def m0():
    return create_customized_dataset("MUTAG", 0)

Tell me if anything is unclear! :hugs:

Raman-Kumar commented 1 year ago

Right now, I am running this script:

script.py

import argparse
import os, sys
from pathlib import Path

import torch
from torch import nn
from torch.hub import load_state_dict_from_url
from transformers.utils import logging

import tokengt
import tokengt.tasks.graph_prediction 
import tokengt.models.tokengt
from tokengt.evaluate.evaluate import convert_namespace_to_omegaconf, tasks, options

from fairseq import utils
from fairseq.logging import progress_bar

@torch.no_grad()
def convert_tokengt_checkpoint(
    args, checkpoint_name, pytorch_dump_folder_path
    ):
    pytorch_dump_folder_path = f"{pytorch_dump_folder_path}/{checkpoint_name}" 
    cfg = convert_namespace_to_omegaconf(args)
    # task = tasks.setup_task(cfg.task)

if __name__ == "__main__":
    parser = options.get_training_parser()
    # Required parameters
    parser.add_argument(
        "--checkpoint_name",
        type=str,
        help="name of a model to load",  # path to a model.pt on local filesystem."
    )
    parser.add_argument(
        "--pytorch_dump_folder_path",
        default=None,
        type=str,
        help="Path to the output PyTorch model.",
    )

    parser.add_argument(
        "--split",
        type=str,
    )
    parser.add_argument(
        "--metric",
        type=str,
    )

    args = options.parse_args_and_arch(parser, modify_parser=None)
    print(args.pytorch_dump_folder_path)

    args = parser.parse_args()
    convert_tokengt_checkpoint(
        args,
        args.checkpoint_name,
        args.pytorch_dump_folder_path,
    )

with the command .....script.py --checkpoint_name pcqv2-tokengt-orf64-trained --pytorch_dump_folder_path tmp --user-dir "../tokengt" --num-workers 16 --ddp-backend=legacy_ddp --dataset-name PCQM4Mv2 --user-data-dir "tokengt/data/ogb_datasets" --task graph_prediction --criterion l1_loss --arch tokengt_base --num-classes 1 --batch-size 64 --pretrained-model-name mytokengt --load-pretrained-model-output-layer --split valid --seed 1

In cfg = convert_namespace_to_omegaconf(args),

I am getting this error:

2023-02-09 13:05:21 | ERROR | fairseq.dataclass.utils | Error when composing. Overrides: ['common.no_progress_bar=False', 'common.log_interval=100', 'common.log_format=null', 'common.log_file=null', 'common.aim_repo=null', 'common.aim_run_hash=null', 'common.tensorboard_logdir=null', 'common.wandb_project=null', 'common.azureml_logging=False', 'common.seed=1', 'common.cpu=False', 'common.tpu=False', 'common.bf16=False', 'common.memory_efficient_bf16=False', 'common.fp16=False', 'common.memory_efficient_fp16=False', 'common.fp16_no_flatten_grads=False', 'common.fp16_init_scale=128', 'common.fp16_scale_window=null', 'common.fp16_scale_tolerance=0.0', 'common.on_cpu_convert_precision=False', 'common.min_loss_scale=0.0001', 'common.threshold_loss_scale=null', 'common.amp=False', 'common.amp_batch_retries=2', 'common.amp_init_scale=128', 'common.amp_scale_window=null', "common.user_dir='../tokengt'", 'common.empty_cache_freq=0', 'common.all_gather_list_size=16384', 'common.model_parallel_size=1', 'common.quantization_config_path=null', 'common.profile=False', 'common.reset_logging=False', 'common.suppress_crashes=False', 'common.use_plasma_view=False', "common.plasma_path='/tmp/plasma'", 'common_eval.path=null', 'common_eval.post_process=null', 'common_eval.quiet=False', "common_eval.model_overrides='{}'", 'common_eval.results_path=null', 'distributed_training.distributed_world_size=1', 'distributed_training.distributed_num_procs=1', 'distributed_training.distributed_rank=0', "distributed_training.distributed_backend='nccl'", 'distributed_training.distributed_init_method=null', 'distributed_training.distributed_port=-1', 'distributed_training.device_id=0', 'distributed_training.distributed_no_spawn=False', "distributed_training.ddp_backend='legacy_ddp'", "distributed_training.ddp_comm_hook='none'", 'distributed_training.bucket_cap_mb=25', 'distributed_training.fix_batches_to_gpus=False', 'distributed_training.find_unused_parameters=False', 'distributed_training.gradient_as_bucket_view=False', 'distributed_training.fast_stat_sync=False', 'distributed_training.heartbeat_timeout=-1', 'distributed_training.broadcast_buffers=False', 'distributed_training.slowmo_momentum=null', "distributed_training.slowmo_base_algorithm='localsgd'", 'distributed_training.localsgd_frequency=3', 'distributed_training.nprocs_per_node=1', 'distributed_training.pipeline_model_parallel=False', 'distributed_training.pipeline_balance=null', 'distributed_training.pipeline_devices=null', 'distributed_training.pipeline_chunks=0', 'distributed_training.pipeline_encoder_balance=null', 'distributed_training.pipeline_encoder_devices=null', 'distributed_training.pipeline_decoder_balance=null', 'distributed_training.pipeline_decoder_devices=null', "distributed_training.pipeline_checkpoint='never'", "distributed_training.zero_sharding='none'", 'distributed_training.fp16=False', 'distributed_training.memory_efficient_fp16=False', 'distributed_training.tpu=False', 'distributed_training.no_reshard_after_forward=False', 'distributed_training.fp32_reduce_scatter=False', 'distributed_training.cpu_offload=False', 'distributed_training.use_sharded_state=False', 'distributed_training.not_fsdp_flatten_parameters=False', 'dataset.num_workers=16', 'dataset.skip_invalid_size_inputs_valid_test=False', 'dataset.max_tokens=null', 'dataset.batch_size=64', 'dataset.required_batch_size_multiple=8', 'dataset.required_seq_len_multiple=1', 'dataset.dataset_impl=null', 'dataset.data_buffer_size=10', "dataset.train_subset='train'", 
"dataset.valid_subset='valid'", 'dataset.combine_valid_subsets=null', 'dataset.ignore_unused_valid_subsets=False', 'dataset.validate_interval=1', 'dataset.validate_interval_updates=0', 'dataset.validate_after_updates=0', 'dataset.fixed_validation_seed=null', 'dataset.disable_validation=False', 'dataset.max_tokens_valid=null', 'dataset.batch_size_valid=null', 'dataset.max_valid_steps=null', 'dataset.curriculum=0', "dataset.gen_subset='test'", 'dataset.num_shards=1', 'dataset.shard_id=0', 'dataset.grouped_shuffling=False', 'dataset.update_epoch_batch_itr=null', 'dataset.update_ordered_indices_seed=False', 'optimization.max_epoch=0', 'optimization.max_update=0', 'optimization.stop_time_hours=0.0', 'optimization.clip_norm=0.0', 'optimization.sentence_avg=False', 'optimization.update_freq=[1]', 'optimization.lr=[0.25]', 'optimization.stop_min_lr=-1.0', 'optimization.use_bmuf=False', 'optimization.skip_remainder_batch=False', "checkpoint.save_dir='checkpoints'", "checkpoint.restore_file='checkpoint_last.pt'", 'checkpoint.continue_once=null', 'checkpoint.finetune_from_model=null', 'checkpoint.reset_dataloader=False', 'checkpoint.reset_lr_scheduler=False', 'checkpoint.reset_meters=False', 'checkpoint.reset_optimizer=False', "checkpoint.optimizer_overrides='{}'", 'checkpoint.save_interval=1', 'checkpoint.save_interval_updates=0', 'checkpoint.keep_interval_updates=-1', 'checkpoint.keep_interval_updates_pattern=-1', 'checkpoint.keep_last_epochs=-1', 'checkpoint.keep_best_checkpoints=-1', 'checkpoint.no_save=False', 'checkpoint.no_epoch_checkpoints=False', 'checkpoint.no_last_checkpoints=False', 'checkpoint.no_save_optimizer_state=False', "checkpoint.best_checkpoint_metric='loss'", 'checkpoint.maximize_best_checkpoint_metric=False', 'checkpoint.patience=-1', "checkpoint.checkpoint_suffix=''", 'checkpoint.checkpoint_shard_count=1', 'checkpoint.load_checkpoint_on_all_dp_ranks=False', 'checkpoint.write_checkpoints_asynchronously=False', 'checkpoint.model_parallel_size=1', 'bmuf.block_lr=1.0', 'bmuf.block_momentum=0.875', 'bmuf.global_sync_iter=50', 'bmuf.warmup_iterations=500', 'bmuf.use_nbm=False', 'bmuf.average_sync=False', 'bmuf.distributed_world_size=1', 'generation.beam=5', 'generation.nbest=1', 'generation.max_len_a=0.0', 'generation.max_len_b=200', 'generation.min_len=1', 'generation.match_source_len=False', 'generation.unnormalized=False', 'generation.no_early_stop=False', 'generation.no_beamable_mm=False', 'generation.lenpen=1.0', 'generation.unkpen=0.0', 'generation.replace_unk=null', 'generation.sacrebleu=False', 'generation.score_reference=False', 'generation.prefix_size=0', 'generation.no_repeat_ngram_size=0', 'generation.sampling=False', 'generation.sampling_topk=-1', 'generation.sampling_topp=-1.0', 'generation.constraints=null', 'generation.temperature=1.0', 'generation.diverse_beam_groups=-1', 'generation.diverse_beam_strength=0.5', 'generation.diversity_rate=-1.0', 'generation.print_alignment=null', 'generation.print_step=False', 'generation.lm_path=null', 'generation.lm_weight=0.0', 'generation.iter_decode_eos_penalty=0.0', 'generation.iter_decode_max_iter=10', 'generation.iter_decode_force_max_iter=False', 'generation.iter_decode_with_beam=1', 'generation.iter_decode_with_external_reranker=False', 'generation.retain_iter_history=False', 'generation.retain_dropout=False', 'generation.retain_dropout_modules=null', 'generation.decoding_format=null', 'generation.no_seed_provided=False', 'generation.eos_token=null', 'eval_lm.output_word_probs=False', 'eval_lm.output_word_stats=False', 
'eval_lm.context_window=0', 'eval_lm.softmax_batch=9223372036854775807', 'interactive.buffer_size=0', "interactive.input='-'", 'ema.store_ema=False', 'ema.ema_decay=0.9999', 'ema.ema_start_update=0', 'ema.ema_seed_model=null', 'ema.ema_update_freq=1', 'ema.ema_fp32=False', 'task=graph_prediction', 'task._name=graph_prediction', "task.dataset_name='PCQM4Mv2'", 'task.num_classes=1', 'task.max_nodes=128', "task.dataset_source='pyg'", 'task.num_atoms=4608', 'task.num_edges=1536', 'task.num_in_degree=512', 'task.num_out_degree=512', 'task.num_spatial=512', 'task.num_edge_dis=128', 'task.multi_hop_max_dist=5', 'task.spatial_pos_max=1024', "task.edge_type='multi_hop'", 'task.seed=1', "task.pretrained_model_name='mytokengt'", 'task.load_pretrained_model_output_layer=True', 'task.train_epoch_shuffle=True', "task.user_data_dir='tokengt/data/ogb_datasets'", 'criterion=l1_loss', 'criterion._name=l1_loss', 'lr_scheduler=fixed', 'lr_scheduler._name=fixed', 'lr_scheduler.force_anneal=null', 'lr_scheduler.lr_shrink=0.1', 'lr_scheduler.warmup_updates=0', 'lr_scheduler.lr=[0.25]', 'scoring=bleu', 'scoring._name=bleu', 'scoring.pad=1', 'scoring.eos=2', 'scoring.unk=3']
Traceback (most recent call last):
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 513, in _apply_overrides_to_config
    OmegaConf.update(cfg, key, value, merge=True)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/omegaconf.py", line 613, in update
    root.__setattr__(last_key, value)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 285, in __setattr__
    raise e
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 282, in __setattr__
    self.__set_impl(key, value)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 266, in __set_impl
    self._set_item_impl(key, value)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/basecontainer.py", line 398, in _set_item_impl
    self._validate_set(key, value)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 143, in _validate_set
    self._validate_set_merge_impl(key, value, is_assign=True)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/dictconfig.py", line 156, in _validate_set_merge_impl
    self._format_and_raise(
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/base.py", line 95, in _format_and_raise
    format_and_raise(
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/_utils.py", line 694, in format_and_raise
    _raise(ex, cause)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/omegaconf/_utils.py", line 610, in _raise
    raise ex  # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ValidationError: child 'dataset.update_epoch_batch_itr' is not Optional
        full_key: dataset.update_epoch_batch_itr
        reference_type=DatasetConfig
        object_type=DatasetConfig

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ramankumar/OpenSource/script.py", line 106, in <module>
    convert_graphormer_checkpoint(
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/ramankumar/OpenSource/script.py", line 74, in convert_graphormer_checkpoint
    cfg = convert_namespace_to_omegaconf(args)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/fairseq/dataclass/utils.py", line 399, in convert_namespace_to_omegaconf
    composed_cfg = compose("config", overrides=overrides, strict=False)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/hydra/experimental/compose.py", line 31, in compose
    cfg = gh.hydra.compose_config(
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 507, in compose_config
    cfg = self.config_loader.load_configuration(
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 151, in load_configuration
    return self._load_configuration(
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 277, in _load_configuration
    ConfigLoaderImpl._apply_overrides_to_config(config_overrides, cfg)
  File "/Users/ramankumar/OpenSource/transformers/.env/lib/python3.9/site-packages/hydra/_internal/config_loader_impl.py", line 520, in _apply_overrides_to_config
    raise ConfigCompositionException(
hydra.errors.ConfigCompositionException: Error merging override dataset.update_epoch_batch_itr=null

child 'dataset.update_epoch_batch_itr' is not Optional ?? @clefourrier

clefourrier commented 1 year ago

I think you read the error correctly - apparently for TokenGT + fairseq that field is not Optional.

You could try passing it as False (I think it's a boolean), or look for it in the loading scripts or config files to see how it is managed in the project.
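
For example, a minimal thing to try (just a guess at a fix - set the attribute on the parsed args so the override becomes a concrete boolean instead of null):

args.update_epoch_batch_itr = False
cfg = convert_namespace_to_omegaconf(args)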

Raman-Kumar commented 1 year ago

Could you explain once more how to supply the dataset as an argument?

I created a file predict_custom.py alongside (in the same folder as) the conversion script.py and pasted in all the code you gave:

from datasets import load_dataset

....
class TmpDataset(InMemoryDataset):
....

def create_customized_dataset(dataset_name, ix_xval):
....
@register_dataset("MUTAG_0")
def m0():
    return create_customized_dataset("MUTAG", 0)

--dataset-name --MUTAG_0 --user-data-dir "/tokengt/data/ogb_datasets" - how should I write this here? @clefourrier

clefourrier commented 1 year ago

The simplest would be to do what you did initially and use one of the native datasets for TokenGT with --dataset-name PCQM4Mv2. If you want to use custom datasets, your --user-data-dir must point to the folder containing your dataset script, if I remember correctly.
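
As a rough sketch of what that can look like (the names and paths here are purely illustrative):

custom_datasets/
    predict_custom.py    # defines create_customized_dataset and the @register_dataset("MUTAG_0") entry point

--dataset-name MUTAG_0 --user-data-dir "path/to/custom_datasets"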

Raman-Kumar commented 1 year ago

🙂 I got familiar with PyTorch Geometric and graph neural networks, and I read about the parameters and graph datasets in Graphormer/docs.

The training script for TokenGT, at tokengt/large-scale-regression/scripts, uses fairseq-train with an argument list.

Initially, I assumed that the argument list was only used with fairseq-train, but no - the same applies to the conversion script as well (I had not tried this. 😕 So sad!)

Now everything works fine. Yay 😊

clefourrier commented 1 year ago

Congratulations, that's very cool! :hugs:

Do you know what your next steps are?

Raman-Kumar commented 1 year ago

Next, I added some import-related code in the transformers folder, e.g. in src/transformers/__init__.py and other files (with the help of the Graphormer PR).

After that, I was successfully able to import the HF 🤗 TokenGT in my conversion script.py:

from transformers import (
    AutoModel,
    TokenGTConfig,
    TokenGTForGraphClassification,
)
    tokengt_model = task.build_model(cfg.model)
    tokengt_state = torch.load(checkpoint_name)["model"]
    tokengt_model.load_state_dict(tokengt_state, strict=True, model_cfg=cfg.model)
    tokengt_model.upgrade_state_dict(tokengt_model.state_dict())
    # up to this line everything works fine, no error

# Transformers model
    config = TokenGTConfig(
        tasks_weights=None, # added this  
        num_labels=1,
        share_input_output_embed=False,
        num_layers=12,
        embedding_dim=768,
        ffn_embedding_dim=768,
        num_attention_heads=32,
        dropout=0.0,
        attention_dropout=0.1,
        activation_dropout=0.1,
        encoder_normalize_before=True,
        pre_layernorm=False,
        apply_graphormer_init=True,
        activation_fn="gelu",
        no_token_positional_embeddings=False,
    )
    transformers_model = TokenGTForGraphClassification(config)
    state_dict = tokengt_model.state_dict()

    transformers_model.load_state_dict(state_dict)  # this line raises the following error:
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for TokenGTForGraphClassification:
        Missing key(s) in state_dict: "decoder.lm_output_learned_bias", "decoder.embed_out.weight". 
        Unexpected key(s) in state_dict: "encoder.lm_output_learned_bias", "encoder.embed_out.weight", "encoder.graph_encoder.final_layer_norm.weight", "encoder.graph_encoder.final_layer_norm.bias", "encoder.graph_encoder.graph_feature.orf_encoder.weight", "encoder.graph_encoder.graph_feature.order_encoder.weight". 
        size mismatch for encoder.graph_encoder.graph_feature.edge_encoder.weight: copying a param with shape torch.Size([1536, 768]) from checkpoint, the shape in current model is torch.Size([2048, 768]).

There are two checkpoints, lap16 and orf64. Both give the same error, except that one mentions "encoder.graph_encoder.graph_feature.lap_encoder.weight" and the other "encoder.graph_encoder.graph_feature.orf_encoder.weight".

The errors are: missing key(s), unexpected key(s), and a size mismatch.

I need help. @clefourrier

Edit: adding num_edges=1536 to the config removed the size mismatch error.

clefourrier commented 1 year ago

I think this should be managed with the remove_ignore_keys_ and rename_keys parts: you need to find what the "unexpected keys" in the original checkpoint map to in the new format, and rename them accordingly. In essence, you are going from one format (tokenGT format) to another format (transformers style) for your checkpoint, so you need to do this mapping.
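
A quick way to see what is left to map is to diff the key sets of both state dicts - a minimal sketch, reusing the model variables from your snippet above:

original_keys = set(tokengt_model.state_dict().keys())
new_keys = set(transformers_model.state_dict().keys())
print("only in the original checkpoint:", sorted(original_keys - new_keys))
print("only in the transformers model:", sorted(new_keys - original_keys))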

Congrats on debugging the other error! :clap:

Raman-Kumar commented 1 year ago

Initially, I had no idea how to map them, or to what. I didn't even know what they meant. So I spent some time studying transformers and looking at the code.

Then I suddenly thought: let's print the models. So I printed both the original model and the HF 🤗 model:

    print(transformers_model)
    print(tokengt_model)

and compared the differences. Accordingly, I added these arguments to the config

# config for lap16
config = TokenGTConfig(
        ...
        lap_node_id=True,
        lap_node_id_k=16,
        id2label = {"1": "className"},  # I added this dictionary; I explain below why I did this
        type_id=True,
        prenorm=True,
        ...
)

and renamed keys

rename_keys = [
    ("encoder.embed_out.weight", "decoder.embed_out.weight"),

    # I did not find lm_output_learned_bias in the printed models, so I checked the code and this rename made the most sense
    ("encoder.lm_output_learned_bias", "decoder.lm_output_learned_bias"),
]

Doing this works fine, with no errors.

If I don't set id2label = {"1": "className"}, then passing num_labels=1 to TokenGTConfig(...) has no effect, because num_labels gets a default value of 2 in PretrainedConfig (the superclass of TokenGTConfig; see the code linked below),

which would give a size mismatch error.

https://github.com/huggingface/transformers/blob/9d1116e9951686f937d17697820117636bfc05a5/src/transformers/configuration_utils.py#L326-L330
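
A small illustration of that behaviour with PretrainedConfig directly (a sketch - the exact interplay for TokenGTConfig depends on how it forwards kwargs to its superclass):

from transformers import PretrainedConfig

print(PretrainedConfig().num_labels)                              # 2, the default when nothing is passed
print(PretrainedConfig(id2label={"1": "className"}).num_labels)   # 1, derived from len(id2label)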

clefourrier commented 1 year ago

It's really great to see your motivation, good job! :sparkles:

I'll try to check the code to confirm the key renames you made, but I think they do make sense because of the naming changes between the original and new models.

For the id2label, I don't think it is such a good idea to modify things outside of the TokenGT files - normally the parent class (PretrainedConfig) is overridden by the child class (TokenGTConfig); are you sure this modification is happening here? I think you could also try changing the TokenGTConfig num_labels default value to 1 instead of None and see what happens.
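
Something like this, as a sketch (the signature is illustrative; the point is just to forward a default of 1 to the parent class):

from transformers import PretrainedConfig

class TokenGTConfig(PretrainedConfig):
    def __init__(self, num_labels=1, **kwargs):  # default changed from None to 1
        super().__init__(num_labels=num_labels, **kwargs)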

Raman-Kumar commented 1 year ago

Yes, I am sure

clefourrier commented 1 year ago

Hi @Raman-Kumar! I took some time to clean up the code a bit and edited some parts; it should be better now for the problems you mentioned. If problems occur in the future, FYI the Graphormer code that was integrated into the lib is quite similar to this one, so you can look at how they are managed there.

Because of a mixup on my GitHub, I had to create a new PR for this, https://github.com/huggingface/transformers/pull/21745, and this is where you'll find the new code. Hoping it helps you! :hugs:

Raman-Kumar commented 1 year ago

Hi @clefourrier, I had already figured it out, but I was very sick for a few days 😔.

In the previous PR, I made three changes, after which it printed "All good :)":

  1. Changing num_labels to num_classes (after that there is no need to add id2label, which you suggested not to add).
  2. In the file models/tokengt/configuration_tokengt.py, import torch.nn.functional as F was missing.
  3. A decoder name was written incorrectly in the forward function of the TokenGTForGraphClassification class.

I was just about to upload the newly created config.json and pytorch_model.bin files to my Hugging Face account.

Now I will look at the new PR and will send the changes, with tests and docs, to the new PR.

clefourrier commented 1 year ago

That sounds good, these changes sound similar to the ones in the new PR.

I hope you take rest and get better soon :hugs:

Raman-Kumar commented 1 year ago

Hi, back again. I uploaded the converted checkpoint and config: lap - https://huggingface.co/raman-ai/tokengt-base-lap-pcqm4mv2, orf - https://huggingface.co/raman-ai/tokengt-base-orf-pcqm4mv2

Now I am writing tests.

I tried to push some changes to the PR, but it fails with errors like "authentication failed" and "you do not have permission".

How should I push new commits to your PR? @clefourrier You would need to add me as a collaborator on your forked repo.

In my terminal:

$ git remote -v
github-desktop-clefourrier      https://github.com/clefourrier/transformers.git (fetch)
github-desktop-clefourrier      https://github.com/clefourrier/transformers.git (push)
origin  https://github.com/Raman-Kumar/transformers.git (fetch)
origin  https://github.com/Raman-Kumar/transformers.git (push)
upstream        https://github.com/huggingface/transformers.git (fetch)
upstream        https://github.com/huggingface/transformers.git (push)
clefourrier commented 1 year ago

@Raman-Kumar I added you to my fork!
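
For reference, pushing to the fork then looks something like this (the branch name add_tokengt is just a guess - use the branch the PR was opened from):

git fetch github-desktop-clefourrier
git checkout -b add_tokengt github-desktop-clefourrier/add_tokengt
# commit your changes, then
git push github-desktop-clefourrier add_tokengt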

Raman-Kumar commented 1 year ago

I created a new PR, #22042, just to make a lot of commits and see where CircleCI fails, so I can correct it. Later I will make a single commit in your PR.

I have added a new dependency, einops, in setup.py. It is the first time it is being used in the entire repo, in the TokenGT model.

I added TokenGTModelIntegrationTest, and now it passes all CircleCI checks.

I have a question, @clefourrier: how do I know the shapes of the TokenGT inputs (node_data, num_nodes, edge_index, edge_data, edge_num, in_degree, out_degree, lap_eigvec, lap_eigval, labels) for the ids_tensor() function?

Like in Graphormer

attn_bias = ids_tensor(
    [self.batch_size, self.graph_size + 1, self.graph_size + 1], self.num_atoms
)  # Def not sure here
attn_edge_type = ids_tensor([self.batch_size, self.graph_size, self.graph_size, 1], self.num_edges)
spatial_pos = ids_tensor([self.batch_size, self.graph_size, self.graph_size], self.num_spatial)
in_degree = ids_tensor([self.batch_size, self.graph_size], self.num_in_degree)
out_degree = ids_tensor([self.batch_size, self.graph_size], self.num_out_degree)
input_nodes = ids_tensor([self.batch_size, self.graph_size, 1], self.num_atoms)
input_edges = ids_tensor(
    [self.batch_size, self.graph_size, self.graph_size, self.multi_hop_max_dist, 1], self.num_edges
)
labels = ids_tensor([self.batch_size], self.num_classes)
clefourrier commented 1 year ago

OK, great for the PR, and congrats on the tests! For einops, do you need a lot of code? It would be better to copy-paste the functions we will need (citing them, and if the license allows, of course), as we only allow new dependencies for very specific cases.
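
As an illustration (a generic example, not taken from the TokenGT code): many einops calls reduce to plain torch ops, e.g. rearrange(x, "b n d -> n b d") is just a permute:

import torch

x = torch.randn(2, 5, 8)
y = x.permute(1, 0, 2)  # equivalent to einops.rearrange(x, "b n d -> n b d")
print(y.shape)          # torch.Size([5, 2, 8])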

For TokenGT, are you talking about the shapes of the inputs provided to the test suite? Most attributes will have the same shape as for Graphormer: batch_size in position one, then graph_size (or something linked to it) for inputs that range over the whole graph, like those pertaining to edges/nodes (which includes the degrees, for example). The collation function should be able to help you with the specifics, since the shapes must be provided there. As a last resort, to confirm your intuition, you can also print the dimensions of all the elements you want.
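
For example, something along these lines (a rough sketch - it assumes the TokenGT PR exposes a collator and a preprocess_item analogous to the Graphormer ones, and that dataset is a preprocessed dataset you already have):

batch = collator([preprocess_item(dataset[i]) for i in range(2)])
for name, value in batch.items():
    if hasattr(value, "shape"):
        print(name, tuple(value.shape))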

Amelie-Schreiber commented 11 months ago

What is the current status of TokenGT on Hugging Face? Is it possible to use this for token/node classification tasks? If so, could someone point me to a good starting point or example for figuring that out? I would love to try to use this on protein data through Hugging Face for node/token classification :)

clefourrier commented 11 months ago

Hi @Amelie-Schreiber ! Raman has been working on this integration in their spare time, but I don't think it's complete yet. One of the latest PRs was here if you want to take a look too :)

Raman-Kumar commented 11 months ago

Hey, I am resuming this. I lost touch for some time. I will keep contributing to it.

@clefourrier I may ask questions if I get stuck.

clefourrier commented 11 months ago

Cool! Feel free to ask questions! I'm no longer actively working on graphs, but I'll do my best to answer within a reasonable delay.

trotsky1997 commented 7 months ago

How is it going now? Does it work? 🫥

Raman-Kumar commented 7 months ago

Thanks for the reminder, @trotsky1997. I will complete this task now.

trotsky1997 commented 7 months ago

Thanks for the reminder @trotsky1997 I will complete this task now.

Coders will be happy to use this architecture with Transformers :)

trotsky1997 commented 7 months ago

Thanks for the reminder @trotsky1997 I will complete this task now.

Hi, can I use this model in HF now?

clefourrier commented 7 months ago

Hi @trotsky1997, if you need it fast for a model of yours, you can port this architecture yourself (the easiest would be to look at the Graphormer code and follow a similar design, using what's already available here). Once the model config is ported, you can then add it to your model's files and load your models with trust_remote_code=True.
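
For reference, loading custom code from the Hub then looks something like this (the repo id below is just a placeholder):

from transformers import AutoModel

model = AutoModel.from_pretrained("your-username/tokengt", trust_remote_code=True)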

For now however, it's not integrated in transformers.

@Raman-Kumar - please don't feel like you have to port this architecture. If you no longer want to do it because you don't have the time, that's completely fine! Other people in the community will likely be interested in helping with that.