pyg-team / pytorch_geometric


Bag-of-words mapping for datasets #3728

Open Gori-LV opened 2 years ago

Gori-LV commented 2 years ago

📚 Documentation

I'm not sure this is the right tag, but I would like to ask whether the bag-of-words mappings, and more generally information on the semantic meaning of the features, could be provided for the embedded datasets TORCH_GEOMETRIC.DATASETS.DBLP and TORCH_GEOMETRIC.DATASETS.IMDB, e.g. as a mapping file or README in the downloaded raw/ folder, or as a link in the source code/documentation. It would be super helpful for anyone who wants to analyse the actual information that the models take in.

For example, author nodes in the DBLP dataset carry 334-dimensional bag-of-words features, while paper nodes and term nodes use 4231-dimensional and 50-dimensional features, respectively. I'm curious what these dimensions stand for.
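To make concrete what kind of "mapping" I mean: purely as an illustration (this vocabulary is made up and is not the one used to build the DBLP features), a bag-of-words mapping is simply the index-to-word table that accompanies the count vectors:

from sklearn.feature_extraction.text import CountVectorizer

# Illustrative only: toy documents, not the DBLP preprocessing.
docs = ["graph neural networks", "heterogeneous graph embedding"]
vec = CountVectorizer()
X = vec.fit_transform(docs)            # shape [num_docs, vocab_size]
print(vec.get_feature_names_out())     # index -> word mapping
print(X.toarray())                     # the bag-of-words feature matrix

A file with exactly this index-to-word information for the 334-dim author and 4231-dim paper features is what I'm asking for.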

Many thanks!!
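For reference, the printout below can be reproduced with a minimal snippet along these lines (the './data/DBLP' root is just a placeholder path):

from torch_geometric.datasets import DBLP

# Downloads the pre-processed DBLP files on first use and caches them
# under root/raw and root/processed ('./data/DBLP' is a placeholder).
dataset = DBLP(root='./data/DBLP')
data = dataset[0]  # a single HeteroData object
print(data)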

HeteroData(
  author={
    x=[4057, 334],
    y=[4057],
    train_mask=[4057],
    val_mask=[4057],
    test_mask=[4057]
  },
  paper={ x=[14328, 4231] },
  term={ x=[7723, 50] },
  conference={ num_nodes=20 },
  (author, to, paper)={ edge_index=[2, 19645] },
  (paper, to, author)={ edge_index=[2, 19645] },
  (paper, to, term)={ edge_index=[2, 85810] },
  (paper, to, conference)={ edge_index=[2, 14328] },
  (term, to, paper)={ edge_index=[2, 85810] },
  (conference, to, paper)={ edge_index=[2, 14328] }
)
rusty1s commented 2 years ago

Yes, this is a great suggestion, but I doubt that this is possible, as most datasets are simply downloaded from the official code repository that introduced the respective dataset. In some cases a README.md file exists, but this does not hold for others. As such, to fully understand a given dataset, it's best to look at the papers/code repositories linked in the documentation. For example, for DBLP, this is:

In that code, there also exists a pre-processing script for the DBLP dataset; see here.
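As a small sketch (using the same placeholder root as above), one way to check what the download actually ships, and whether any mapping or README file is included, is to list the dataset's raw directory via the standard raw_dir attribute:

import os

from torch_geometric.datasets import DBLP

dataset = DBLP(root='./data/DBLP')   # placeholder root path
print(dataset.raw_dir)               # folder holding the downloaded raw files
print(os.listdir(dataset.raw_dir))   # check for any README or mapping files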