Open Gori-LV opened 2 years ago
Yes, this is a great suggestion, but I doubt that it is possible, since most datasets are simply downloaded from the official code repository that introduced the respective dataset. In some cases a README.md file exists, but in others it does not. As such, to fully understand a given dataset, it's best to look at the papers/code repositories linked in the documentation. For example, for DBLP, this is:
In the code, there is a pre-processing script for the DBLP dataset, see here.
📚 Documentation
I'm not sure this is the right tag, but I would like to ask whether the bag-of-words mappings and more information on the semantic meaning of the features could be provided for the embedded datasets TORCH_GEOMETRIC.DATASETS.DBLP and TORCH_GEOMETRIC.DATASETS.IMDB, say, as a mapping file or README in the downloaded raw/ folder, or as a link in the source code/docs. It would be super helpful for anyone who wants to analyse the actual information that the models take in. For example, author nodes in the DBLP dataset adopt features of dimension 334 using bag-of-words, while paper nodes and term nodes use 4231-dim and 50-dim features respectively. I'm curious what they stand for.
Many thanks!!
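To illustrate why the mapping matters, here is a minimal sketch of a bag-of-words encoding on toy data. The documents, vocabulary, and helper names (`build_vocabulary`, `to_bag_of_words`, `decode`) are all hypothetical and are not taken from the DBLP pre-processing; the point is only that a feature column can be interpreted only if the token-to-index mapping is kept alongside the vectors.

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign each distinct token a column index, in sorted token order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_bag_of_words(document, vocab):
    """Return a count vector with one entry per vocabulary column."""
    counts = Counter(document.lower().split())
    return [counts.get(tok, 0) for tok in sorted(vocab, key=vocab.get)]


def decode(vector, vocab):
    """Map the non-zero columns of a vector back to their tokens."""
    index_to_token = {idx: tok for tok, idx in vocab.items()}
    return {index_to_token[i]: v for i, v in enumerate(vector) if v}


# Toy keyword lists (hypothetical, NOT the real DBLP vocabulary).
docs = ["graph neural network", "graph mining", "neural machine translation"]
vocab = build_vocabulary(docs)
vec = to_bag_of_words("graph neural graph", vocab)

print(vocab)               # token -> column index
print(vec)                 # the feature vector a model would see
print(decode(vec, vocab))  # readable only because the mapping was kept
```

Without `vocab`, the vector is just a list of counts; with it, each non-zero column can be traced back to a word. A mapping file shipped with the raw data would play the same role as `vocab` here.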