[EuroSys21] Accelerating graph sampling for graph machine learning using GPUs
Early algorithms, such as DeepWalk [28] and node2vec [12], employ shallow encodings. Given an input graph with n vertices and a target d-dimensional Euclidean space, a shallow encoding is a d × n matrix whose i-th column contains the embedding of vertex v_i. These algorithms are transductive: they take a static graph as input and produce embeddings only for the vertices in that graph. They typically adapt the Skip-Gram approach [24] to graphs, performing random walks to obtain context and target vertices.
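To make this concrete, here is a minimal Python sketch (not the paper's code; the toy graph, dimensions, and function names are illustrative assumptions) of a shallow encoding as a d × n matrix, with DeepWalk-style random walks producing (target, context) pairs for Skip-Gram training:

```python
# Minimal sketch: a shallow encoding is just a learnable d x n matrix;
# looking up a vertex embedding is a column read, and training pairs come
# from random walks, DeepWalk-style. All names/values are illustrative.
import numpy as np

def random_walk(adj, start, length, rng):
    """Uniform random walk over an adjacency-list dict {vertex: [neighbors]}."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk

def skipgram_pairs(walk, window):
    """(target, context) pairs within a sliding window, as in Skip-Gram."""
    pairs = []
    for i, target in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((target, walk[j]))
    return pairs

# Toy usage: 4-vertex graph, d = 8 dimensional shallow encoding.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
n, d = len(adj), 8
rng = np.random.default_rng(0)
Z = rng.normal(size=(d, n))          # the d x n embedding matrix
walk = random_walk(adj, start=0, length=10, rng=rng)
pairs = skipgram_pairs(walk, window=2)
print(walk, pairs[:3], Z[:, 0])      # column i is the embedding of vertex i
```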
More recent algorithms, including GraphSAGE [13], are inductive: they produce embeddings that generalize to previously unseen vertices. This property is particularly useful for building inference algorithms that work on dynamic, real-world graphs. Inductive algorithms learn a deep encoding, i.e., a function describing how to obtain a mapping, rather than a static map, and are also known as Graph Neural Networks (GNNs).
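A rough sketch of what "deep encoding as a function" means, in the spirit of a GraphSAGE mean-aggregation layer (the weights, feature sizes, and toy graph below are assumptions, not the paper's implementation):

```python
# Minimal sketch of an inductive ("deep") encoder: the embedding is a function
# of a vertex's own features and its neighbors' features, so the same learned
# weights apply to vertices and graphs never seen during training.
import numpy as np

def sage_layer(features, adj, W_self, W_neigh):
    """One mean-aggregation layer: h_v = ReLU(W_self x_v + W_neigh mean_u(x_u))."""
    out = []
    for v in range(features.shape[0]):
        nbrs = adj[v]
        agg = features[nbrs].mean(axis=0) if nbrs else np.zeros(features.shape[1])
        h = W_self @ features[v] + W_neigh @ agg
        out.append(np.maximum(h, 0.0))           # ReLU
    return np.stack(out)

# Toy usage: 4 vertices with 5-dim input features, 8-dim output embeddings.
rng = np.random.default_rng(0)
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
X = rng.normal(size=(4, 5))
W_self, W_neigh = rng.normal(size=(8, 5)), rng.normal(size=(8, 5))
H = sage_layer(X, adj, W_self, W_neigh)          # reusable on any new graph/features
print(H.shape)                                    # (4, 8)
```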
A Comprehensive Survey on Graph Neural Networks
The research on GNNs is closely related to graph embedding or network embedding, another topic which attracts increasing attention from both the data mining and machine learning communities [10], [28], [29], [30], [31], [32]. Network embedding aims at representing network nodes as low-dimensional vector representations, preserving both network topology structure and node content information, so that any subsequent graph analytics task such as classification, clustering, and recommendation can be easily performed using simple off-the-shelf machine learning algorithms (e.g., support vector machines for classification).
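As a small illustration of that downstream step (the embeddings and labels below are synthetic placeholders, not from the survey), node classification on learned embeddings reduces to fitting an off-the-shelf classifier such as an SVM:

```python
# Sketch of the "off-the-shelf" downstream step: once vertices have
# low-dimensional embeddings, node classification is ordinary supervised
# learning on those vectors (here an SVM via scikit-learn).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 200, 16
embeddings = rng.normal(size=(n, d))             # stand-in for learned node embeddings
labels = (embeddings[:, 0] > 0).astype(int)      # toy labels for illustration

train, test = np.arange(0, 150), np.arange(150, n)
clf = SVC(kernel="rbf").fit(embeddings[train], labels[train])
print("test accuracy:", clf.score(embeddings[test], labels[test]))
```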
Meanwhile, GNNs are deep learning models that aim to address graph-related tasks in an end-to-end manner. Many GNNs explicitly extract high-level representations.
Quora
inductive: $\{(x_{1:l}, y_{1:l}),\ x_{l+1:l+u},\ x_{\text{test}}\}$
transductive: $\{(x_{1:l}, y_{1:l}),\ x_{l+1:l+u}\}$
Inductive learning aims to learn a general model from $(x_{1:l}, y_{1:l})$ and $x_{l+1:l+u}$, and then predict labels on $x_{\text{test}}$.
Transductive learning aims to directly predict labels on $x_{l+1:l+u}$ based on the given labeled set $(x_{1:l}, y_{1:l})$.
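A minimal sketch of the two settings using off-the-shelf tools (scikit-learn's LabelPropagation as a stand-in transductive method and LogisticRegression as an inductive one; the data shapes are toy assumptions):

```python
# Transductive: label propagation sees the unlabeled points x[l:l+u] while
# fitting and only predicts labels for them. Inductive: a model fit on the
# labeled data is later applied to x_test it never saw.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelPropagation

rng = np.random.default_rng(0)
l, u, t = 100, 50, 30
x = rng.normal(size=(l + u, 4))
y = rng.integers(0, 2, size=l)
x_test = rng.normal(size=(t, 4))

# Transductive: unlabeled points are marked -1 and receive labels during fitting.
y_semi = np.concatenate([y, -np.ones(u, dtype=int)])
lp = LabelPropagation().fit(x, y_semi)
labels_for_unlabeled = lp.transduction_[l:]

# Inductive: learn a general model, then predict on genuinely unseen x_test.
clf = LogisticRegression().fit(x[:l], y)
labels_for_test = clf.predict(x_test)
```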
[SIGMOD20] Active Learning for ML Enhanced Database Systems
In ML literature, predicting labels for a given and unlabeled test (target) data is called the transductive setting [29, 76]. Transductive data collection can leverage the information from the unlabeled target data to acquire more valuable labels.
This differs from the inductive setting more commonly used in academia, where the test data is only used during evaluation.
Evaluating ML models in the transductive setting requires care, and we describe the protocol for making fair comparisons between transductive algorithms in Section 7.1.
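A hedged sketch of how transductive data collection can exploit the known unlabeled target set (the data, model, and budget below are placeholders, not the paper's method): rank the target points by the current model's uncertainty and acquire labels for the most uncertain ones.

```python
# In the transductive setting the unlabeled target data is available up front,
# so label acquisition can score those very points (least-confidence sampling).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x_labeled = rng.normal(size=(40, 4))
y_labeled = rng.integers(0, 2, size=40)
x_target = rng.normal(size=(200, 4))          # the unlabeled target/test data
budget = 10

clf = LogisticRegression().fit(x_labeled, y_labeled)
proba = clf.predict_proba(x_target)
uncertainty = 1.0 - proba.max(axis=1)         # least-confidence score per target point
to_label = np.argsort(uncertainty)[-budget:]  # most valuable points to label next
print("query these target indices:", to_label)
```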