traindb-project / traindb-ml

Remote ML Model Serving Component for TrainDB
Apache License 2.0

analysis: Sum-Product Networks (SPN) for PyTorch #19

Open sungsoo opened 2 years ago

sungsoo commented 2 years ago

Sum-Product Networks (SPN) for PyTorch

Brief Summary

According to the SPFlow GitHub repository, it appears that SPFlow supports not only TensorFlow but also PyTorch (torchspn). The torchspn implementation is already included in the current version of SPFlow. In particular, torchspn outperforms both native SPFlow and SPFlow-TF (TensorFlow) in terms of processing time.

So we should review the torchspn source code with a view to implementing unsupervised learning for approximate query processing.
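To make the link to approximate query processing concrete, here is a minimal sketch of how a learned SPN could answer a COUNT query: the estimate is the table's row count times the probability the model assigns to the predicate. Everything in it is an assumption for illustration only: the toy model (one sum node over two product nodes with Gaussian leaves), the column names, the parameter values, and the helper names `range_probability` and `approx_count`. None of it comes from SPFlow or TrainDB.

```python
import math


def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a univariate Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


# Hypothetical "learned" model: a sum node over two product nodes,
# each product node having one Gaussian leaf (mean, std) per column.
COMPONENTS = [
    {"weight": 0.6, "leaves": {"age": (30.0, 5.0), "salary": (40000.0, 8000.0)}},
    {"weight": 0.4, "leaves": {"age": (50.0, 7.0), "salary": (70000.0, 12000.0)}},
]


def range_probability(components, ranges):
    """P(lo <= column <= hi for every column in `ranges`).

    Columns that are not mentioned are marginalized out, which for a
    product node simply means their leaves contribute a factor of 1.
    """
    total = 0.0
    for comp in components:
        p = comp["weight"]
        for column, (lo, hi) in ranges.items():
            mean, std = comp["leaves"][column]
            p *= gaussian_cdf(hi, mean, std) - gaussian_cdf(lo, mean, std)
        total += p
    return total


def approx_count(components, ranges, n_rows):
    """Approximate SELECT COUNT(*) ... WHERE <range predicates>."""
    return n_rows * range_probability(components, ranges)


# Example: SELECT COUNT(*) FROM t WHERE age BETWEEN 25 AND 35
estimate = approx_count(COMPONENTS, {"age": (25.0, 35.0)}, n_rows=1_000_000)
```

The point of the sketch is that the query is answered from the model alone, without scanning the table, so the cost depends on the size of the network rather than on the number of rows.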

Related Results

The following remarks are quoted from the SPFlow GitHub repository:

https://github.com/SPFlow/SPFlow/tree/master/src/spn/algorithms/layerwise

[image: example architecture]

The example architecture above was used to benchmark the runtime with a varying number of input features (batch size = 1024) and a varying batch size (number of input features = 1024).

The comparison is against a node-wise implementation of SPNs in SPFlow on the CPU and a node-wise implementation of SPNs in SPFlow on the GPU using Tensorflow.
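The difference between node-wise and layer-wise evaluation can be sketched on a toy SPN. This is not SPFlow's code: the class names, function names, and parameter values below are all made up for illustration, and plain Python lists stand in for tensors. In the node-wise version every node is an object that is evaluated one sample at a time. In the layer-wise version each layer is a single function applied to the whole batch, which in a tensor library would presumably map to a few batched operations per layer (for example a `logsumexp` over the weight dimension for a sum layer).

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_log_pdf(x, mean, std):
    return -0.5 * math.log(2.0 * math.pi * std * std) - (x - mean) ** 2 / (2.0 * std * std)


# Toy parameters (made up): K = 2 product nodes over D = 2 features.
MEANS = [[0.0, 1.0], [2.0, -1.0]]   # MEANS[k][d]
STDS = [[1.0, 0.5], [1.5, 1.0]]     # STDS[k][d]
LOG_WEIGHTS = [math.log(0.3), math.log(0.7)]


# --- Node-wise: one object per node, evaluated recursively per sample ---
class Leaf:
    def __init__(self, feature, mean, std):
        self.feature, self.mean, self.std = feature, mean, std

    def log_prob(self, x):
        return gaussian_log_pdf(x[self.feature], self.mean, self.std)


class Product:
    def __init__(self, children):
        self.children = children

    def log_prob(self, x):
        # A product of densities is a sum in log-space.
        return sum(child.log_prob(x) for child in self.children)


class Sum:
    def __init__(self, log_weights, children):
        self.log_weights, self.children = log_weights, children

    def log_prob(self, x):
        return logsumexp([lw + c.log_prob(x) for lw, c in zip(self.log_weights, self.children)])


def build_nodewise_root():
    products = [
        Product([Leaf(d, MEANS[k][d], STDS[k][d]) for d in range(len(MEANS[k]))])
        for k in range(len(MEANS))
    ]
    return Sum(LOG_WEIGHTS, products)


# --- Layer-wise: one function per layer, applied to the whole batch ---
def leaf_layer(batch):
    # Output shape: [n_samples][K][D]
    return [[[gaussian_log_pdf(x[d], MEANS[k][d], STDS[k][d]) for d in range(len(x))]
             for k in range(len(MEANS))] for x in batch]


def product_layer(leaf_out):
    # Reduce over features. Output shape: [n_samples][K]
    return [[sum(per_feature) for per_feature in sample] for sample in leaf_out]


def sum_layer(product_out):
    # Weighted log-sum-exp over K. Output shape: [n_samples]
    return [logsumexp([lw + v for lw, v in zip(LOG_WEIGHTS, sample)]) for sample in product_out]


def layerwise_log_prob(batch):
    return sum_layer(product_layer(leaf_layer(batch)))
```

Both versions compute the same log-likelihoods. The layer-wise form just organizes the work so that all nodes of a layer, for all samples, can be computed together.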

kihyuk-nam commented 2 years ago

I found it two months ago and tested a simple example in SPFlow (the 'PyTorch test' in https://drive.google.com/file/d/1shAeYqXv7EwI6c1cEcALhIWAxcBEpP_P/view?usp=sharing), but I couldn't understand why it is more efficient, other than its use of tensors instead of NumPy arrays. Maybe I should read the 'Einsum Networks' paper.

The amount of code we need to modify seems relatively small, but the logic does not seem easy to understand :-)