scallop-lang / scallop

Framework and Language for Neurosymbolic Programming. Join Our Discord: https://discord.gg/RavzdND229
https://www.scallop-lang.org
MIT License
178 stars 9 forks

"Neurosymbolic" #20

Open brurucy opened 2 months ago

brurucy commented 2 months ago

Hi,

I'm curious as to why Scallop is referred to as a "Framework and Language for Neurosymbolic Programming".

I understand "neurosymbolic" to mean a system that cleverly bridges some connectionist system, e.g. a NN, to a reasoner and/or vice versa. At first glance, this is what Scallop pitches itself to be.

When looking at the examples, however, Scallop appears to be just a regular Datalog reasoner with stratified negation, Python bindings, and extensible data types.

What makes Scallop clever? That is, what can it do that any other Datalog reasoner over arbitrary semirings couldn't?

Liby99 commented 2 months ago

Hi,

Thanks for the inquiry. TL;DR: Scallop is differentiable and comes with a set of novel differentiable provenances that have been empirically proven effective. That is what makes Scallop "neuro-symbolic". This is not, to the best of my knowledge, trivially available in just any Datalog engine that supports arbitrary semirings.
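To make the shared starting point concrete, here is a minimal, hypothetical sketch (not Scallop's actual code or API) of what "Datalog over an arbitrary semiring" means: each fact carries a tag, and during rule evaluation tags are combined with the semiring's multiplication within a derivation and with its addition across alternative derivations.

```python
# Illustrative sketch only -- not Scallop's implementation. A semiring
# supplies `zero`, `one`, `add` (combines alternative derivations), and
# `mul` (combines facts joined inside one derivation).

class MaxMinProb:
    """The max-min probability semiring: add = max, mul = min."""
    zero, one = 0.0, 1.0

    @staticmethod
    def add(a, b):
        return max(a, b)

    @staticmethod
    def mul(a, b):
        return min(a, b)

def join_rule(semiring, edges):
    """Evaluate `path(a, c) :- edge(a, b), edge(b, c)`, propagating tags."""
    out = {}
    for (a, b), t1 in edges.items():
        for (b2, c), t2 in edges.items():
            if b == b2:
                tag = semiring.mul(t1, t2)
                out[(a, c)] = semiring.add(out.get((a, c), semiring.zero), tag)
    return out

# Tagged input facts: edge(0,1)@0.9, edge(1,2)@0.8, edge(0,2)@0.3
paths = join_rule(MaxMinProb, {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.3})
# The only two-hop path is 0 -> 1 -> 2, tagged min(0.9, 0.8) = 0.8
```

Swapping in a different semiring class changes the semantics (boolean reachability, counting, probabilities) without touching the rule, which is the "arbitrary semiring" part; the differentiability discussed below is a property layered on top of this.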

First of all, I would suggest reading through some of our papers (if you haven't), including our first few papers on the core of Scallop as well as a few papers applying Scallop to computer vision, NLP, and planning. For your information, we are continuing to apply Scallop to various other domains, including increasingly complicated tasks in program analysis, medicine, and bioinformatics. All these applications require Scallop's infrastructure for connecting neural components with the symbolic domain in a deeply integrated manner: the symbolic component is differentiable, so gradients can flow back for proper training or fine-tuning of the neural models. We even have cases where the symbolic component written in Scallop carries trainable weights. In all the papers we write, such properties are treated as "neuro-symbolic".

As of right now, I acknowledge that many examples on the website are toy-ish and seemingly require only a Datalog engine supporting arbitrary semirings. At its core, however, Scallop is in fact differentiable. Making Datalog support differentiable provenance is non-trivial, even for a Datalog engine that supports arbitrary semirings. It is particularly challenging because a Scallop module needs to be tightly integrated into PyTorch while supporting gradients flowing back and forth. We believe Scallop is one of the first systems to systematically develop such an infrastructure, not to mention an extensible framework supporting customization of differentiable provenances, so that the engine is not only theoretically sound but also applicable in real life. Moreover, Scallop already comes with a library of such differentiable provenances, so the whole language is empirically proven applicable. Scallop's differentiability, together with the set of differentiable provenances that come with it, is what we are really claiming makes Scallop neuro-symbolic. In fact, in our latest papers we are continuing to develop more intricate differentiable provenances. I'm happy to discuss more if you are interested.

Last of all, I personally view "neuro-symbolic" as a very general term for any technique that combines neural and symbolic components. More specifically, the Scallop language targets the subset of such techniques that use the symbolic component to algorithmically supervise the underlying neural networks. To the best of my knowledge, similar techniques and languages include DeepProbLog (and that series of work), TensorLog, NeurASP, etc. While many details differ, these are all viewed as neuro-symbolic. If you are curious about the general pitch, you can check out the mentioned works as well.
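The "algorithmic supervision" pattern can be sketched in a few lines. This is a generic illustration in the spirit of DeepProbLog/Scallop-style training, not any library's real API: the networks output distributions over digits, the symbolic program computes the probability that a constraint (here, the digits summing to a label) holds, and the negative log of that probability is the loss, so only the sum is ever supervised.

```python
import math

def prob_sum_equals(p1, p2, target):
    """P(d1 + d2 == target) for independent digit distributions p1, p2."""
    return sum(p1[i] * p2[target - i]
               for i in range(len(p1)) if 0 <= target - i < len(p2))

# Hypothetical softmax outputs over digits 0..2 for two input images:
p1 = [0.1, 0.7, 0.2]
p2 = [0.2, 0.6, 0.2]

# Semantic-loss-style objective: only the sum (2) is labeled, never the
# individual digits; gradients would flow through prob_sum_equals back
# into the networks that produced p1 and p2.
loss = -math.log(prob_sum_equals(p1, p2, 2))
```

Here `prob_sum_equals(p1, p2, 2)` is `0.1*0.2 + 0.7*0.6 + 0.2*0.2 = 0.48`, so the loss pushes both distributions toward digit pairs that sum to 2 without ever seeing per-digit labels.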

Please reach out if you have any further questions.