souffle-lang / souffle

Soufflé is a variant of Datalog for tool designers crafting analyses in Horn clauses. Soufflé synthesizes a native parallel C++ program from a logic specification.
http://souffle-lang.github.io/
Universal Permissive License v1.0
927 stars 208 forks source link

Why are we using MCPP? #943

Closed ghost closed 5 years ago

ghost commented 5 years ago

PR #241 replaced souffle-wave with souffle-mcpp, with the reasoning that "Using mcpp means better cross-compatibility, and works out of the box on several OSes.".

Replacing souffle-wave seems like a good thing, but I have found no other documented reason for why we are using MCPP.

While MCPP itself is consistent across platforms as a C preprocessor, it reduces the cross-platform compatibility of Souffle itself. Sure, apt-get install mcpp/brew install mcpp is easy, but not every system is Mac or Debian-based.

The dependency makes it difficult to use Souffle in certain environments. Issues #233, #352, #442, #504, #517, #537, #547, #708, #781, #828 each evidence this in some way. I have also experienced problems in multiple real-world scenarios where the dependency on MCPP has caused much pain.

Additionally, Doop, as a significant user of Souffle, runs whatever cpp command is on the path, before running Souffle itself. This means Doop's Datalog programs must be compatible with whatever C preprocessor is installed anyway. The generated C++ code, however, has MCPP run over it by souffle-compile -- so perhaps we are doing things there that do require the dependency.

I assume there is a good reason for using MCPP, otherwise we wouldn't have done it in the first place (I hope). If so, then this issue works as a great place to document why.

If not, lets get rid of the dependency on MCPP! Souffle can call whatever cpp is on the PATH, which could then be overridden by a SOUFFLE_CPP environment variable if people still want to use another C preprocessor.

b-scholz commented 5 years ago

Most systems have a cpp. However, clang/cpp only works for C/C++ languages well, and cannot be used for souffle.

To overcome this issue we tried Boost/Wave but we experienced problems with it and switched later to mcpp, which seems to be stable.

We could make souffle work in such a way where we check for the existence of mcpp and gcc/cpp at runtime. If one of them is installed, souffle can process the input (giving, for example, mccp the preference). Note this needs to be at runtime and cannot be hard-coded ahead of time if we want to make packages work with both versions of cpp.

The only issue could be that mcpp and gcc/cpp may behave slightly differently and users may have different user experiences.

mmcgr commented 5 years ago

Just to note, it's not just Debian-based linux. The Fedora family have mcpp, although old/restricted releases such as CentOS7, need a repo added. Our RPM repo includes mcpp to help with this. FreeBSD also has a port.

mmcgr commented 5 years ago

Given the different cpp behaviours, and that they are not essential to running souffle, we could make mcpp an optional dependency. Without mcpp you would have reduced functionality, rather than none.

ghost commented 5 years ago

Oh Clang, I had forgotten about Clang -- this makes perfect sense. So, from what I can tell, MCPP is required as a runtime dependency only if Clang is the only compiler available, and MCPP is never required at build time by configure or make. My proposal, then, is that we don't require MCPP as a dependency for building Souffle, as during build we do not compile any Datalog code. We then require "a compatible C preprocessor" as a runtime dependency, which may or may not be MCPP. I.e. we have a list of suitable C preprocessors sorted by preference, Souffle checks for each of them at run time and uses the first it finds, and MCPP has the highest preference. During packaging however, we should still depend on MCPP, so that release versions give a consistent user experience. This means developers can build and run Souffle on systems whether MCPP is installed or not, and users don't have issues relating to their specific platform.

ghost commented 5 years ago

All of that is invalid however, if the behaviour between different cpps is so substantial that it ends up with us having to maintain against each of them separately. If that's the case, the runtime dependency (not the build dependency) on MCPP is very much worthwhile.

b-scholz commented 5 years ago

That would make sense. We were quite generous with the dependencies in the configure script. We don't make a difference whether it is needed either at runtime or compile-time. However, when we run the tests (with make check) we definitely need mcpp.

As options I can see:

ghost commented 5 years ago

Option 1 seems like a fundamental change to the way Souffle operates, and would require all current users of Souffle to change how they are doing things. It would also be really quite annoying for users.

Option 2 I don't like in that it means we have to ensure compatibility with every cpp Souffle checks against. But, I think it is the closest of the three to a working solution.

Option 3 would be a bit of work, and we'd have to implement our own C preprocessor, or at least a fragment thereof. This completely negates the reason why we'd use a C preprocessor in the first place, and would be a massive undertaking, especially if we wanted to ensure compatibility and that all cpp-related stuff already in Souffle programs still works.

I have an option 4, which is like option 2, but is a lot less work.

Doing things this way I think addressees the other concerns.

mmcgr commented 5 years ago

We don't need any CPP, beyond comment removal, for most uses of souffle. If we add comment handling to the parser then mcpp/cpp are entirely optional.

Even #defines/#includes would require more substantial work, so I think that anything more complicated than comments should be handled by a PP.

b-scholz commented 5 years ago

@lyndonhenry: I quickly double-checked. We would need to remove the dependency in configure.ac and change the semantics in main.cpp to check for the existence of the the environment-variable SOUFFLE_CPP. I feel this can be done very quickly. @mmcgr: do you have any objections?

mmcgr commented 5 years ago

scanner.ll actually already supports gnu cpp line numbering

main.cpp would need to handle a string, rather than just a filename, to allow for arguments for the cpp of choice to be passed in. macros and error messages have potential to be problematic.

Otherwise it should work.

b-scholz commented 5 years ago

@lyndonhenry - do you want to make the change and pull request? It might be better that you do it so that it works in your current environment and we will double-check the rest.

ghost commented 5 years ago

I'd be happy to, but won't have time until next week. It is not urgent, so I am fine to wait until then.