Closed ghost closed 5 years ago
Most systems have a cpp. However, clang/cpp only works for C/C++ languages well, and cannot be used for souffle.
To overcome this issue we tried Boost/Wave but we experienced problems with it and switched later to mcpp, which seems to be stable.
We could make souffle work in such a way where we check for the existence of mcpp and gcc/cpp at runtime. If one of them is installed, souffle can process the input (giving, for example, mccp the preference). Note this needs to be at runtime and cannot be hard-coded ahead of time if we want to make packages work with both versions of cpp.
The only issue could be that mcpp and gcc/cpp may behave slightly differently and users may have different user experiences.
Just to note, it's not just Debian-based linux. The Fedora family have mcpp, although old/restricted releases such as CentOS7, need a repo added. Our RPM repo includes mcpp to help with this. FreeBSD also has a port.
Given the different cpp behaviours, and that they are not essential to running souffle, we could make mcpp an optional dependency. Without mcpp you would have reduced functionality, rather than none.
Oh Clang, I had forgotten about Clang -- this makes perfect sense. So, from what I can tell, MCPP is required as a runtime dependency only if Clang is the only compiler available, and MCPP is never required at build time by configure or make. My proposal, then, is that we don't require MCPP as a dependency for building Souffle, as during build we do not compile any Datalog code. We then require "a compatible C preprocessor" as a runtime dependency, which may or may not be MCPP. I.e. we have a list of suitable C preprocessors sorted by preference, Souffle checks for each of them at run time and uses the first it finds, and MCPP has the highest preference. During packaging however, we should still depend on MCPP, so that release versions give a consistent user experience. This means developers can build and run Souffle on systems whether MCPP is installed or not, and users don't have issues relating to their specific platform.
All of that is invalid however, if the behaviour between different cpps is so substantial that it ends up with us having to maintain against each of them separately. If that's the case, the runtime dependency (not the build dependency) on MCPP is very much worthwhile.
That would make sense. We were quite generous with the dependencies in the configure script.
We don't make a difference whether it is needed either at runtime or compile-time.
However, when we run the tests (with make check
) we definitely need mcpp.
As options I can see:
We remove mcpp from souffle and it is up to the user to pre-process the source-code with their preferred cpp. (NB: this is not very user-oriented).
We search at runtime for various cpp and adjust parameters depending on which cpp and which version is found (NB: painful for us).
We extend the lexer so that it can do very basic preprocessing (simple notions of #include/#define). This would require extra work but we would get rid of our dependencies.
Option 1 seems like a fundamental change to the way Souffle operates, and would require all current users of Souffle to change how they are doing things. It would also be really quite annoying for users.
Option 2 I don't like in that it means we have to ensure compatibility with every cpp Souffle checks against. But, I think it is the closest of the three to a working solution.
Option 3 would be a bit of work, and we'd have to implement our own C preprocessor, or at least a fragment thereof. This completely negates the reason why we'd use a C preprocessor in the first place, and would be a massive undertaking, especially if we wanted to ensure compatibility and that all cpp-related stuff already in Souffle programs still works.
I have an option 4, which is like option 2, but is a lot less work.
configure
, make
, or make check
. SOUFFLE_CPP
.SOUFFLE_CPP
is not defined, which it is not by default, Souffle uses MCPP.Doing things this way I think addressees the other concerns.
We don't need any CPP, beyond comment removal, for most uses of souffle. If we add comment handling to the parser then mcpp/cpp are entirely optional.
Even #defines/#includes would require more substantial work, so I think that anything more complicated than comments should be handled by a PP.
@lyndonhenry: I quickly double-checked. We would need to remove the dependency in configure.ac
and change the semantics in main.cpp to check for the existence of the the environment-variable SOUFFLE_CPP
. I feel this can be done very quickly.
@mmcgr: do you have any objections?
scanner.ll actually already supports gnu cpp line numbering
main.cpp would need to handle a string, rather than just a filename, to allow for arguments for the cpp of choice to be passed in. macros and error messages have potential to be problematic.
Otherwise it should work.
@lyndonhenry - do you want to make the change and pull request? It might be better that you do it so that it works in your current environment and we will double-check the rest.
I'd be happy to, but won't have time until next week. It is not urgent, so I am fine to wait until then.
PR #241 replaced souffle-wave with souffle-mcpp, with the reasoning that "Using mcpp means better cross-compatibility, and works out of the box on several OSes.".
Replacing souffle-wave seems like a good thing, but I have found no other documented reason for why we are using MCPP.
While MCPP itself is consistent across platforms as a C preprocessor, it reduces the cross-platform compatibility of Souffle itself. Sure,
apt-get install mcpp/brew install mcpp
is easy, but not every system is Mac or Debian-based.The dependency makes it difficult to use Souffle in certain environments. Issues #233, #352, #442, #504, #517, #537, #547, #708, #781, #828 each evidence this in some way. I have also experienced problems in multiple real-world scenarios where the dependency on MCPP has caused much pain.
Additionally, Doop, as a significant user of Souffle, runs whatever
cpp
command is on the path, before running Souffle itself. This means Doop's Datalog programs must be compatible with whatever C preprocessor is installed anyway. The generated C++ code, however, has MCPP run over it by souffle-compile -- so perhaps we are doing things there that do require the dependency.I assume there is a good reason for using MCPP, otherwise we wouldn't have done it in the first place (I hope). If so, then this issue works as a great place to document why.
If not, lets get rid of the dependency on MCPP! Souffle can call whatever
cpp
is on the PATH, which could then be overridden by aSOUFFLE_CPP
environment variable if people still want to use another C preprocessor.