llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

http://llvm.org

Other

28.69k stars 11.87k forks source link

LLVM exposes configure pre-processor #defines #552

Closed llvmbot closed 14 years ago

llvmbot commented 20 years ago


Bugzilla Link	180
Resolution	FIXED
Resolved on	Mar 06, 2010 13:59
Version	1.0
OS	All
Reporter	LLVM Bugzilla Contributor
CC	@lattner

Extended Description

Currently, LLVM exposes many pre-processor symbols defined during the configure processing through include/Config/config.h. So far, I have detected the following #defines exposed through public headers:

PACKAGE_NAME PACKAGE_RELEASE PACKAGE_BUGRELEASE PACKAGE_STRING PACKAGE_TARNAME

There are also numerous HAS_XXX symbols polluting the pre-processor namespace.

The existence of these symbols in public headers makes integration of LLVM with other software that uses autoconf (like mine) awkward at best and impossible at worst.

The main culprit seems to be include/Support/DataTypes.h which directly include "Config/config.h" which is the file that defines all the symbols.

To fix this, it will be necessary to remove any #includes of "Config/config.h" in public header files to either the .cpp files or internal header files.

I'm going to go ahead and make the necessary changes to eliminate the problem but wanted to get a little feedback on this before submitting patches.

llvmbot commented 20 years ago

I think this is taken care of now (finally!)

There were quite a few patches, starting with: http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040223/012085.html ending with: http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040223/012117.html (not all of those are pertinent, but they are all in that range.)

lattner commented 20 years ago

I think that the only thing holding this bug up is the fact that Support/DataTypes.h, Support/hash_map, Support/hash_set, and Support/iterator

include config.h directly.

Brian mentioned that the right way to fix this was to autoconf these files directly, instead of having them use config.h

Whenever he (or someone else) wants to do this, we can finally close this. :)

-Chris

lattner commented 20 years ago

The doxygen configuration file can be set up to control exactly what gets generated. One of the most useful things I find in doxygen is the ability to look at the source code via a web browser without any comments. Just pure code. As for what gets exposed and what doesn't, that's completely up to the LLVM implementors. Doxygen is just a tool and like any tool what you get out of it depends on how you use it. Getting the right results requires a bit of time and investment. Personally, I think doxygen is wonderful :)

Ok, well since I don't really use doxygen much, it doesn't make sense for me to be the one to configure it. :) If you have any suggestions on our config, I would be happy to listen. The doxygen.cfg file is in ~/llvm/docs/doxygen.cfg.

As I go along and start to understand more about LLVM, I'll try to contribute doxygen comments to describe things and submit the patches.

Cool. When I said "half way documented" before, I meant "half way doxygened". :) While the documentation in LLVM could certainly be improved, I also don't think it's completely terrible. I should probably make some time to write some more "high-level" documentation, but that effort got stalled by the "convert everything to docbook" push. :)

-Chris

llvmbot commented 20 years ago

It seems like there are three camps:

users (just want to write a compiler and use LLVM facilities)

extenders (want to extend LLVM framework .. new passes, etc. )

developers (changing/extending core LLVM abstractions)

It might be useful to reflect these three usage categories in the public APIs of LLVM.

That's a reasonable division of users. The problem is that we really don't know where to draw hard lines in the sand to split up the headers, for example. I don't really think there is a good way to do this.

Understood.

It might just be a side-effect of LLVM only being half-way documented, but I find the actual header files to be much more useful than the doxygen output. There are a lot of comments and description that don't make it into the doxygen, though we are incrementally converting most of it over.

Outside of LLVM, I generally don't find doxygen helpful for the exact same reason that I don't like headers in source directories. For example, if you go to this page (or one of its analogs): http://llvm.cs.uiuc.edu/doxygen/annotated.html

... you get a list of every class in LLVM. This generally isn't helpful at all, because it exposes WAY too much implementation detail. Generally most people only want various levels of the stuff we expose in the include hierarchy. If they get interested in a particular piece, then they can dig into the implementation.

The doxygen configuration file can be set up to control exactly what gets generated. One of the most useful things I find in doxygen is the ability to look at the source code via a web browser without any comments. Just pure code. As for what gets exposed and what doesn't, that's completely up to the LLVM implementors. Doxygen is just a tool and like any tool what you get out of it depends on how you use it. Getting the right results requires a bit of time and investment. Personally, I think doxygen is wonderful :)

As I go along and start to understand more about LLVM, I'll try to contribute doxygen comments to describe things and submit the patches.

That said, splitting header and source is obviously only a partial implementation. Perhaps the ideal (though practically unattainable) way to do this is to use some sort of literate programming methodology with incremental levels of details exposed or something.

At some point, writing a book is probably the easiest way to go :)

Yup, I agree there.

Furthermore, users are going to be more interested in the installed hierarchy. LLVM doesn't support install yet, but it should and it will come for free with automake. Since only the public headers would get copied on install, it accomplishes what you're attempting to achieve without separating the files.

We do have a prototype install target that Brian is working on, but I understand your point. The problem with having automake rip them out, though, is that you have to test both the 'local' and 'installed' configs seperately, and you have to maintain the list of what to copy somewhere.

In my experience, very few things go wrong between a successful "make check" and "make install". However, automake handles this too. Two built-in targets provided by automake check both your distribution and your installation. The "distcheck" target will build a source distribution (i.e. llvm-1.x.tar.gz), unpack it, build it, and check it. The "installcheck" target will run all automake tests but use the installed version of the programs and libraries instead of the local version. But, I digress .. promised not to talk about automake here :)

lattner commented 20 years ago

Ah ok. How do you deal with the problem? I generally just pick an implementation (usually gnu) and if the platform doesn't support it, I port it myself.

Ok, but there is no standard, even among the GNU compilers. In fact we only support the GNU compilers, but there are already 3 different combination of

include file locations and namespace locations...

The problem is that there are multiple "kinds" of users of LLVM. Yes, that makes sense .. multiple audiences. It seems like there are three camps:

users (just want to write a compiler and use LLVM facilities)

extenders (want to extend LLVM framework .. new passes, etc. )

developers (changing/extending core LLVM abstractions)

It might be useful to reflect these three usage categories in the public APIs of LLVM.

That's a reasonable division of users. The problem is that we really don't know where to draw hard lines in the sand to split up the headers, for example. I don't really think there is a good way to do this.

Just one point: the "simple to find" is handled by doxygen. Frankly, I would rather look at doxygen than raw header files.

It might just be a side-effect of LLVM only being half-way documented, but I find the actual header files to be much more useful than the doxygen output. There are a lot of comments and description that don't make it into the doxygen, though we are incrementally converting most of it over.

Outside of LLVM, I generally don't find doxygen helpful for the exact same reason that I don't like headers in source directories. For example, if you go to this page (or one of its analogs): http://llvm.cs.uiuc.edu/doxygen/annotated.html

... you get a list of every class in LLVM. This generally isn't helpful at all, because it exposes WAY too much implementation detail. Generally most people only want various levels of the stuff we expose in the include hierarchy. If they get interested in a particular piece, then they can dig into the implementation. That said, splitting header and source is obviously only a partial implementation. Perhaps the ideal (though practically unattainable) way to do this is to use some sort of literate programming methodology with incremental levels of details exposed or something.

At some point, writing a book is probably the easiest way to go :)

Furthermore, users are going to be more interested in the installed hierarchy. LLVM doesn't support install yet, but it should and it will come for free with automake. Since only the public headers would get copied on install, it accomplishes what you're attempting to achieve without separating the files.

We do have a prototype install target that Brian is working on, but I understand your point. The problem with having automake rip them out, though, is that you have to test both the 'local' and 'installed' configs seperately, and you have to maintain the list of what to copy somewhere.

I don't really understand what you're saying here. What "standard" implementation of hash_map are you talking about?

I meant the std and gnu versions as "standard". That is, we should use code that's already written, not hack something up that is specific to LLVM.

Oh, yeah, that's exactly what I meant. I'm not interested in the NIH syndrome, and also not interested in working on implementing/learning a new hash_map. ;)

Oh, anything is possible with autoconf, its just a matter of how much work you want to work for it.

Ok, I will leave it to you guru's to figure out how this should work. :)

-Chris

llvmbot commented 20 years ago

Of course I have. What I didn't run into was, again, your mechanism for disambiguating. Man, I'm starting to regret having made that comment! :)

Ah ok. How do you deal with the problem?

I generally just pick an implementation (usually gnu) and if the platform doesn't support it, I port it myself.

I suppose that one alternative would be to clone a licence compatible implementation of hash_map into the LLVM world. That just sounds like busy work to me. It is busy work, but it does sound like a solution to this problem, correct?

Yes, it is one solution.

The problem is that there are multiple "kinds" of users of LLVM. There are people like you who basically just need include/llvm/* (no subdirs) to generate code. On the other hand, there are those who are writing analyses and transformations outside of the LLVM tree, but whose code could be considered "part of" LLVM. These people need access to important LLVM API's like AliasSetTracker so that they can implement their transformation or whatever making use of the infrastructure we have.

Yes, that makes sense .. multiple audiences. It seems like there are three camps:

users (just want to write a compiler and use LLVM facilities)
extenders (want to extend LLVM framework .. new passes, etc. )
developers (changing/extending core LLVM abstractions)

It might be useful to reflect these three usage categories in the public APIs of LLVM.

The problem is that when you start diving into a large source base (like LLVM is slowly growing into), you really want to be shielded from as much detail as possible. This means that you want to public APIs to be as simple to find as possible. The other advantage of separating them out is it makes it extremely clear what is an "internal" and "external" header file. That said, we probably shouldn't get into this discussion now, as it will only dillute the real point of this bug.

Just one point: the "simple to find" is handled by doxygen. Frankly, I would rather look at doxygen than raw header files. That being the case, it doesn't really matter where those header files live in the hierarchy. Furthermore, users are going to be more interested in the installed hierarchy. LLVM doesn't support install yet, but it should and it will come for free with automake. Since only the public headers would get copied on install, it accomplishes what you're attempting to achieve without separating the files.

Incorporating a licence compatible implementation sounds reasonable but it is also just busy work. It will impact the source base somehow and doesn't add any value. My first preference would be to keep using the "standard" implementations of hash_map and solve the HAVE problem. It turns out that several of the #defines defined by LLVM's config.h don't conflict with mine because they utilize LLVM specific tests.

I don't really understand what you're saying here. What "standard" implementation of hash_map are you talking about?

I meant the std and gnu versions as "standard". That is, we should use code that's already written, not hack something up that is specific to LLVM.

The whole reason we have the header is that there are no standard versions...

I was using the term "standard" lightly (see the quotes?). In my view there are two defacto "standard" versions: std and gnu. They are the defacto standards because these are the ones that get used most often.

Also, just because many LLVM config.h #defines happen to not conflict with yours doesn't mean that new ones won't in the future.

True 'nuff.

You've successfully convinced me that we have to get config.h out of the public headers.

Yes.

Keep "internal" and "external" configuration #defines in separate .h files. The usual HAVE_STRING_H, HAVE_XXX will be in the usual "config.h" for compatibility with other common practices. The "external" configuration

defines are likely to not conflict.

I thought that this wasn't possible with autoconf?

Oh, anything is possible with autoconf, its just a matter of how much work you want to work for it. I was originally looking for the autoheader/autoconf command line option that would take care of it. Such a beast doesn't exist. The long explanation I gave in my last message is how to do it the hard way. Keeping separate config files is pretty straight forward (one extra line in configure.ac). The hard part is reworking all the m4 macros that generate ACDEFINE(HAVE) calls to use AC_DEFINE(LLVMHAS) calls. The reason for keeping both an internal and external config.h is to reduce the amount of m4 work. The internal ones can just stay HAVE_ while we have to do the m4 work to get the external ones to use LLVMHAS*

Reid

lattner commented 20 years ago

Of course I have. What I didn't run into was, again, your mechanism for disambiguating. Man, I'm starting to regret having made that comment! :)

Ah ok. How do you deal with the problem?

I suppose that one alternative would be to clone a licence compatible implementation of hash_map into the LLVM world.

That just sounds like busy work to me.

It is busy work, but it does sound like a solution to this problem, correct?

there are also pragmatic concerns such as how to efficiently implement light-weight classes like AliasSetTracker which absolutely need certain methods to be inlined for decent performance.

Well, I think this underscores my point again. Why does AliasSetTracker.h need to be part of the public interface? I don't know much about this file and its contents but a cursory look at it suggests its an implementation detail. Frankly, as a user of LLVM, I don't really care how LLVM tracks aliases just so long as it does. I don't ever see myself subclassing a new way to do alias tracking. Is this really necessary to be exposed in the public interface? If it is, I'll take your word for it ... it just doesn't seem that way to me.

The problem is that there are multiple "kinds" of users of LLVM. There are people like you who basically just need include/llvm/* (no subdirs) to generate code. On the other hand, there are those who are writing analyses and transformations outside of the LLVM tree, but whose code could be considered "part of" LLVM. These people need access to important LLVM API's like AliasSetTracker so that they can implement their transformation or whatever making use of the infrastructure we have.

I understand your reasoning, but frankly, that choice really bugs me. In my view, headers and the related class files should live in the same directory.

I understand what you're saying, and FWIW, Vikram shares your view. The problem is that when you start diving into a large source base (like LLVM is slowly growing into), you really want to be shielded from as much detail as possible. This means that you want to public APIs to be as simple to find as possible. The other advantage of separating them out is it makes it extremely clear what is an "internal" and "external" header file. That said, we probably shouldn't get into this discussion now, as it will only dillute the real point of this bug.

Incorporating a licence compatible implementation sounds reasonable but it is also just busy work. It will impact the source base somehow and doesn't add any value. My first preference would be to keep using the "standard" implementations of hash_map and solve the HAVE problem. It turns out that several of the #defines defined by LLVM's config.h don't conflict with mine because they utilize LLVM specific tests.

I don't really understand what you're saying here. What "standard" implementation of hash_map are you talking about? The whole reason we have the header is that there are no standard versions... Also, just because many LLVM config.h #defines happen to not conflict with yours doesn't mean that new ones won't in the future. You've successfully convinced me that we have to get config.h out of the public headers.

Keep "internal" and "external" configuration #defines in separate .h files. The usual HAVE_STRING_H, HAVE_XXX will be in the usual "config.h" for compatibility with other common practices. The "external" configuration

defines are likely to not conflict.

I thought that this wasn't possible with autoconf?

-Chris

llvmbot commented 20 years ago

Before anyone goes way out of their way to work on this, I'd like to note that we're considering moving llvm/test/Programs into a separate CVS module and a separate tarball. The reasons for this are:

1) Remove llvm/test/Programs from LLVM's autoconf configuration. The llvm/test/Programs is kind of its own world away from llvm/*, and it uses different build rules to build everything.

2) Reduce the size of LLVM proper. We want to add a lot more programs to the test suite, and that will balloon the size of the LLVM distribution. Moving it out keeps the download and updates of LLVM small, and allows people to skip downloading tons of programs if they are not interested.

So, for anyone considering massive changes to the build system, don't worry about getting everything under llvm/test/Programs working yet; you may not have to do it, depending upon what we eventually decide on the issue.

llvmbot commented 20 years ago

I'm still suprised that you haven't run into problems using hash_map with multiple compilers.

Of course I have. What I didn't run into was, again, your mechanism for disambiguating. Man, I'm starting to regret having made that comment! :)

I wish that the performance difference was purely imagined so that we could just go ahead and use std::map for everything. :(

No, DON'T do that! :)

I suppose that one alternative would be to clone a licence compatible implementation of hash_map into the LLVM world. This would guarantee a standard interface, and allow smooth migration to the C++0x implementation whenever it materializes. That said, it's a nontrivial amount of work to do this right.

That just sounds like busy work to me.

there are also pragmatic concerns such as how to efficiently implement light-weight classes like AliasSetTracker which absolutely need certain methods to be inlined for decent performance.

Well, I think this underscores my point again. Why does AliasSetTracker.h need to be part of the public interface? I don't know much about this file and its contents but a cursory look at it suggests its an implementation detail. Frankly, as a user of LLVM, I don't really care how LLVM tracks aliases just so long as it does. I don't ever see myself subclassing a new way to do alias tracking. Is this really necessary to be exposed in the public interface? If it is, I'll take your word for it ... it just doesn't seem that way to me.

In a world where compilers like G++ dominate the world, using better encapsulation is really not an option.

Well, that's true.

That said, gratuitously exposing internal structure is obviously bad. I keep firm pressure on our developers to reduce the amount of #includage in headers as much as possible.

I know you do, my comments aren't a complaint against LLVM .. just trying to make it perfect :)

When designing a library interface, very much care must be taken in what you expose to the library user.

Absolutely. FWIW, this is why we separated out the public interface (llvm/include) from the implementations and the implementation headers.

I understand your reasoning, but frankly, that choice really bugs me. In my view, headers and the related class files should live in the same directory. Where the files end up after installation is something for the makefiles to handle. My view is probably derived from my heavy use of automake which allows me to easily assemble any #include hierarchy I want after installation. This way you can have the best of both worlds: header files go in the natural place for developers (near the related .cpp files) and go in the "right" place for users (in some include directory in the install target).

The only solution I've seen is to not expose config.h and I believe that is the RightThing(tm) to do, even if it isn't the easy thing.

Ok, ok! I believe you! :) The only question is: how?

Alright, I'll stop harping on this issue :)

FWIW, I usually copy the ,v files, then cvs rm the old location. I think this is the preferred way to "move" files in CVS, do you concur?

Yes, I concur. I guess I was thinking of "move" figuratively more than literally.

Here are some proposed solutions:

I've been successfully convinced that we absolutely need to move the include/Support directory into include/llvm/Support. This is entirely mechanical, but should happen after the 1.1 release (this is Bug 183).

Okay.

hash_map/hash_set are tremendous problems for us in so many ways it's not even funny. The only realistic solution that I can think of to this problem is inelegant, but will solve the problem just fine: invent LLVM specific hashed containers which expose the same interface as the "stl" ones, but are in a fixed location for LLVM. These would presumably live in the support library. The only trick to this is that we do not want to invent our own containers. Instead we should find some license compatible implementations with few dependencies and snarf them into our sourcebase. :)

Incorporating a licence compatible implementation sounds reasonable but it is also just busy work. It will impact the source base somehow and doesn't add any value. My first preference would be to keep using the "standard" implementations of hash_map and solve the HAVE problem. It turns out that several of the

defines defined by LLVM's config.h don't conflict with mine because they

utilize LLVM specific tests.

To assist with this problem, here's what I've come up with:

Keep "internal" and "external" configuration #defines in separate .h files. The usual HAVE_STRING_H, HAVE_XXX will be in the usual "config.h" for compatibility with other common practices. The "external" configuration

defines are likely to not conflict.
Rename all the "external" HAVE_ variables to LLVMHAS variables. This is the naming practice used by packages like ACE (albeit, it doesn't use autoconf). These variables, since they are unique to LLVM won't class. To support this we just need to change a few AC_DEFINE calls in configure.ac and acincludes.m4 (I separate out the LLVM specific stuff from aclocal generated stuff, aclocal.m4, in my source tree).
Adjust the #includage to ensure that the internal config.h is not exposed publicly.
Either don't use autoheader and maintain the llvm/internal/config.h.in and include/llvm/config.h.in files manually, OR write a shell script to be run immediately after autoheader that moves the #defines starting with LLVMHAS from llvm/internal/config.h.in to include/llvm/config.h.in. This latter approach is more work up front but could be significantly less work as LLVM matures and requires addtional configuration variables.

So, to summarize, we need in configure.ac:

AC_CONFIG_HEADERS(include/internal/config.h) # internal config header AC_CONFIG_HEADERS(include/llvm/config.h) # external config header

And we need to change HAVE_DLOPEN, HAVE_MALLINFO, HAVE_PTHREAD_MUTEX_LOCK (all set via AC_SEARCH_LIBS) to be renamed as LLVM_HAS_DLOPEN, LLVM_HAS_MALLINFO, LLVM_HAS_PTHREAD_MUTEX_LOCK, respetively.

In the acinclude.m4 (aclocal.m4) files we need to change every occurrence of "ACDEFINE(HAVE" to "AC_DEFINE(LLVMHAS". This will affect symbols like HAVE_NAMESPACES, HAVE_STD_EXT_HASH_MAP, etc.

If we want to use autoheader, we need a script to move the LLMVHAS #undef lines so that:

in llvm/internal/config.h there should only be "#undef HAVE_*" lines.
in llvm/config.h there should only be "#undef LLVMHAS*" lines.

This approach should take care of the problem.

Comments?

This leaves three headers:

Support/slist (which is dead, and I'm deleting now)

Okay :)

Support/iterator, which we already have "local" implementations of classes we need, so it just needs to be "cleaned up"

Okay :)

Support/DataTypes.h - This is the hard one, see below.

Implementing Support/DataTypes.h without gross hacks is going to be difficult. I believe the reason that we have our own int64_t and uint64_t implementations is that Sparc doesn't have a header, and probably other older platforms don't either. These data types are going to be needed by headers in the public interfaces, and the only good way that I know of to implement this is with autoconf. Suggestions please!

The only thing I can think of is to create a #define, LLVM_UINT64_T, that equates to the correct typename. The value is created either directly by autoconf or via autoconf configured header files. On most modern platforms, LLVM_UNIT64_T will just be uint64_t but if you need to implement a class to support the type then it would be that class.

Please note that Solaris has its own uint64_t and int64_t definitions, just not in .

The other purpose of Support/DataTypes.h is for endianness detection. This really really has absolutely no business affecting the public api of the LLVM headers, so should be moved into a private header somewhere or something. The llvm/Bytecode/Primitives.h header (the only user of endianness stuff) deserves its own bug, which I'll file shortly (it will probably end up as Bug 184). It's been on my hitlist for a long time now, I have just never gotten around to it.

Agreed.

We have to decide how to represent headers which are public to global LLVM implementation code, but private to the rest of the world (a place to put headers such as config.h and the Endianness stuff, for example). I don't have a wonderful answer to this problem, but it doesn't seem too difficult.

The typical practice is to put them in include/llvm/internal. The internal header files are actually installed publically with the rest of them but anyone

including them or relying on their contents remaining compatible with the

library's interface assumes it at their own risk. This works well because developers of the library have a standard way to get an LLVM wide header file that by virtue of its name is a warning to users of the library:

include <llvm/internal/MyInternalHeader.h>

Additionally, a directory named "experimental" or "notsupported" is common for those purposes.

What do you all think? This sounds like something that can be approached incrementally, one piece at a time... if we agree on approach, the individual pieces can be filed as their own bugs and this bug can depend on them.

The incremental approach is fine so long as the source base keeps on compiling after each commit. You guys are very prolific and I have a cron job that updates and recompiles every day. So far, its never failed in any big way. I wouldn't want to see that change :)

Because we aren't doing any integration, we will have to rely on you Reid, to let us know how we are doing and how close we are. :)

You'll be the first to know. This is very high on my priority list. I'm trying to get my source base LLVMized by the end of the year so XPL can be compiled to bytecode. Right now I'm blocked because of these issues. Fortunately, there's stuff I can work on around the edges but I'll test the changes you submit as soon as I notice them.

lattner commented 20 years ago

Here are some proposed solutions:

I've been successfully convinced that we absolutely need to move the include/Support directory into include/llvm/Support. This is entirely mechanical, but should happen after the 1.1 release (this is Bug 183).
hash_map/hash_set are tremendous problems for us in so many ways it's not even funny. The only realistic solution that I can think of to this problem is inelegant, but will solve the problem just fine: invent LLVM specific hashed containers which expose the same interface as the "stl" ones, but are in a fixed location for LLVM. These would presumably live in the support library. The only trick to this is that we do not want to invent our own containers. Instead we should find some license compatible implementations with few dependencies and snarf them into our sourcebase. :)
This leaves three headers:
- Support/slist (which is dead, and I'm deleting now)
- Support/iterator, which we already have "local" implementations of classes we need, so it just needs to be "cleaned up"
- Support/DataTypes.h - This is the hard one, see below.
Implementing Support/DataTypes.h without gross hacks is going to be difficult. I believe the reason that we have our own int64_t and uint64_t implementations is that Sparc doesn't have a header, and probably other older platforms don't either. These data types are going to be needed by headers in the public interfaces, and the only good way that I know of to implement this is with autoconf. Suggestions please!
The other purpose of Support/DataTypes.h is for endianness detection. This really really has absolutely no business affecting the public api of the LLVM headers, so should be moved into a private header somewhere or something. The llvm/Bytecode/Primitives.h header (the only user of endianness stuff) deserves its own bug, which I'll file shortly (it will probably end up as Bug 184). It's been on my hitlist for a long time now, I have just never gotten around to it.
We have to decide how to represent headers which are public to global LLVM implementation code, but private to the rest of the world (a place to put headers such as config.h and the Endianness stuff, for example). I don't have a wonderful answer to this problem, but it doesn't seem too difficult.

What do you all think? This sounds like something that can be approached incrementally, one piece at a time... if we agree on approach, the individual pieces can be filed as their own bugs and this bug can depend on them. Because we aren't doing any integration, we will have to rely on you Reid, to let us know how we are doing and how close we are. :)

-Chris

lattner commented 20 years ago

My comment was not directed at the hash_map itself but at the mechanism by which LLVM disambiguates the multiple implementations of it (y'know, the nice thing about standards is there's so many to choose from :). The mechanism is what I was suggesting be sent to the standards groups. However, the use of the

Ok, so the real question is: what can we do about this? How can we fix this in a way that works for LLVM without imposing structure on clients of LLVM headers?

However, without using a pimpl or something, I don't know how we can hide this from clients. C++ just wasn't designed for this form of data hiding. There are two design choices that support this kind of hiding:

Forward declarations of internal types

Inheritance and/or Polymorphism.

The former has the side effect that you can only define a private member as a pointer to the forward declared type. This implies a memory allocation which in many instances is undesireable.

I strongly urge people to forward declare as many classes as possible. In my mind, this is just one part of good class design. That said, if you want to contain something by value, then you need its definition. In addition to the extra memory allocation, this also implies that simple accessors and such cannot be inlined by non-smart compilers... This can be much more overhead than the allocation.

The latter avoids the memory allocation but requires that the user invoke some kind of special allocation function to instantiate the actual (subclass) type. LLVM already supports this approach well for Passes. I suggest it use the technique on a more broad scale.

That fine for course-grained classes like LLVM passes, but it doesn't work for fine-grained classes like hash_map. This design technique requires that all methods be made into virtual functions, which is totally unacceptable for something a simple as ".empty()" or ".begin()". Also, this falls down when you try to expose (possibly const) iterators into the underlying type.

Please, I've been using hash_map since the early 1990s. I'm well versed with it. My comment (which I now see was not stated clearly) was directed at the mechanism LLVM uses to disambiguate the various hash_map implementations. See my comment above to Brian.

Sorry about that. I got the wrong impression from your comment. That said, I'm still suprised that you haven't run into problems using hash_map with multiple compilers. I wish that the performance difference was purely imagined so that we could just go ahead and use std::map for everything. :( I suppose that one alternative would be to clone a licence compatible implementation of hash_map into the LLVM world. This would guarantee a standard interface, and allow smooth migration to the C++0x implementation whenever it materializes. That said, it's a nontrivial amount of work to do this right.

By designing the interface to LLVM (i.e. the entire IR and anything it needs) to be the >minimal< set of things that will let you use the full functionality of LLVM in a platform independent manner. Even if this means subclassing some of the IR interface classes to hide the implementation details.

I understand that, but there are also pragmatic concerns such as how to efficiently implement light-weight classes like AliasSetTracker which absolutely need certain methods to be inlined for decent performance. In a world where compilers like G++ dominate the world, using better encapsulation is really not an option. That said, gratuitously exposing internal structure is obviously bad. I keep firm pressure on our developers to reduce the amount of #includage in headers as much as possible.

I'm assuming, of course, that you want to design LLVM as a library/toolkit for compiler writers. It certainly seems to me that being a library/toolkit is part of LLVM's mission.

Of course this is a goal, without it, world domination isn't possible! :)

When designing a library interface, very much care must be taken in what you expose to the library user.

Absolutely. FWIW, this is why we separated out the public interface (llvm/include) from the implementations and the implementation headers. That said, it's quite possible that there are places that could be improved more. :) For example, we have a keyword "code-cleanup" for bugs that are related to improving the API.

Perhaps what we need to do is have a really stripped down version of the config.h file

The only solution I've seen is to not expose config.h and I believe that is the RightThing(tm) to do, even if it isn't the easy thing.

Ok, ok! I believe you! :) The only question is: how?

If you take care of moving the header files for me (you probably want to move the CVS ,v files to retain history),

Will do. I'll try to do it soon after the 1.1 release (which is coming out RSN). FWIW, I usually copy the ,v files, then cvs rm the old location. I think this is the preferred way to "move" files in CVS, do you concur?

then I probably can do it as one big diff. What I won't be able to do is commit to a branch for everyone else to use separate from other work that's on going. For this kind of change, not having CVS access is a pretty big drawback.

I agree, in the meantime we can figure out what to do about CVS access. Worse case, you can produce a giant diff, then one of the local people can commit it to a branch. Better yet would be to get you access to CVS...

Au contraire. You've already shown that you do know somethings about this area. Perhaps its just that you don't want to know? Hmmm??? :->

Hrm, I know a bit about C++ and program design, but know very little about the specific auto* tools. In the next comment I'll attach some proposed solutions to this bug.

-Chris

llvmbot commented 20 years ago

The reason the #include's exist is to get the definition of a symbol. Many of the current files in Config/foo will simply turn one compile-time error into another. This is one area where C++ is very different than C (requires prototypes). This problem should be fixed by doing stuff like FileUtilities.h, where we abstract out the system interfaces we want...

Yes, Yes, Yes!!!!! :)

llvmbot commented 20 years ago

Follow Up To Brian:

Agreed on your point 1
Agreed on your point 2
Sorry for introducing the automake discussion here. For me its all part of the larger issue of making LLVM easily usable. I'll confine further comments to bug 106 which deals with the automake issue.

As an aside, the inclusion (or not) of hash_map in the standard has been a long, long flamewar in the C++ standardization community, but for reasons which I do not fully understand. Nevertheless, hash_map is a very widely-known and widely-used nonstandard extension; it would be a mistake to speak of it as if it were some kind of bogus addon from Mars.

My comment was not directed at the hash_map itself but at the mechanism by which LLVM disambiguates the multiple implementations of it (y'know, the nice thing about standards is there's so many to choose from :). The mechanism is what I was suggesting be sent to the standards groups. However, the use of the mechanism in the LLVM header files forces the LLVM user into one model. This is already a conflict for me. LLVM's configure script selects the GNU version of hash_table. I have code that uses the SGI extension version directly. Since they are based on the same implementation, I get conflicts when trying to use one of my hash_maps at the same time as an LLVM header.

Follow Up To Chris:

You are absolutely right, of course. Take hash_map as an example. include/llvm/Analysis/AliasSetTracker.h uses hash_map for example as an implementation detail (ie, private member) in one of its classes. However, without using a pimpl or something, I don't know how we can hide this from clients. C++ just wasn't designed for this form of data hiding.

There are two design choices that support this kind of hiding:

Forward declarations of internal types
Inheritance and/or Polymorphism.

The former has the side effect that you can only define a private member as a pointer to the forward declared type. This implies a memory allocation which in many instances is undesireable.

The latter avoids the memory allocation but requires that the user invoke some kind of special allocation function to instantiate the actual (subclass) type. LLVM already supports this approach well for Passes. I suggest it use the technique on a more broad scale.

hash_map is not our invention. It predates standard C++, and they chose not to standardize it in the initial C++ standard due to lack of time.
C++0x is supposed to contain standardized hashed containers, for example, see: http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1456.html

Please, I've been using hash_map since the early 1990s. I'm well versed with it. My comment (which I now see was not stated clearly) was directed at the mechanism LLVM uses to disambiguate the various hash_map implementations. See my comment above to Brian.

The problem is that hash tables are a tremendously useful container and often can improve performance quite a bit: but they aren't yet standardized. That's what the Support/hash_map header is trying to work around. With it, clients can use hash_map's without needing to know a lot of the vagarities of different implementations.

Agreed, but that's exactly the problem. My system is a source base larger than LLVM and it uses hash_map (SGI version) extensively. The fact that Support/hash_map disambiguates the implementation is nifty but it also produces a conflict for me.

I understand exactly what you are saying, and I certainly understand what you mean by it being a bad thing, but I'm not sure how to fix it.

By designing the interface to LLVM (i.e. the entire IR and anything it needs) to be the >minimal< set of things that will let you use the full functionality of LLVM in a platform independent manner. Even if this means subclassing some of the IR interface classes to hide the implementation details.

I'm assuming, of course, that you want to design LLVM as a library/toolkit for compiler writers. It certainly seems to me that being a library/toolkit is part of LLVM's mission. When designing a library interface, very much care must be taken in what you expose to the library user. This protects both the user and the developer of the library. I think this discussion makes the benefits for the user quite clear. What you may not realize is the significant benefit to the developer of the library. A clear, concise, minimal interface provides the developer with many degrees of freedom in the implementation. Sh!t happens and code needs to get redesigned to support new functionality. With a good interface design, the library user never knows this. For the most part, LLVM follows such good design principles (you have lots of implementation classes that the user just never even knows about). We're only talking here about the few things that fell through the cracks :)

Perhaps what we need to do is have a really stripped down version of the config.h file that has just the necessities used by DataTypes.h and hashmap/set, and have those prefixed with LLVMAC?

I've scavenged the 'net all day trying to find something that describes how to do this. The only prevailing wisdom I get is to not expose config.h to your users. I've looked at about 5 standard libraries (Xerces, Xalan, ICU, gd, etc.) and none of them expose their config.h. Using the LLVMAC prefix, while simple, I think just obfuscates the issue further. People expect to look at config.h and just see HAVE. Furthermore tools such as autoheader (which generates config.h.in) simply won't (without modification) accept the prefix. The only solution I've seen is to not expose config.h and I believe that is the RightThing(tm) to do, even if it isn't the easy thing.

Another related problem is that the modern way to use public #includes is to place them all in a directory specifically for your project. So that

include <Support/Statistic.h>

becomes

include <llvm/Support/Statistic.h>

This can certainly be done. File a bug and I'll take care of it at some point, probably before 1.2.

Consider it done.

While we can probably get you CVS access, I'm not sure how that would be different than developing things locally, then pushing out a big diff.

If you take care of moving the header files for me (you probably want to move the CVS ,v files to retain history), then I probably can do it as one big diff. What I won't be able to do is commit to a branch for everyone else to use separate from other work that's on going. For this kind of change, not having CVS access is a pretty big drawback.

In any case, the other guys probably have beter/more constructive feedback than I do. Remember again that I don't really know anything about this stuff! :)

Au contraire. You've already shown that you do know somethings about this area. Perhaps its just that you don't want to know? Hmmm??? :->

lattner commented 20 years ago

Speaking personally, I don't understand why we need to use the Config/foo headers instead of just using #ifdef HAVE_FOO; #include ; #endif, because most of them contain exactly that. I would support efforts to replace uses of the former with the latter.

This is of course almost completely pointless. The reason the #include's exist is to get the definition of a symbol. Many of the current files in Config/foo will simply turn one compile-time error into another. This is one area where C++ is very different than C (requires prototypes). This problem should be fixed by doing stuff like FileUtilities.h, where we abstract out the system interfaces we want...

3) Looking back at the summary for this bug, I don't see how this bug should have anything to do with the use or disuse of automake. That issue absolutely has to undergo its own cost-benefit analysis. We should be careful not to confuse the issues...

Agreed.

-Chris

lattner commented 20 years ago

It appears that we only include our config.h header in Support/{DataTypes.h|hash_map|hash_set|slist|iterator}.

Yes, but are these files and the things they declare REALLY part of the public interface you want to expose to LLVM users? I wouldn't think so, they are implementation details. Correct me if I'm wrong.

You are absolutely right, of course. Take hash_map as an example. include/llvm/Analysis/AliasSetTracker.h uses hash_map for example as an implementation detail (ie, private member) in one of its classes. However, without using a pimpl or something, I don't know how we can hide this from clients. C++ just wasn't designed for this form of data hiding.

That said, we can't be the only project in the world that has to deal with the mysterious transmogrification of hash_map across compiler versions...

Actually, its the first time I've seen it. While it might be a great utility, it should be submitted to the C++ compiler folks for inclusion in some future "standard".

hash_map is not our invention. It predates standard C++, and they chose not to standardize it in the initial C++ standard due to lack of time. C++0x is supposed to contain standardized hashed containers, for example, see: http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1456.html

The problem is that hash tables are a tremendously useful container and often can improve performance quite a bit: but they aren't yet standardized. That's what the Support/hash_map header is trying to work around. With it, clients can use hash_map's without needing to know a lot of the vagarities of different implementations.

By exposing these header files in your public API, you're essentially forcing users of LLVM to "do it your way" regardless of the needs of the users of the LLVM.

I understand exactly what you are saying, and I certainly understand what you mean by it being a bad thing, but I'm not sure how to fix it. Perhaps what we need to do is have a really stripped down version of the config.h file that has just the necessities used by DataTypes.h and hashmap/set, and have those prefixed with LLVMAC?

Another related problem is that the modern way to use public #includes is to place them all in a directory specifically for your project. So that

include <Support/Statistic.h>

becomes

include <llvm/Support/Statistic.h>

This can certainly be done. File a bug and I'll take care of it at some point, probably before 1.2.

As much as I'd like to, I can't go that route. I don't have CVS access other than read-only anonymous. I can't create a branch or a tag or check in or anything else. :(

While we can probably get you CVS access, I'm not sure how that would be different than developing things locally, then pushing out a big diff.

In any case, the other guys probably have beter/more constructive feedback than I do. Remember again that I don't really know anything about this stuff! :)

-Chris

llvmbot commented 20 years ago

It seems like we have multiple issues crammed into this bug.

1) I, for one, agree that we should move our remaining separate includes (e.g., include/Support/, include/Config/) into a single llvm/ directory.

Speaking personally, I don't understand why we need to use the Config/foo headers instead of just using #ifdef HAVE_FOO; #include ; #endif, because most of them contain exactly that. I would support efforts to replace uses of the former with the latter.

2) As for things which currently expose HAVE_* macros, remember that any autoconf test which currently outputs a definition of HAVE_SOMETHING can be rewritten to output whatever definition would be in effect if HAVE_SOMETHING were to be defined. There is nothing that says autoconf tests have to output conditional compilation directives, other than the force of tradition -- it's just a Small Matter of Programming, for appropriate values of "Small".

3) Looking back at the summary for this bug, I don't see how this bug should have anything to do with the use or disuse of automake. That issue absolutely has to undergo its own cost-benefit analysis. We should be careful not to confuse the issues...

That said, whether or not an outside developer will find LLVM's Makefiles strange enough to consider them an impediment to adopting LLVM is strictly a matter of perspective. For example, LLVM's Makefiles are different from automake makefiles, but they are not very different from those you find in a typical BSD build system.

As an aside, the inclusion (or not) of hash_map in the standard has been a long, long flamewar in the C++ standardization community, but for reasons which I do not fully understand. Nevertheless, hash_map is a very widely-known and widely-used nonstandard extension; it would be a mistake to speak of it as if it were some kind of bogus addon from Mars.

llvmbot commented 20 years ago

Ok, do you have any other ideas for how to implement this? How do other projects cope with conflicting autoconf configurations?

Other projects don't create the conflicts in the first place. Its generally not "kosher" to make one's public header files depend on the contents of your config.h file. That is reserved for the internal headers and source code files. By making one's public headers depend on config.h a project is basically saying to the users of the package either: (a) we will take charge of all machine configuration stuff (very unfriendly for other library developers) or (b) the project's interface is configuration dependent.

It appears that we only include our config.h header in Support/{DataTypes.h|hash_map|hash_set|slist|iterator}.

Yes, but are these files and the things they declare REALLY part of the public interface you want to expose to LLVM users? I wouldn't think so, they are implementation details. Correct me if I'm wrong.

I'm not sure how to handle these files other than through autoconf.

The use of autoconf (and autoheader) is not the problem. The way the #includes are structured is. Another related problem is that the modern way to use public

includes is to place them all in a directory specifically for your project. So that

include <Support/Statistic.h>

becomes

include <llvm/Support/Statistic.h>

This completely removes conflicts with some package named "Support" as long as there isn't some other package named "llvm". In general, resolving whole package naming conflicts is MUCH easier than resolving them at the file by file level.

LLVM gets this mostly right. The include directory has an "llvm" directory that correctly hides all the header files. Unfortunately, it also has the "Support" directory which should really be a sub-directory of LLVM and possibly named "internal". However, it also contains stuff that isn't really internal (like Statistic.h).

That said, we can't be the only project in the world that has to deal with the mysterious transmogrification of hash_map across compiler versions...

Actually, its the first time I've seen it. While it might be a great utility, it should be submitted to the C++ compiler folks for inclusion in some future "standard". By exposing these header files in your public API, you're essentially forcing users of LLVM to "do it your way" regardless of the needs of the users of the LLVM.

If there is a good reason to change something (like better integration with other source-bases), and it doesn't effect functionality, I have no problem changing it.

None of what I'm suggesting will affect functionality but it will (and should) affect the public API to LLVM (by reducing its scope). I think the driving "good reason" for it is to foster adoption of LLVM in more projects. I may be one of the early users of LLVM in a large way but I won't be the last! If we can solve these problems then other users of LLVM will have a smooth ride integrating LLVM into their projects. And, as a side effect, LLVM core developers won't be hassled with the same integration issues coming up over and over again.

We just have to do things as incrementally as possible and make use of CVS branches for other major changes that cannot be incrementalized.

As much as I'd like to, I can't go that route. I don't have CVS access other than read-only anonymous. I can't create a branch or a tag or check in or anything else. :(

For example, a lot of changes to the makefile can be made to regularize them for automake, before switching over to automake.

This is already done in my build environment. It can almost co-exist with the existing makefile system. But, I haven't gotten all the way through a complete build (including running tests) yet.

This is how the GCC project was ultimately switched over to autoconf IIRC...

LLVM is early enough in its lifespan that hopefully we can avoid such a scenario.

I think there are two basic parts to the needed changes:

(1) Add automake support (change configure.ac and add Makefile.am files). This is mostly done in my build environment. I'll eventually submit the patch under its correct bug number (bug 106). (2) Remove non-essential header files from the public API, especially config.h This is the much more challenging task as it requires some reorganization of the header files, not just changing their contents. Since I don't have CVS access, I'm somewhat hesitant to take this on.

lattner commented 20 years ago

I agree, but its not looking possible. Too many things depend on the HAVE_ prefix and don't have a way to specify an alternate.

Ok, do you have any other ideas for how to implement this? How do other projects cope with conflicting autoconf configurations? It appears that we only include our config.h header in Support/{DataTypes.h|hash_map|hash_set|slist|iterator}. I'm not sure how to handle these files other than through autoconf. That said, we can't be the only project in the world that has to deal with the mysterious transmogrification of hash_map across compiler versions...

I generally support these kinds of changes, but we don't want to be changing a lot of stuff gratuitously. There has to be a clear reason for it... The reason is so that LLVM can be used in a meaningful way!

If there is a good reason to change something (like better integration with other source-bases), and it doesn't effect functionality, I have no problem changing it.

My point was that this is a large incremental step. You can't do just some of the makefiles, they all have to be done. You can't do the makefile changes.

We just have to do things as incrementally as possible and make use of CVS branches for other major changes that cannot be incrementalized. For example, a lot of changes to the makefile can be made to regularize them for automake, before switching over to automake. This is how the GCC project was ultimately switched over to autoconf IIRC...

-Chris

llvmbot commented 20 years ago

I personally don't care if the autoconf variables are prefixed with LLVM (or better yet, LLVMAC), so if you can figure out how to do that, that would be a nice non-intrusive change.

I agree, but its not looking possible. Too many things depend on the HAVE_ prefix and don't have a way to specify an alternate.

I generally support these kinds of changes, but we don't want to be changing a lot of stuff gratuitously. There has to be a clear reason for it...

The reason is so that LLVM can be used in a meaningful way! If all I was doing was creating a language on top of LLVM, I probably wouldn't run into the issues that I do. But, I'm actually trying to embedd LLVM into a larger project and there are numerous issues. The namespaces were one, that's done. The make and config system is another. LLVM is currently a "one off" system. That is, it doesn't follow the accepted practice of 90% of other open source Unix/C++ packages. That means if you're going to incorporate it into something larger that you have to deal with it specially (i.e. actually understand it!)

I think that the only feasible way to do this kind of stuff is as incremental as possible, checking at each step that everything works. Otherwise it will be impossible to untangle which change caused which problem.

My point was that this is a large incremental step. You can't do just some of the makefiles, they all have to be done. You can't do the makefile changes without the autoconf changes. That is, to get everything working right, there's new makefiles and several changes to autoconf.ac. If the config.h issues can be cleanly split out, I'll make that change separately but there rest of it is just a large incremental change.

lattner commented 20 years ago

So, my approach now is simply to find a way to name these things more appropriately so that, for example, HAVE_ becomes LLVMHAVE.

I personally don't care if the autoconf variables are prefixed with LLVM (or better yet, LLVMAC), so if you can figure out how to do that, that would be a nice non-intrusive change.

The namespace change was a step in the right direction but the remaining issues have to do with using autoconf in a standard way, using automake, getting non-public things out of header files, and generally making LLVM a little more self contained.

I generally support these kinds of changes, but we don't want to be changing a lot of stuff gratuitously. There has to be a clear reason for it...

It would be a big patch but I wouldn't want to make these kind of changes incrementally wthout making sure everything works.

I think that the only feasible way to do this kind of stuff is as incremental as possible, checking at each step that everything works. Otherwise it will be impossible to untangle which change caused which problem.

-Chris

llvmbot commented 20 years ago

At first I thought it was going to be as simple as just making sure all the .cpp file included some internal header file that included config.h. But, after looking into it, I discovered that the public header files are dependent on lots of the HAVE* symbols because they define different data structures, etc. So, my approach now is simply to find a way to name these things more appropriately so that, for example, HAVE becomes LLVMHAVE. This way the symbols can be used in public header files without name collision .. best of both worlds. I'm currently researching if autoconf/autoheader can support this automagically. If they can, it will be easy, if not I might think of another solution.

On a side note, I'm thinking about just fixing all the issues having to do with using LLVM in another library package in one shot. The namespace change was a step in the right direction but the remaining issues have to do with using autoconf in a standard way, using automake, getting non-public things out of header files, and generally making LLVM a little more self contained. It would be a big patch but I wouldn't want to make these kind of changes incrementally wthout making sure everything works.

Thoughts?

lattner commented 20 years ago

I know so very little about autconf it is embarassing, so I'm adding John & Brian, who know more about this.

However, I strongly support any effort which gets config.h out of public header files. However, I'm not sure how to do this with DataTypes.h. This file is specifically looking for things like BIG_ENDIAN/LITTLE_ENDIAN and friends, which seem like they have to be autoconf'd. doesn't include any of this information does it? What do you plan to do to fix this?

-Chris