Paths in #line and __FILE__ are ambiguous

Sei-Lisa commented 5 years ago

The paths reported in #line and in __FILE__ are relative to the directory where each file is found during the include path search. This causes ambiguity. For example:

test.c

__FILE__
This is test.c
#include "header.h"
#include "dir/stuff.h"

header.h

__FILE__
This is the top header.h

dir/stuff.h

__FILE__
This is stuff.h; dir/header.h follows
#include "header.h"

dir/header.h

__FILE__
This is header.h in dir/

The above outputs:

#line 1 "test.c"
"test.c"
This is test.c
#line 1 "header.h"
"header.h"
This is the top header.h
#line 1 "dir/stuff.h"
"dir/stuff.h"
This is stuff.h; dir/header.h follows
#line 1 "header.h"
"header.h"
This is header.h in dir/

Note how #line 1 "header.h" appears twice, making it impossible to tell which file the lines that follow belong to.

mcpp solves this by using absolute paths, both for __FILE__ and for #line. gcc solves this by using paths relative ~~to the main file's directory~~ to the directory gcc is launched from, both for __FILE__ and for the # markers it uses instead of #line. If the gcc way is desired (which is arguably cleaner), note that os.path.relpath accepts a start parameter.
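To illustrate the start parameter mentioned above, here is a minimal sketch that names each file of the example relative to a launch directory. The /proj layout and the helper name path_from_launch_dir are assumptions made for illustration and are not part of pcpp. posixpath is used instead of os.path only so that the result is the same on every platform.

```python
import posixpath

def path_from_launch_dir(abs_path, launch_dir):
    """Return abs_path expressed relative to launch_dir (hypothetical helper)."""
    return posixpath.relpath(abs_path, start=launch_dir)

# Assumed layout: the example files from this report placed under /proj.
launch_dir = "/proj"
found_files = [
    "/proj/test.c",
    "/proj/header.h",
    "/proj/dir/stuff.h",
    "/proj/dir/header.h",
]

reported = [path_from_launch_dir(p, launch_dir) for p in found_files]
for name in reported:
    print('#line 1 "%s"' % name)
```

With a single fixed start directory, the two header.h files get different names, which is the property the gcc output below relies on.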

For reference, this is gcc's output:

# 1 "test.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "test.c"
"test.c"
This is test.c
# 1 "header.h" 1
"header.h"
This is the top header.h
# 4 "test.c" 2
# 1 "dir/stuff.h" 1
"dir/stuff.h"
This is stuff.h; dir/header.h follows
# 1 "dir/header.h" 1
"dir/header.h"
This is header.h in dir/
# 3 "dir/stuff.h" 2
# 4 "test.c" 2

Edit: I just compiled the Boost::Wave command line tool. It uses absolute paths like mcpp, but it has a command line option to change #line only (not __FILE__) to relative paths. They are relative to the main file, rather than to the current directory as in gcc.

Sei-Lisa commented 5 years ago

Actually, gcc's way is more complex. I believe it's as follows: if the file is found while searching an absolute path, then an absolute path is used; otherwise a relative one. This would prevent e.g. "../../../../../../usr/include/stuff.h" for a file located in /usr/include, which wouldn't make sense.

But that complicates things quite a bit; I would be happy with mcpp's way.
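The rule guessed at above (absolute search path gives an absolute name, relative search path gives a relative name) can be sketched as follows. This only models the behaviour as described in this comment. It has not been checked against gcc's source, and reported_path is a hypothetical name.

```python
import posixpath

def reported_path(search_dir, filename):
    """Join the search directory and the file name, then normalise.

    The result is absolute exactly when search_dir is absolute, which is
    the rule assumed in the comment above (not verified against gcc).
    """
    return posixpath.normpath(posixpath.join(search_dir, filename))

system_header = reported_path("/usr/include", "stuff.h")
nested_header = reported_path("dir", "header.h")
top_header = reported_path(".", "header.h")

print(system_header, nested_header, top_header)
```

Under this rule a file in /usr/include is never reported through a chain of "../" components, since the name simply inherits the form of the search path it was found in.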

ned14 commented 5 years ago

This issue is more complex than it looks. One of the primary use cases for pcpp is generating single header includes for C++ projects which are shipped to end users. Absolute paths in the output are therefore anathema, as they ruin portability. Furthermore, the tooling which calls pcpp may be operating from a user-defined build path, so emitting paths based on the working directory would be unstable, and again, could introduce portability problems.

Hence pcpp is intentionally the way it currently is, which is to always use relative paths from the file doing the including.

I recognise that this is not ideal. It is a least worst choice situation. I'd like to improve it, not least because the preprocessed output can confuse debuggers due to the same ambiguity you reported.

But I'll need to ponder the problem at hand first. And I'd like to know exactly what the major compilers do as well.

Sei-Lisa commented 5 years ago

Thanks for the insight. As I noted in my last edit above, Wave's #line filenames are relative to the main file that is being preprocessed. I think that mixing this with the absolute+relative strategy that GCC uses (also used by Wave for #line when that option is enabled), and applying it to both #line and __FILE__, could solve both use cases.

Currently, #line is used by my optimizer to report errors, and not being able to tell which file an error came from is a problem.

ned14 commented 5 years ago

After thinking about this a bit, I decided in the end that emitting paths based on the working directory of pcpp was the least worst. I know this deviates from gcc, which tries to base them on the path of the first source file, but pcpp is primarily expected to be used in build tooling, often on build-generated artefacts, and with the approach I chose the end user can always set the cwd to that of the first source file if they want the same output as gcc. Whereas if they want more detail in the line directives emitted, this gives it to them.

Anyway give this a spin and tell me if it works for you.
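As a rough sketch of the trade-off described above, the snippet below shows how a path relative to the working directory changes with where the tool is run, and how using the first source file's directory as the base gives the shorter form. The /proj layout and the directive_path helper are assumptions for illustration and do not reflect pcpp's actual code.

```python
import posixpath

def directive_path(abs_path, cwd):
    """Return abs_path relative to the given working directory (hypothetical helper)."""
    return posixpath.relpath(abs_path, start=cwd)

# Assumed layout: sources in /proj/src, tooling run from /proj/build.
main_file = "/proj/src/test.c"
header = "/proj/src/dir/header.h"

# Run from a separate build directory: the path carries the extra detail.
from_build = directive_path(header, "/proj/build")

# Run with the cwd set to the first source file's directory.
from_main_dir = directive_path(header, posixpath.dirname(main_file))

print(from_build)
print(from_main_dir)
```

Both names stay unambiguous within one run. Only the base directory differs, and the caller chooses it.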

Sei-Lisa commented 5 years ago

Thank you so much! This works perfectly for my purpose. Actually gcc generates the very same paths, at least on my end. It's the command line wave that makes them relative to the file being preprocessed, no matter which directory it's invoked from.

ned14 / pcpp

Paths in #line and __FILE__ are ambiguous #22
