mn416 / QPULib

Language and compiler for the Raspberry Pi GPU
Other
430 stars 63 forks source link

Make debug output dynamic #25

Closed wimrijnders closed 6 years ago

wimrijnders commented 6 years ago

This centralizes all library code between #ifdef DEBUG blocks.

A new class Debug is added to the Library. This allows for debug output enabling in the code, without having to recompile the library.

The debug output can be turned on and off with a call to Debug::enable(). It is also possible to redirect debug output to a logfile.

In particular, it gets rid of #ifdef DEBUG statements in the library code. The way is now open to remove DEBUG=1 as a make option, which currently doesn't do anything any more.

Code has been added to example program Tri to show usage.

wimrijnders commented 6 years ago

This is the best I could do without changing the code too much. The changes were a little more elaborate than I thought.

Even though this is a code change, it is related to the makefile. I want to get rid of the current debug build, simplifying the make process. The ability to change the output at runtime, and also output to file, I consider a bonus.

As an afterthought, this could be considered logging. So perhaps a Log class is a better idea. The functionality could be extended so that it can replace the printf's as they occur in the code. An alternative is to use an external library for logging.

wimrijnders commented 6 years ago

As an aside, the Debug class has some doxygen-style header comments. This should give you some idea as to how the commenting will be when (if?) I add doc-generation.

wimrijnders commented 6 years ago

TBH the more I think about this the uglier I find it. I think I have to rethink what I want to do here.

The idea is good, namely to be able to specify the target of the debug output and to be able to turn it on and off. But this implementation is off-putting.

At a minimum, I want to be able to extract the source and target code output as it happens in compileKernel() without having to resort to a recompile with DEBUG=1. Perhaps you have an idea how to do that.

wimrijnders commented 6 years ago

NOTE: Please do not accept & merge this yet (if you intend to); see #29 first.

mn416 commented 6 years ago

At a minimum, I want to be able to extract the source and target code output as it happens in compileKernel() without having to resort to a recompile with DEBUG=1. Perhaps you have an idea how to do that.

You could make compileKernel() take an optional parameter that, to begin with, just includes logging options, e.g. whether logging is enabled and if so where should the log be written to.

wimrijnders commented 6 years ago

Yes, that would work. But I have realized it would be better to refactor a bit first for more flexibility, see #26.

In addition, I would prefer a more general approach, that can be used in other code parts as well, not just compileKernel().

wimrijnders commented 6 years ago

Actually, your suggestion is fine as a first step. But the thing is that compileKernel() gets called in the constructor of struct Kernel, so you would need to pass in there as well.

That may not be too much of an issue. It just appears a little roundabout to me. I would prefer a method in Kernel that creates the output by request, but I don't see a way to do that right now.

mn416 commented 6 years ago

It's only a suggestion, feel free to explore other solutions, I don't have any strong opinions on this.

wimrijnders commented 6 years ago

Hokay, will think on this.

Heh! Our first conflicts to resolve! :laughing:

wimrijnders commented 6 years ago

@mn416 The best I can think of right now, is to define a separate call for printing output, e.g.:

template <typename... ts>
void pretty(void (*f)(ts... params), const char *filename = nullptr);

This would make the output generation explicit and by request, so the #if DEBUG can be removed in the kernel code.

If no filename specified, the output goes to stdout. Internally, this calls compile() with a file pointer as extra parameter. The drawback is that a kernel is compiled and then discarded, just to generate the output.

This fits right in with your suggestion above. Tell me what you think about this.

mn416 commented 6 years ago

The Kernel class has access to both the source code and target code inside member variables. So why not just add a pretty() method to the Kernel class? It could take optional parameters regarding the destination of the print. Maybe call it printCode() instead of pretty().

wimrijnders commented 6 years ago

Hey you're awake. I had this feeling you're in a completely different timezone from me.

Because the kernel doesn't store the source code in any way. It's discarded in compileKernel() as soon as the target code has been assembled.

EDIT: Wait a sec, let me just check what you said just there...No, sure about it, source code is not retained.

Edit 2: No wait, I'm an idiot. You're right, Kernel is a struct and it has members sourceCode and targetCode. Excuse my brainfart.

mn416 commented 6 years ago

Ok, I see

wimrijnders commented 6 years ago

OK, your suggestion will work and is better. Will go with that.

wimrijnders commented 6 years ago

Ah, now I remember. This is what I meant (ctor Kernel):

  // Save pointer to source program for interpreter
    #ifdef EMULATION_MODE
    sourceCode = body;
    #else
    sourceCode = NULL;
    #endif

This is just a matter of removing the #ifdef. I hope that's OK with you.

wimrijnders commented 6 years ago

Ooh, I just noticed that there's a single global stack stmtStack. This is going to be problematic if multiple kernels are compiled in a program.

I propose to add a stack member to struct Kernel, so that there can be multiple separate kernel instances. I actually feel that this is kind of urgent.

EDIT: On closer inspection, it's not as bad as I thought. Only the top item of the stack is used within a kernel instance (am I right??). The argument to have a stmtStack per instance would then be to protect from concurrency issues.

It could actually be a local variable as well within the constructor of Kernel. It's used only there, there is no further relevant state to speak of.

mn416 commented 6 years ago

Multiple kernels should be fine and, as you've spotted, the compile() function is not thread-safe.

But the stack needs to be a global variable, because it's being used to construct the abstract syntax tree. All the QPU code that the user writes is being shovelled into the abstract syntax tree behind the scenes.

This is pretty fundamental to QPULib. It might be called a hack, but it's exactly what makes the approach work in the first place.

wimrijnders commented 6 years ago

OK so far. I will study that part in more detail to see if I can make more sense of it.

For now, a message somewhere saying that 'QPULib is not thread-safe' will suffice.

Let me nuance that: specifically, compile() is not thread-safe. The actual usage of compiled kernels should be no problem with multiple threads, should there every be a reason to use them as such.

There may be consequences I can not oversee right now wrt threads, but I won't worry about them right now. Deal with them as thtey come.

mn416 commented 6 years ago

For now, a message somewhere saying that 'QPULib is not thread-safe' will suffice.

The compile() function is not thread-safe, but in theory, once created, you could call kernels from different threads.

wimrijnders commented 6 years ago

Understood. Please excuse my paranoia (which for programming I regard as a good trait to have).

wimrijnders commented 6 years ago

@mn416 I have the feeling that a 'usage notes` section in the docs would be a good idea. Somewhat akin to your 'usage strategies' proposal but with more practical notes (e.g. gotcha's). I can think of 3 things already to put into it. The note CPU/GPU memory split in 'Getting Started' would fit in there as well.

wimrijnders commented 6 years ago

I'm quite happy with the Kernel::pretty() approach. Will let it rest till tomorrow so I can scrutinize it with a clear mind before committing.

wimrijnders commented 6 years ago

@mn416 This is fit for review again. Notes:

mn416 commented 6 years ago

To avoid breaking multiple kernels in QPU mode, I would replace the code

    #ifndef EMULATION_MODE
    astHeap.clear();
    #endif

in Kernel.h with:

    // We can clear the AST heap if we're sure the source program is not being
    // used any more. However, to implement Kernel::pretty(), we keep the
    // source program so we better not clear the heap.
    // astHeap.clear();

Since we're no longer freeing memory after creating a kernel, it's possible that a program with multiple kernels may run out of AST heap space. But that could be solved (if we really wanted to) by making the heap automatically increase it's size when required.

It might be worth creating an example that involves creation of two kernels, and testing it. I'm not sure I've ever tried it before, but it should work.

To clarify, the astHeap is where the AST is allocated. Looking back, I may have had some reason why I preferred to have my own heap structure, rather than just using new and delete, but I don't remember it at the moment. Maybe I was paranoid about memory leaks.

wimrijnders commented 6 years ago

Maybe I was paranoid about memory leaks.

:smile: I have no problems with paranoia wrt coding. I regard it as a good thing. Embrace it, brother!

It might be worth creating an example that involves creation of two kernels, and testing it. I'm not sure I've ever tried it before, but it should work.

I smell a unit test here. I think Rot3D is a good candidate here, since it has multiple versions.

What I'm reading between the lines is the question of how much the heap can be stressed, ie. how many kernels can actually be compiled before things go awry. This is actually a test worthy of consideration, if only to see what happens at the end.

Let's see if I can make such a test work. This is the part where I turn evil.....


EDIT: Come to think of it, AutoTest would be even better for this, since it generates heaps of code (no pun intended), meaningless or otherwise. Much faster way of stressing out the heap.

wimrijnders commented 6 years ago

astHeap.clear(): ok, I understand the issue here; there is no way to unallocate elements on the heap, thereby inhibiting reuse of heap capacity. So anything allocated will just remain there.

Glaring at the heap code right now, which is currently simple in essence. Right now, I feel that any attempt to reclaim heap memory is like opening a can of worms.....

mn416 commented 6 years ago

Right now, I feel that any attempt to reclaim heap memory is like opening a can of worms.....

I agree, trying to do the garbage collection manually could introduce all sorts of subtle bugs. I don't feel the issue will bother users very often, and when it does they have a simple solution.

If I had to allow multiple ASTs, any of which can be freed at any time, then I'd probably reallocate astHeap for each kernel, i.e. do astHeap = new Heap inside the Kernel constructor before elaborating the kernel function. Now every kernel has its own astHeap which could be freed in the destructor, or a member function.

But probably best just to leave it as is, with all calls to astHeap.clear() commented out.

wimrijnders commented 6 years ago

OK. I hope you can agree that something needs to be done for making the heaps more efficient to use. However, I suggest to defer the work for that now. I will do the following: