Open galacticstudios opened 9 years ago
I believe the upper bound on token space required is four times the number of characters. In practice the space required is usually much less than this. Maybe you can find a better way of approximating the space required in advance?
It looks like the worst case is worse than that: A token is one byte, then there's a pointer to the last character position, which could be four bytes (or 8 bytes on a 64-bit system), then a ValueSize of arbitrary length.
My first step will be to allocate the memory on the heap instead of the stack. Then I might use a dynamically growing buffer (or allocate additional buffers as I need them, and merge them in the end).
Sounds good. I'll be keen to merge your improvement.
I've checked in the fix on my fork at https://github.com/galacticstudios/picoc. There are several other fixes there too - and more are coming.
One thing I'm concerned about is that to get 'long's working as 64-bit values on MSVC, I changed picoc so that calculations are done with int64_t and uint64_t. So chars, shorts and ints are always getting promoted to 64 bits. Problem #1 is that this violates the C spec's integer promotion rules. Problem #2 is that it's slower than 32-bit arithmetic when running on a 32-bit platform (and my ultimate goal is to port this to a 32-bit microcontroller). But implementing the correct promotion and expression evaluation rules would probably cost more cycles than just doing everything in 64 bits.
On the bright side, all the csmith unit tests now work.
It would be great to be able to configure the sizes of short, int, long, etc. in picoc. For example, it would allow simulating a platform where int is 16 bits while picoc itself is compiled with MSVC.
I have the feeling many great fixes and improvements are in the pipe, I guess it is time for picoc to use continuous integration (with Travis) and contribution guidelines. I'll open separate GitHub issues for that.
"I have the feeling many great fixes and improvements are in the pipe": That was my intent, but my project directions might be changing. I might have a couple more, but I don't know if there will be any beyond that.
Hi Zik. I've been working to fix bugs in picoc, and one of them is an Out of Memory error. It happens in LexTokenise, in this code:

    int ReserveSpace = (int) (Lexer->End - Lexer->Pos) * 4 + 16;
    void *TokenSpace = HeapAllocStack(pc, ReserveSpace);

Lexer->End is pointing to the end of the source file, and Lexer->Pos is at the beginning. So if the source file is large, like some of the unit tests in the test/csmith directory (e.g. rand0.c is 41K), it tries to allocate 4 times the file size on the stack.
Can you explain the thinking behind this? Why allocate 4 times the size of the text we're lexing?
Thanks, Bob Alexander