Closed oshaboy closed 1 year ago
Hi there! Ok I never heard about the chip8 before now, and I have to say that I am quite intrigued! Have checked out a simple C emulator, tested out an assembler - and I think that this could all easily be added to TRSE (even the emulator) - but that will be the last step.
The compiler in TRSE is quite massive, and the code generator is hierarchical, meaning that there is a general abstract class that the various CPU implementations (Z80, 6502 etc) inherit from. Since I'm the only one coding on TRSE, it shouldn't take me more than a couple of days to get a compiler up and running + an emulator + assembler. I'm probably going to add this now anyway (since I got that intrigued), but if you're interested in helping out / participating with TRSE development etc, here's an invite link to our slack channel: https://join.slack.com/t/turborascal/shared_invite/zt-1pemsfysk-e8GErjsOHVfPiI4FhuDuGw
and just to explain (if you're interested): The structure of the TRSE compiler is as follows (from top to bottom):
Lexer: reads text and generates streams of tokens
Parser: consumes tokens according to the syntax of Pascal and builds an Abstract Syntax Tree (AST)
AbstractCodeGen: contains high-level code-generation routines. Visits the AST to generate asm code in a text file. codegen_z80, codegen_6502, codegen_x86 etc are subclasses that perform the actual code output
Assemblers: either the internal OrgAsm or an external one (like nasm, lwasm, rgbasm) converts the .asm text file to binary
Post-processing: the .bin file might need a header for cartridges or a disk system. Here the binary file is processed so it can be run in an emulator according to the specific computer/media type.
https://user-images.githubusercontent.com/5620670/219176308-e3baa574-22b3-4c44-82ae-1f2a135e8077.mov
well I went ahead and did it.. was too tempting. I've added an internal chip-8 emulator into TRSE, and have set up the new cpu/system classes and all. you can now load a chip-8 project, compile and run with ctrl+R and it runs in the internal emulator. Fun stuff!
Mind you, two major things remain: i'm using an external assembler (c8asm), and the code generator doesn't have anything implemented except for asm(" "); blocks, so no Pascal.. yet.. but now the fun part begins =) thanks for suggesting this!
Thank you for the explanation, but I would like a bit more detailed documentation. I understand now that there's no Intermediate Language, but that still doesn't explain how to actually contribute. Like what's the format of the AST? What do the Codegens need to implement? How does OrgAsm work? How do the Systems fit into this? If at least the header files had some comments that explained what everything does, that would make it a lot easier for people to join the project.
There are like 17 different assemblers for Chip-8 named c8asm. If you are using wernsey's assembler I should warn you that it is far from feature complete. It didn't even have negative immediates until I added them 2 weeks ago (which are quite necessary because there's no subtract immediate function). It has no macros ("define" statements can only define a single token). And it doesn't support XO-Chip or MegaChip at all. I guess that isn't really a problem for code generators.
Also there are a few compatibility quirks between different versions of Chip-8. Some emulators let you select which "quirks" to emulate. But that means that writing a Chip-8 code generator that generates portable code will be difficult. The Chip-8 codegen might even be an abstract class in and of itself. Furthermore, because the assembly was never formally defined, there are a few competing assembly languages with wildly different mnemonics. I actually thought of adding Chip-8 to TRSE because the Octo assembler used a syntax that was very similar to Pascal, but it was just a strange assembly.
OrgAsm is the internal 6502/z80 assembler and is not necessary right now, since I've added an internal chip-8 assembler (yes, wernsey's, until a potential extension of OrgAsm). All that is needed now is to implement the missing methods + assembly syntax in the code generator.
You're the first person in 5 years who has asked me about the inner workings of the compiler, so I haven't really seen the need to document how the compiler works until now. Could be fun to write detailed contribution documentation, will try to get something like that up on github this weekend.
A bit more detailed explanation then. Will write this up in the upcoming contributions file.
SourceBuilder is a wrapper class that takes in source files and project files, sets up the compiler, parses, codegens and assembles. It's what runs behind the CLI and whenever you press ctrl+R in TRSE.
"Compiler" and its subclasses are specific wrapper classes for the different CPU types. Compiler6502 will for instance initialise Asm6502 (not an assembler, but the .asm text file writer). The compiler connects various code blocks, inserts system-specific "init.s" assembly files etc. The compiler calls the Parse method in the Parser, and will also perform the actual assembling afterwards. The compiler class will also create the corresponding code generator objects.
"Assembler" and its subclasses (kind of a misleading name) are assembly-producing helper classes used by the code generator to actually write assembly code. Think as->Asm("ld a,b"), as->Comment(...), as->DefineString.
Every computer that TRSE supports is represented as a "System[computer]" class, like SystemC64, SystemThomson, SystemSNES. SystemC64 is a subclass of SystemMOS6502, which in turn inherits from AbstractSystem (the abstract parent class). These classes handle things like which assembler and which cruncher to use. Every system has a start program address, system strings, emulator strings and parameters, memory maps and post-processing methods (post-processing happens once the assembler has generated a .bin that needs further processing, such as adding ROM headers or writing it to some obscure disk format). The system classes also interact with the compiler.
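To make the "Assembler" helper idea above a bit more concrete, here is a minimal sketch of a class that only collects assembly text. This is not TRSE's actual code: the class name AsmWriter, the Label and Text methods, and the ';' comment prefix are all assumptions for illustration. Only the method names Asm and Comment come from the description above.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for TRSE's "Assembler" helper classes.
// It only collects lines of assembly text; it does not assemble anything.
class AsmWriter {
public:
    // Emit one instruction, indented like hand-written asm.
    void Asm(const std::string& instruction) { lines_.push_back("    " + instruction); }

    // Emit a comment line. The ';' prefix is an assumption; it differs per assembler.
    void Comment(const std::string& text) { lines_.push_back("    ; " + text); }

    // Emit a label at column zero (hypothetical helper, not named in the thread).
    void Label(const std::string& name) { lines_.push_back(name + ":"); }

    // Join everything into the text that would end up in the .asm file.
    std::string Text() const {
        std::string out;
        for (const auto& line : lines_) out += line + "\n";
        return out;
    }

private:
    std::vector<std::string> lines_;
};

// How a code generator might use the writer for "a := b" on a Z80-like target.
inline std::string EmitCopyExample() {
    AsmWriter as;
    as.Label("copy_b_to_a");
    as.Comment("a := b");
    as.Asm("ld a,b");
    as.Asm("ret");
    return as.Text();
}
```

The point of the split is that the code generator decides what to emit, while the writer only cares about how the text is laid out.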
So: no intermediate language, everything is stored in the AST. "Node" is the abstract parent class, and the various Pascal features are subclasses: NodeConditional (if a>b.. then), NodeForLoop, NodeBlock (begin / end blocks), NodeAsm, NodeProcedureDecl, NodeVarDecl, NodeRepeatUntil, NodeCompound etc all contain subnodes and / or various information. For instance, the NodeConditional (if a>b) contains three parts: the binary clause "(a>b and c<d)", the "true" block and an "else" block.
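As a rough picture of what such a tree looks like, here is a heavily reduced sketch. The class names Node, NodeBlock and NodeConditional are taken from the description above, but every member, the NodeText placeholder and the Describe method are hypothetical; the real TRSE classes carry far more information.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Abstract parent of all AST nodes (members are assumptions, not TRSE's).
struct Node {
    virtual ~Node() = default;
    virtual std::string Describe() const = 0;
};

// Hypothetical placeholder leaf that just holds source text.
struct NodeText : Node {
    std::string text;
    explicit NodeText(std::string t) : text(std::move(t)) {}
    std::string Describe() const override { return text; }
};

// begin / end block: an ordered list of child statements.
struct NodeBlock : Node {
    std::vector<std::unique_ptr<Node>> statements;
    std::string Describe() const override {
        std::string out = "begin ";
        for (const auto& s : statements) out += s->Describe() + "; ";
        return out + "end";
    }
};

// if (clause) then trueBlock [else elseBlock]
struct NodeConditional : Node {
    std::unique_ptr<Node> clause;
    std::unique_ptr<Node> trueBlock;
    std::unique_ptr<Node> elseBlock;  // may be null
    std::string Describe() const override {
        std::string out = "if (" + clause->Describe() + ") then " + trueBlock->Describe();
        if (elseBlock) out += " else " + elseBlock->Describe();
        return out;
    }
};

// Builds the tree for: if (a>b and c<d) then begin x:=1; end else begin x:=2; end
inline std::unique_ptr<Node> BuildConditionalExample() {
    auto yes = std::make_unique<NodeBlock>();
    yes->statements.push_back(std::make_unique<NodeText>("x:=1"));
    auto no = std::make_unique<NodeBlock>();
    no->statements.push_back(std::make_unique<NodeText>("x:=2"));

    auto cond = std::make_unique<NodeConditional>();
    cond->clause = std::make_unique<NodeText>("a>b and c<d");
    cond->trueBlock = std::move(yes);
    cond->elseBlock = std::move(no);
    return std::unique_ptr<Node>(std::move(cond));
}
```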
The heart of the code generator is AbstractCodeGen. It uses the visitor pattern to "visit" the AST nodes from top to bottom, starting with "NodeProgram" and going down from there. The class is abstract and platform-independent. For instance, AbstractCodeGen implements the NodeForLoop visit
void AbstractCodeGen::dispatch(QSharedPointer
but will not produce any code here - just call virtual functions that need to be implemented by the actual cpu-specific implementations. You will see that in order to actually increase a counter and compare it with a number, "CompareAndJumpIfNotEqualAndIncrementCounter (...)" will be called. This is a method that needs to be implemented individually for all the various CPUs.
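A minimal sketch of that split between the abstract generator and a CPU-specific one could look like this. Only the name CompareAndJumpIfNotEqualAndIncrementCounter comes from the text above; its signature, the ForLoop struct, InitCounter, and the "toy CPU" mnemonics (mov, inc, cmp, jne) are invented for illustration and do not correspond to any real TRSE target.

```cpp
#include <string>
#include <vector>

// Hypothetical, reduced for-loop node: "for counter := from to to do body".
struct ForLoop {
    std::string counter;
    int from;
    int to;
    std::vector<std::string> body;  // body already lowered to asm lines, for brevity
};

class AbstractCodeGenSketch {
public:
    virtual ~AbstractCodeGenSketch() = default;

    // Platform-independent: decides the shape of the loop and
    // delegates every CPU-specific instruction to virtual methods.
    void Dispatch(const ForLoop& loop) {
        const std::string label = "forloop" + std::to_string(labelCounter_++);
        InitCounter(loop.counter, loop.from);
        Emit(label + ":");
        for (const auto& line : loop.body) Emit("    " + line);
        CompareAndJumpIfNotEqualAndIncrementCounter(loop.counter, loop.to, label);
    }

    const std::vector<std::string>& Lines() const { return lines_; }

protected:
    void Emit(const std::string& line) { lines_.push_back(line); }

    // Must be implemented once per CPU (signatures are assumptions).
    virtual void InitCounter(const std::string& counter, int value) = 0;
    virtual void CompareAndJumpIfNotEqualAndIncrementCounter(
        const std::string& counter, int limit, const std::string& label) = 0;

private:
    std::vector<std::string> lines_;
    int labelCounter_ = 0;
};

// CPU-specific part for an invented toy CPU.
class ToyCpuCodeGen : public AbstractCodeGenSketch {
protected:
    void InitCounter(const std::string& counter, int value) override {
        Emit("    mov " + counter + ", #" + std::to_string(value));
    }
    void CompareAndJumpIfNotEqualAndIncrementCounter(
        const std::string& counter, int limit, const std::string& label) override {
        Emit("    inc " + counter);
        // Loop bound is inclusive, so keep looping until counter == limit + 1.
        Emit("    cmp " + counter + ", #" + std::to_string(limit + 1));
        Emit("    jne " + label);
    }
};

// Generates code for: for x:=0 to 100 do <call draw_line>
inline std::vector<std::string> GenerateLoopExample() {
    ToyCpuCodeGen gen;
    gen.Dispatch(ForLoop{"x", 0, 100, {"call draw_line"}});
    return gen.Lines();
}
```

Adding a new CPU in this sketch means writing another subclass with the two protected methods; the Dispatch logic stays untouched.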
Another "heart" in TRSE is the AbstractCodeGen::AssignVariable function, which handles all assignment operations ("a:=something*b+d()" etc). It basically goes through a list of possible candidates for the RHS and picks the first suitable one. For example: is the RHS a simple integer without indices? Then we can optimise and call IsSimpleAssignInteger, a method that is implemented in the CPU-level code generator classes.
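The "try the cheap candidates first" idea can be sketched roughly as follows. The names AssignVariable and IsSimpleAssignInteger come from the text above, but the signatures, the Assignment struct, the AssignGeneral fallback and the toy output strings are all assumptions; the real function checks many more candidates and works on AST nodes, not strings.

```cpp
#include <cctype>
#include <string>

// Hypothetical flattened assignment "target := rhs".
struct Assignment {
    std::string target;
    std::string rhs;
};

class AssignSketch {
public:
    virtual ~AssignSketch() = default;

    // Platform-independent: walk the candidates, fall back to the general path.
    std::string AssignVariable(const Assignment& a) {
        std::string out;
        if (IsSimpleAssignInteger(a, out)) return out;  // optimised special case
        return AssignGeneral(a);                        // slow but always works
    }

protected:
    // Returns true and fills 'out' if the CPU backend handled the assignment.
    virtual bool IsSimpleAssignInteger(const Assignment& a, std::string& out) = 0;
    virtual std::string AssignGeneral(const Assignment& a) = 0;
};

// Backend for an invented toy CPU (mnemonics are not real).
class ToyAssign : public AssignSketch {
protected:
    bool IsSimpleAssignInteger(const Assignment& a, std::string& out) override {
        if (a.rhs.empty()) return false;
        for (unsigned char c : a.rhs)
            if (!std::isdigit(c)) return false;  // not a plain integer literal
        out = "mov " + a.target + ", #" + a.rhs;
        return true;
    }
    std::string AssignGeneral(const Assignment& a) override {
        return "eval " + a.rhs + "\nstore " + a.target;
    }
};
```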
Thank you so much.
as I knew nothing about compilers when I started work on TRSE back in 2018, there is unfortunately a lot of .. let's say legacy code. For different CPUs I've been using different methods of implementing things in the codegen, because I tried new things. Some worked, some didn't, and two years ago I got the brilliant idea that I should just make my own intermediate code language (think LLVM) called "TRIPE", so TRSE should just output TRIPE .trasm files and then the TRASM (LLVMish) would optimise and convert the code to the target platform. After a couple of weeks I managed to get a prototype up and running, but soon realised that I'd have to spend at least a year getting things working properly... so I decided to pause the project and focus on user support, and just make the current compiler better.
so yeah I'd really like to rewrite the code generation (the parser and ast tree things are just fine) to make it uniformly better, because right now it's rather messy I'm afraid. I hope I find time during the next couple of months to try this out, shouldn't be that hard - because right now there are some redundancies and stupid things in the code. actually there are a lot of stupid things, but.. it works. but it won't win any beauty contests
I didn't realize TRSE was made by a single person. I saw "16 contributors" on the github page and thought it was a team.
but if you do a new pull, I've added full support for the chip-8 - without codegen, that is - just compile up, load the chip-8 example project and press ctrl+R to build, no external assembler/emulator necessary. Then, if you have the nerve, have a look at "Codegen_chip8" - I basically just copied the JDH8 codegen (experimental youtube stuff) classes, so it's got no proper code generation yet (it will output code for the JDH8)
but you should get an idea how things work with regards to assigning variables, performing binary operations, storing variables, calling procedures etc
hehe yeah it's nice to have a team! but those contributors only write units / games for the tutorials. Sometimes they fix a compile error when I screw up. but yeah, 95% of the code is mine, 5% is borged from other sources (such as obscure disk format creation, the chip8-emulator base etc)
that being said, I really really wouldn't mind having additional people contributing to the hard depths of the compiler code - most people are scared away by the .. complexity. And I could really use a new perspective on how some of the things could be implemented/improved
On a lighter note, let's assume that you wanted to add another hypothetical computer that uses the Z80 (let's call it "Noisy"), and has an emulator that can start programs from the command line. This is really easy - you create a new SystemNoisy class that inherits from SystemZ80. You add the system type "Noisy" to AbstractSystem.h etc. Then you add some new emulator params, update the post-processing function and the ApplyEmulatorParams method, set the program start address and other system-specific things. Add the class to the FactorySystem factory pattern class, and you suddenly have a new system in TRSE! Create a folder under the tutorials, link to it via the "tutorials.txt" files (which contain the list of all TRSE sample projects), and voilà! new system added!
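The "Noisy" steps above could be sketched like this. The class names AbstractSystem, SystemZ80 and SystemNoisy follow the text, but every method, the start address, the emulator flag and the two-byte header are invented for illustration, and CreateSystem is only a stand-in for the FactorySystem class, whose real interface is not shown in this thread.

```cpp
#include <memory>
#include <string>
#include <vector>

// Reduced abstract parent (all members are assumptions).
class AbstractSystem {
public:
    virtual ~AbstractSystem() = default;
    virtual std::string Name() const = 0;
    virtual std::string Cpu() const = 0;
    virtual int StartAddress() const = 0;
    virtual std::vector<std::string> EmulatorParams(const std::string& binary) const = 0;
    // Default post-processing: the .bin is already runnable as-is.
    virtual std::vector<unsigned char> PostProcess(std::vector<unsigned char> bin) const {
        return bin;
    }
};

// Everything shared by Z80 machines would live here.
class SystemZ80 : public AbstractSystem {
public:
    std::string Cpu() const override { return "z80"; }
};

// The hypothetical new computer.
class SystemNoisy : public SystemZ80 {
public:
    std::string Name() const override { return "Noisy"; }
    int StartAddress() const override { return 0x8000; }  // invented address
    std::vector<std::string> EmulatorParams(const std::string& binary) const override {
        return {"--run", binary};  // invented emulator flag
    }
    std::vector<unsigned char> PostProcess(std::vector<unsigned char> bin) const override {
        static const unsigned char header[] = {0x4E, 0x5A};  // invented 2-byte header
        bin.insert(bin.begin(), header, header + 2);
        return bin;
    }
};

// Stand-in for the factory: maps a system name to a system object.
inline std::unique_ptr<AbstractSystem> CreateSystem(const std::string& name) {
    if (name == "Noisy") return std::make_unique<SystemNoisy>();
    return nullptr;  // unknown system
}
```

With this shape, the rest of the pipeline only ever talks to AbstractSystem, which is why a new machine on an already supported CPU is mostly a matter of filling in one subclass and registering it.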
Any progress on either Chip-8 or Internal Docs?
sorry, been bogged down with the 6809 compiler, and a new assembler for the 6809, bugfixing the z80 and trying to make a demo at the same time! Maybe I'll find some time this weekend/next to write some documentation.. everything should be in place for the Chip8, with the exception of the code generator which is still just a copy of the JDH8
Well I could help with the Chip-8 if I understood how the compiler works.
I guess I could write some library routines.
chip8.tru.txt Here's what I came up with, it's a decent start.
I assumed all the arguments are sequential in memory from left to right, I don't know if that's a valid assumption.
ooh thanks a bunch, very nice! I've added your code to a "system.tru" file in the chip8 units folder. Some local bumps in the road here, so I haven't been able to do any TRSE this weekend - but will hopefully have more time later this week
BTW it seems like all my functions end with LD I, result ; LD [I], V0
so I think it will be nice to have the V0 register contain the return value instead of having to write it to memory. Just a thought.
I should probably keep this thread for discussion of Documentation and make another separate thread for Chip-8 tho.
Ok I had a bit of time before bed, and managed to get a couple of simple things added to the chip-8 code generator.. I've also added your unit, but with some minor tweaks so it uses global variables instead. You can now do stuff like this:
Init();
System::Beep(100);
for x:=0 to 100 do
begin
y:=x+10;
System::DrawLine(x,y,$FF);
end;
Ok closing this as we're on slack!
I want to contribute to the project (I was thinking of a CHIP-8 backend) but I have no idea what anything does. What does a codegen need to implement? Do I need to make an assembler? Is OrgAsm an Intermediate Language or the Assembler framework or what? I see the UML diagrams but nothing else. It would be nice to have an idea of what all these Abstract Classes are supposed to do.
Thank you for listening.