Open clinuxrulz opened 8 years ago
Another technique would be to assemble the assembly as a Monoid
, not a Monad
. Finishing with just a list of side effect thunks that take no parameters.
I believe using a Monad
for your assembly is killing your performance. As it would continiously allocate memory while traversing over the binds.
To put it another way. Using PureScript to write a runtime transpiler from assembly to JavaScript.
Thanks for the suggestions!
If I understand you correctly:
Applicative Functor
at least.Thanks!
For Suggestion 2. I believe you can compile your assembly into just Array (Eff (ma :: MemAccess, canvas :: Canvas, ref :: Ref | e) Unit)
which is a Monoid
, because if you squint your eyes at assembly language you can see a concatenative language where each assembly language instruction takes on the type MachineState -> IO MachineState
. This happens to make the machine state linearly type, so you can just use side effects to update the machine state in place, thus the Unit
above.
Example Type: addXToA :: Int -> YourAsmMonoid
where YourAsmMonoid
is whatever Monoid
type you`ve developed to store your effects.
I'm really interested in this sort of approach to solving problems generally, when performance is a concern. I've been thinking of applying similar ideas to parsers combinators. I'll be interested to hear how it turns out.
Actually the Monoid
might be more like S -> Array (Eff (ma :: MemAccess, canvas :: Canvas, ref :: Ref | e) Unit)
where S
is a product of either STRef
s or Ref
s pointing to values that can be mutated at runtime. Then once its all compiled, you can pass a single S
and retrieve a single array of effects that will not allocate memory during execution.
I've sort of got the beginning of a free monoid with a specific f-algebra:
Z80Monoid
: https://github.com/clinuxrulz/purescript-gbemu/blob/monoid-effects/src/Z80Monoid.purs
Its gonna take me a long time to get to the end of it all.
Using it will allow us to try out different monoid-like implementations and compare their performance.
Note that even in the original implementation that used a common Z80State -> Eff (ma :: MemAccess | e) Z80State
for instructions in OpcodeMap.purs
that Z80State -> Eff (ma :: MemAccess | e) Z80State
itself is also a Monoid
where append
is kleisli composion and mempty
is simply pure
. So it might be possible to compare performance against the original implementation also (taking care with possible stack overflows).
I still have a feeling that none of the effectful return values need to be used at all in purescript code. Leading to just Z80State -> Eff (ma :: MemAccess | e) Unit
in the end. And looking at Z80Monoid.purs
, each comment that says -- covered already
is showing the amount of code reuse from the other monoid effects above it.
It is also possible to create a sub-assembly with extra registers and primitive instructions than the Z80 provides, then use it to create Z80 instructions from more primitive instructions for further code reuse. Doing so will cost a little bit in performance, yet it would still avoid memory allocation during final execution of the generated effect.
I will need to take care with tagging the final monoid effect list/array/map with memory locations, so that control flow instructions like JR
still work fine. But that can come later.
I must admit, this crosses uncharted territory for me. But it definitely got me curious! I'll read on f-algebras and see if I can understand your implementation! It would be great indeed if this would provide the performance boost this emulator sorely needs.
Its just a Free Monoid
at the moment. I will need do some implementation of the interpreter soon. The Free Monoid
is constructed carefully to avoid stack overflow when compiling the Z80 assembly into effectful thunks.
I will probably aim for something like List (Z80State -> Eff (... | e) Unit)
as the monoid type for the interpreter. Then runZ80Monoid
will construct that list using right associated append
s for a time complexity of O(n)
for construction of n
instructions. Then I start with the initial Z80State
to go from List (Z80State -> Eff (... | e) Unit)
to List (Eff (... | e) Unit))
, and then convert the list into an array, giving us a array of effects that can be executed without allocating memory. Another thing i'll need is to locate the jump instruction destinations and map them to indices into the array of effects to keep control flow constructs working.
Another thing that is interesting is, if I choose a List String
as the Monoid
interpreter type, then we can use the same abstraction to make a ASM.js target, that can be embedded with script injection. The generated code would need to be broken into chunks by jump destinations which would be implemented as a giant while loop with a big switch statement inside to mimic GOTOs in Javascript.
I predict that it may be a bit slower to load the ROM, but it should be much faster to execute the ROM, as the machine code bytes get processed just once and memory allocations are removed from the final execution phase.
It might be a bit miss-leading to search for f-algebras. Idealistically Z80Monoid
is based on a Free Monoid
with a Z80 f-algebra. However in the actual code the Z80 f-algebra has been refactored into a f-coalgebra for convenience.
It might be easier if I just try to explain it.
Z80MonoidImpl
is the f-coalgebra in use, and it could really just be thought of as a Java interface type.
The fs you see all over the place represent the actual type used for the computation.
For Suggestion 1: Something like List String
For Suggestion 2: Something like List (Z80State -> Eff (... | e) Unit)
Z80MonoidImpl
is sort of passed along as you would do for dependency injection.
runZ80Monoid
takes a Z80Monoid
and a Z80MonoidImpl
and smashes all the f
s together.
I really could of used just a big ADT, but I wanted stack safety and O(n) time complexity when smashing n f
s together.
Hope this makes it a little more clear.
Thanks for the detailed explanations!
How you explained Z80MonoidImpl
was how I thought of it, although I've no idea what f-algebra nor f-coalgebra mean (but you got me curious!).
What I'm really missing though is the foresight to anticipate how this changes the generated javascript. I understand the intuition behind not using a monad to avoid allocations, and can see how a using a monoid sidesteps this issue, but in practice when the monadic computations don'[t use the intermediary values is memory still allocated?
I like the idea of spending more time during startup to preprocess the assembly into calculations to get faster execution!
@talw it is true that some uses of monadic binds will not allocate memory (even when using the return result from an action), but it can be very difficult to do so. Your OpcodeMap.purs
in basicOps
and extOps
is a good start, and can potentially pre-calcuate the effects and just look up the correct effect to perform when reaching a certain machine code bytes. I will re-use those tables for Z80Monoid
.
However I notice problems starting to arise in Regs.purs
.
For example have a look at setA
:
setA :: forall e. I8 -> Regs -> Eff (ma :: MemAccess | e) Regs
setA 1
will allocate a new Regs -> Eff (ma :: MemAccess | e) Regs
setA 2
will allocate a new Regs -> Eff (ma :: MemAccess | e) Regs
and so on.
Its even problematic with this type signature:
setA :: forall e. I8 -> Eff (ma :: MemAccess | e) Unit
setA 1
will allocate a new Eff (ma :: MemAccess | e) Unit
setA 2
will allocate a new Eff (ma :: MemAccess | e) Unit
You could make a table of precalculated Eff
for all 256 values for each register, but its a bit of a pain. And you might be able to create a sub-assembly to work around this problem too, while still being monadic.
By the time you've done this, all your code for processing the instructions will be like Eff (ma :: MemAccess | e) Unit
, and you would end up using your Monad
s in a Monoid
style to avoid memory allocations. So I find it easier to just use Monoid
s to start with. At least with Monoid
s you can guarantee your avoiding memory allocations.
In other languages with more aggressive optimizations like Haskell the story is a little different. But PureScript is a young language and it is focused on producing readable JavaScript. That said though, who knows what optimisations the V8 JavaScript engine ends up performing too.
Looking at your existing code, there are memory allocations happening nearly everywhere in the execution phase of the ROM. It seems like every method in Ops.purs
will do so.
There is going to be a fair amount of work to do. A full Z80MonoidImpl
object will need to be implemented using FFI.
It is painful, but I can not see another way.
First of all, thank you very much for the detailed explanations. I'm currently saddled with other things relatively to how I was when I was working on the emulator! And that's why my response can be delayed. But it should be temporary!
I understand completely what you mean by memory allocations now. At first I thought this has something to do with the monadic bind's signature, but generally, the problem is that effectful computations can't be reused because functions that return them are reinvoked all the time. So instead we want to have those computations unhidden behind function arguments. And moving to such a scheme will mean we chain them just like Monoids. Am I correct?
I'm not sure I understand what you mean by sub-assembly. I'm guessing you're referring to a lower level of representing the effectful computations, but how will this solve the memory allocation problem?
@talw Yes you are correct.
Sub-assembly is not really important at the moment. Its just trying to leverage more code re-use. Like creating extra registers like TEMP1, TEMP2, each of the math operators only for the TEMPS, and operators to load registers into TEMPS, and TEMPS into registers, but it can be painful too.
A little off topic. We could design a JavaScript monad to generate efficient JavaScript code for Suggestion 1 (javascript injection).
http://try.purescript.org/?gist=6b61f690baca8c8046b36a1424eda17c
(See method example
and runJS
)
So yes, its not the type-signature of the monadic bind itself that is the problem. I think its just how Eff code gets generated that causes the excessive memory allocation.
That is, use PureScript as a JavaScript code generator.
When loading a rom, get purescript to generate mutable JavaScript code as a string value, then just do JavaScript injection (
<script src=
, noteval
) in the hope V8 will still optimise it.Not sure if it is a good idea, but I'd like to have a go if I get some free time.