Closed Yamakaky closed 8 years ago
I have but "dumb" arrays cause a stack overflow in the constructor since it attempts to allocate 2MB on the stack. Maybe I could have used a Box([u8, ...]), I don't remember why I opted for Vec. Maybe just because it was more convenient, although of course I'm potentially paying the cost of the bound check for every access, although LLVM should be capable to remove it if it inlines it since I already validate it in memory/mod.rs
, I haven't checked if it does however.
If you can think of a nicest way to do this I'm all hears!
Mmh, actually it probably can't remove it unless it understands that the Vec has a fixed size at runtime. So yeah that's a bit crap.
I created a static-array branch to demonstrate the issue.
Ram is easy :
const RAM_SIZE: usize = 2 * 1024 * 1024;
pub struct Ram {
/// RAM buffer
data: [u8; RAM_SIZE],
}
pub fn new() -> Ram {
// Default RAM contents are garbage
let data = [0xca; RAM_SIZE];
Ram { data: data }
}
As it's inlined in the struct, you should use a Box<Ram>
, or data: Box<[u8; RAM_SIZE]>
. With the second solution, it's transparent since Index calls are forwarded to the content of the Box .
But then I have a pointless indirection anyway (although Box<[u8; RAM_SIZE]>
would be strictly superior because of the statically optimizable bound check, I agree).
Everytime I access the RAM I always have an indirection through the Box pointer. In C I would just allocate the entire emulation context in a single malloc and the compiler would statically know where everything is in relative to the main context pointer. I haven't found a simple way to do that in Rust yet.
Again, maybe it's not a big deal in practice and the CPU manages to hide the dereference but it annoys me from a conceptual standpoint.
If you open the pull to use a Box<[u8; RAM_SIZE]>
instead of a Vec
I'll merge it, there's still the deref but at least it should save us the bound checking.
You mean malloc Ram + Bios + ... in a single blob ?
Everything up to the Cpu (which contains all the other "modules").
Maybe I could achieve it by moving the interconnect/CDrom/GPU constructions recursively inside the CPU constructor instead of constructing them first and then moving them into the Cpu. That would force me to expose a lot of details about the implementation of each sub-module in the Cpu though, but maybe it's not too bad.
Honestly I didn't think too much about it, performance is not my main concern at the moment.
Actually I remember why I used a Vec instead of a Box originally: I was writing this (now seriously outdated) guide and thought that a Vec would be easier to understand than a Box: https://github.com/simias/psx-guide/
But that's a silly reason, and since then I gave up on trying to match the guide with the code exactly, it's just too complicated.
Box
It still overflows when I simply change let mut cpu = Cpu::new(inter)
by let mut cpu = Box::new(Cpu::new(inter))
in main()
.
I think it's simply because the temporary inter
is created on the task before being moved inside the Cpu
in the constructor.
I guess I could try creating the Inter
inside the Cpu
constructor instead, that might solve the issue (assuming that everything is directly allocated in the box and doesn't even transit through the stack).
(btw, I find [u8; RAM_SIZE] to be more understandable than Vec
[u8; RAM_SIZE]
is more readable than Vec
but I was considering Box<[u8; RAM_SIZE]>
and that looks a bit ugly IMO.
Can you give me a proof of concept of what you're talking about? In my test if I Box the Cpu I still observe a stack overflow when I start, so I assume that this means that the inter is actually not created directly inside the Cpu but rather that it's moved from the stack into the Cpu in the constructor. Or maybe I'm missing something else...
Heh, I'm not sure what's the point you were trying to make but when I click run I get:
thread '<main>' has overflowed its stack
playpen: application terminated abnormally with signal 4 (Illegal instruction)
So I've attempted to build the Inter directly in the Cpu constructor (and therefore the Ram struct indirectly) and box the Cpu but I still end up with a stack overflow. You can see the code in the static-array branch.
I think the many layers of constructors cause the optimizer to give up and not attempt the box return value optimization.
Did you try to compile in release mode ? With the link above, it works. It's not perfect, though, since debug mode overflows.
Well with my modifications rustation also overflowed in release mode. It's a bit sucky to rely on a non-garanteed optimization...
I hope the placement RFC once implemented will let us write that correctly and without ambiguity/relying on the optimizer to do the right thing: https://github.com/rust-lang/rfcs/blob/master/text/0809-box-and-in-for-stdlib.md
As the ram and bios have fixed size, did you consider using arrays instead of Vec ?
You could use
into_boxed_slice
, or use directly a [u8] for reading the file.