llvm-mos / llvm-mos-sdk

SDK for developing with the llvm-mos compiler
https://www.llvm-mos.org

Suggestion: Keep some things in RAM when c-in-prg-ram is used #219

Closed cogwheel closed 10 months ago

cogwheel commented 11 months ago

In order to a) maximize the RAM available to ordinary front-end code and b) facilitate PRG-RAM banking, I propose keeping the following data in the .ram section when c-in-prg-ram is used:

asiekierka commented 11 months ago

I believe that, when PRG-RAM is bankable, the default C section should be in system RAM only - this matches behaviour of only placing read-only C sections in the fixed ROM bank by default when PRG-ROM is bankable.

When PRG-RAM is not bankable, I believe the correct solution is to implement --enable-non-contiguous-regions in LLVM, matching the GNU linker - this will allow any spillover from PRG-RAM to go to system RAM (or the other way around).

cogwheel commented 11 months ago

Fair enough. In the meantime, changing these to use .ram instead of .noinit would get 80% of a) with, I imagine, very much less than 20% of the effort >.>

mysterymath commented 11 months ago

I think I tend to side with @asiekierka on this one; we're generally endeavoring to present C as a "normal" enclave within whatever target system we find ourselves on. That generally means a contiguous text, rodata, data, and bss, followed by empty space for a heap, followed by a stack that grows down. In the case of ROM, it might be natural to split text/rodata from data and bss, and the zeropage adds another complication. In any case, that's generally what one would naturally assume a C compiler does with its library code going in, unless there was a very specific reason to do otherwise (e.g., reset vectors, etc.).

Accordingly, if a user has asked the mutable parts of the C enclave to be placed in PRG-RAM, one would generally expect the whole C enclave to move over, not just parts of it. It's always possible to co-opt this to move specific C code to other regions; this can be done with section annotations or pragmas in user code/data. Linker scripts also provide a mechanism for this; it should be possible to use them to place sections from specific object files or libraries into other banks, even those provided with the SDK. This is only partially possible today though, due to an incompatibility with Link Time Optimization, which we rely upon heavily. Work is underway upstream to resolve this (and I'm poking my finger into it occasionally); it's something actively under R&D for other embedded targets.
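For concreteness, a minimal sketch of the section-annotation route mentioned above, assuming GCC/clang-style attributes and the .ram section name used elsewhere in this thread; the variable names are purely illustrative:

```c
#include <stdint.h>

/* Unannotated globals land in the normal C sections (.bss here),
   wherever the linker script places the C enclave. */
uint8_t game_state[32];

/* A section annotation pins this one buffer to the .ram section
   (NES system RAM) no matter where the rest of the enclave goes. */
__attribute__((section(".ram")))
uint8_t scratch_buffer[64];
```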

cogwheel commented 11 months ago

Accordingly, if a user has asked the mutable parts of the C enclave to be placed in PRG-RAM, one would generally expect the whole C enclave to move over, not just parts of it.

I understand this in general, and had it in mind when I made this suggestion. I'm not quite sure how this applies to the particular items I mentioned.

  1. PAL_BUF, OAM_BUF, and VRAM_BUF aren't part of the C enclave; they're all defined in assembly. PAL_BUF isn't even a proper array; it's just an address ($100). VRAM_BUF and OAM_BUF are currently placed in .noinit, which semantically removes them from the C enclave, since all global C variables are supposed to be initialized to 0 (ignoring for the moment that neslib zeros all of system RAM on startup...).

    These are implementation details of neslib/nesdoug, not "user space" variables. My PR to address these moves their definitions to C, but they aren't declared in any headers. I don't think it would make sense for a user to have an a priori expectation about the specific memory region these particular variables are in.

    Now as a user, if I declare my own VRAM buffer, it will be placed in the same "enclave" as all my other variables unadorned with a section. And if I want to customize the placement of the neslib/nesdoug buffers, the barrier to do so is no different. So I'm not clear on what assumptions are being broken by making the neslib buffers stay "out of the way" of user code as much as possible (which I imagine was the original philosophy behind overlapping PAL_BUF with the hardware stack in the first place).

  2. I can see how the stack is a bit closer to being part of the C enclave, but I still think a lot of the same points apply. Semantically, the operation of the stack is entirely opaque to the user and is a consequence of the memory model, calling conventions, etc. of the underlying platform. The fact that the stack can be completely elided unless you use varargs or recursion is obviously a very important thing around here. :)

    As with the neslib/nesdoug buffers, I'm not sure it makes sense for the user to have a specific expectation of where the stack should be. If a user is writing code that depends on the stack being in a particular location, they already have to do the research to find out how/where the stack is stored for the particular platform. It doesn't seem like it would make much of a difference whether they found an answer of "at $7FF" versus "at $7FF or $7FFF depending on your config". Either way they would be able to use the __stack symbol to abstract that away (sketched just below), and then they're diving well below the hood of the C enclave.
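
    For concreteness, a sketch of what "using the __stack symbol to abstract that away" could look like; the exact declaration of __stack is an assumption here, so treat it as illustrative only:

```c
#include <stdint.h>

/* __stack is provided by the linker script as the initial stack top,
   whether that ends up being $7FF or $7FFF. Declaring it as a plain
   symbol and taking its address is an assumed idiom for illustration. */
extern char __stack;

uintptr_t initial_stack_top(void) {
  return (uintptr_t)&__stack;
}
```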

It's always possible to co-opt this to move specific C code to other regions; this can be done with section annotations or pragmas in user code/data. Linker scripts also provide a mechanism for this; it should be possible to use them to place sections from specific object files or libraries into other banks, even those provided with the SDK. This is only partially possible today though, due to an incompatibility with Link Time Optimization, which we rely upon heavily. Work is underway upstream to resolve this (and I'm poking my finger into it occasionally); it's something actively under R&D for other embedded targets.

I guess I'm still a bit confused by this elaboration. Moving specific C data to other regions using section annotations is all I'm suggesting for the buffers.

mysterymath commented 11 months ago
  1. PAL_BUF, OAM_BUF, and VRAM_BUF aren't part of the C enclave; they're all defined in assembly. PAL_BUF isn't even a proper array; it's just an address ($100). VRAM_BUF and OAM_BUF are currently placed in .noinit, which semantically removes them from the C enclave, since all global C variables are supposed to be initialized to 0 (ignoring for the moment that neslib zeros all of system RAM on startup...).

We've extended the C enclave with a notion of .noinit; we actually got this from AVR GCC, which formalizes the same. And neslib/nesdoug are broadly C libraries to provide a more structured way to interact with the NES; they follow the C ABI, communicate through the C imaginary registers, etc. C in this usage isn't so much a language, it's a runtime model. Maybe I should say "llvm-mos 6502 ELF ABI" or something instead.

Now as a user, if I declare my own VRAM buffer, it will be placed in the same "enclave" as all my other variables unadorned with a section. And if I want to customize the placement of the neslib/nesdoug buffers, the barrier to do so is no different. So I'm not clear on what assumptions are being broken by making the neslib buffers stay "out of the way" of user code as much as possible (which I imagine was the original philosophy behind overlapping PAL_BUF with the hardware stack in the first place).

We'd more or less discussed the overlapping PAL_BUF as being a mistake; we're walking that back, right? With these buffers weak, users can place them wherever they like; the question is one of defaults. The registered intent for "c-in-prg-ram" is "place basically everything in PRG-RAM", C here meaning "everything that is written against the C ABI".
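As a hedged illustration of "users can place them wherever they like" via the weak symbols: a strong definition in user code replaces neslib's weak one and chooses its own section. The symbol spelling, type, and size below are assumptions; check the real neslib sources before relying on this.

```c
#include <stdint.h>

/* neslib's buffer is weak, so this strong definition wins at link time
   and decides the placement. Name, type, and 128-byte size are
   illustrative assumptions only. */
__attribute__((section(".noinit")))
uint8_t VRAM_BUF[128];
```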

It's always possible to co-opt this to move specific C code to other regions; this can be done with section annotations or pragmas in user code/data. Linker scripts also provide a mechanism for this; it should be possible to use them to place sections from specific object files or libraries into other banks, even those provided with the SDK. This is only partially possible today though, due to an incompatibility with Link Time Optimization, which we rely upon heavily. Work is underway upstream to resolve this (and I'm poking my finger into it occasionally); it's something actively under R&D for other embedded targets.

I guess I'm still a bit confused by this elaboration. Moving specific C data to other regions using section annotations is all I'm suggesting for the buffers.

I guess that's my point; we're kind of jumping the gun by doing this on behalf of the user. By default, everything is placed in NES RAM. We provide a linker script to allow users to express, "Eh, actually, put C in PRG-RAM instead." To my ears, we're essentially arguing over the definition of "C": I'd say it's everything that isn't pinned down (i.e., everything written against the C ABI should be placed into "standard" sections, and those sections should be moved), but I'm not entirely sure where you're drawing the line. It needs to mean something well-defined to say "c-in-prg-ram"; if a user has to look in detail at our linker scripts to see what actually is and isn't moved by this, then that's a failure of abstraction. As far as possible, a user should be able to guess the behaviors of the knobs we expose; the degree to which that isn't possible is the degree to which the abstraction leaks.

cogwheel commented 11 months ago

To my ears, we're essentially arguing over the definition of "C":

Regarding .noinit, sure. I accept that is being treated as an extension of C here.

the question is one of defaults.

I think this is really the crux of it for me. As a user going from c-in-ram to c-in-prg-ram, I was initially surprised that using c-in-prg-ram made the whole ram section unavailable. With that fixed, my next lament was that 1.5 pages of my newly claimed RAM were taken up by the 2nd/3rd-party OAM and VRAM buffers. It doesn't really provide me with any utility that they are in the same region as my other variables. It comes across purely as a cost with an opt-in solution.

The ultimate solution is the one @asiekierka mentioned... When PRG-RAM is fixed, treat all 2+8 KiB as a single, non-contiguous memory region, and when PRG-RAM is banked, put everything in RAM.

(i.e., everything written against the C ABI should be placed into "standard" sections, and those sections should be moved),

I agree, but I don't see the PAL_BUF (even in its revised form), OAM_BUF, and VRAM_BUF arrays as being written against the C ABI. Aside from a couple of stub-ish functions to trigger linking of the assembly objects, all the code that operates on these buffers is written in assembly, so the C ABI is only relevant at the function entry point.

As far as possible, a user should be able to guess the behaviors of the knobs we expose; the degree to which that isn't possible is the degree to which the abstraction leaks.

And I'm trying to show a counterexample where a user guessed wrong. I expected all 8 KiB of RAM to become available for my user code. I was surprised when it brought a bunch of seemingly unrelated things along for the ride.

In a case where reasonable expectations can go in opposite directions, I would lean towards the approach that leaves the most resources in the user's hand.

To sum up my current feelings:

mysterymath commented 11 months ago

This is requesting a policy change or a policy exception on the sections used by library code in the SDK; presently all libraries place their contents into the C sections by default. There are a couple specific buffers that you're surprised are placed in the C sections; that's fair, and I acknowledge the surprise. In response to this, we could make exceptions for these buffers to alleviate your surprise, but without further information, that would be the recorded reason why these buffers are this way. If we add new variables to the NES SDK, we'd need to check in with you to see where to place them, since we'd have no other criteria to use to judge which section they should be placed into. This isn't tenable.

That's specifically why I'm in this conversation, to try to extract a new policy that both decreases your surprise and provides a person-independent test we can use to decide where to place things in the SDK. We can't evaluate whether that policy would be better than the current policy without actually naming it. So, what, specifically, is it about these buffers that suggests their removal from the C sections, as opposed to all of the other code and data already placed there by libc, crt0, neslib, and nesdoug?

EDIT: Lemme offer a guess to illustrate the kind of thing I'm looking for: it's that they take up a relatively large percentage of the available RAM on the target?

cogwheel commented 11 months ago

In response to this, we could make exceptions for these buffers to alleviate your surprise, but without further information, that would be the recorded reason why these buffers are this way. If we add new variables to the NES SDK, we'd need to check in with you to see where to place them, since we'd have no other criteria to use to judge which section they should be placed into. This isn't tenable.

Exactly :) We can guess what users will or won't be surprised by, and we'll be the ones left surprised.

So, what, specifically, is it about these buffers that suggests their removal from the C sections, as opposed to all of the other code and data already placed there by libc, crt0, neslib, and nesdoug?

EDIT: Lemme offer a guess to illustrate the kind of thing I'm looking for: it's that they take up a relatively large percentage of the available RAM on the target?

The neslib/nesdoug buffers are the ones I'm intimately familiar with. I'm also interested in accounting for any other system-level data I may be unaware of.

I think the simplest conceptual line to draw would be "all runtime code and libraries provided by the SDK use RAM or ZP for their variables by default". It is a clear division and puts the most resources into the hands of the user, without overriding symbols, sections, banks, etc.

However, this would require effort to implement proper data, bss, and noinit sections for the RAM region, along with changes to the code and/or build scripts of the stdlib to use the new sections. That would be a nontrivial amount of special-case code that would just have to be removed/reworked when a unified RAM model is implemented.

So optimizing "bang for buck", my next pick might be "Anything currently explicitly marked as .noinit in the SDK should be changed to .ram". IMO, explicitly declaring something noinit suggests an object is "relatively large". It seems like it would make a good heuristic here.

cogwheel commented 11 months ago

So optimizing "bang for buck", my next pick might be "Anything currently explicitly marked as .noinit in the SDK should be changed to .ram". IMO, explicitly declaring something noinit suggests an object is "relatively large". It seems like it would make a good heuristic here.

It looks like the only other noinit data relevant to the NES target are:

The only other SDK variables in .noinit are the serial buffer from eater (256 bytes) and stack_start from cpm65 (2048 bytes), fitting with the heuristic (but otherwise irrelevant to the discussion).

mysterymath commented 11 months ago

I think the simplest conceptual line to draw would be "all runtime code and libraries provided by the SDK use RAM or ZP for their variables by default". It is a clear division and puts the most resources into the hands of the user, without overriding symbols, sections, banks, etc.

However, this would require effort to implement proper data, bss, and noinit sections for the RAM region, along with changes to the code and/or build scripts of the stdlib to use the new sections. That would be a nontrivial amount of special-case code that would just have to be removed/reworked when a unified RAM model is implemented.

I had originally opened llvm-mos/llvm-mos#220 to do something like this, but now having fleshed this out in this discussion, I don't think it was ever a good idea. Accordingly, I've closed that issue.

Every C compiler I'm aware of places the results of compilation into sections with fixed, ABI-defined names; this is as true of "system library" code as it is of "user code", and compilers ranging from modern production compilers to tiny embedded ones, including cc65, all work this way. I'm not aware of any strong prior art for a compiler making this kind of distinction; typically the same set of sections is used for both. It's down to linker scripts to place that same set of sections into different locations, on a per-file basis. In an ideal world, I'd think that's how it would be done here too: you could use a custom linker script to place neslib, libc, or whatever else wherever you like.

That doesn't work today due to LTO, so we're left with making some symbols weak so they can be manually placed elsewhere by the user. But that's more-or-less a concession; linker scripts are generally the preferred way in the C world to lay out abstract code into memory. That's more or less what a linker's for. To hard code this into the library is to make an unwarranted guess about the user's intent, and this would make it more difficult to manipulate the library's sections in a linker script once we're able to properly express that.

Accordingly, I'm closing this one out too. We can still provide users an easy way to move the C enclave around or to override placement of certain parts of it, but I'm not hearing the overwhelming argument needed to break from convention this strongly.

cogwheel commented 11 months ago

I feel like you are responding to points that I'm not actually making. What you quoted is not my proposal (hence the "however"). You asked me to go very general and lay out what distinction made sense to me. I gave "everything from the SDK should be in RAM" as an example of the "if there were no costs to consider" option, but that was not the conclusion of my comment.

The actual proposal I settled on boils down to: change the word '.noinit' to '.ram' for the already-explicitly-sectioned variables in system-specific libraries.

This is not a grand compiler-changing, convention-breaking revolution. I explicitly avoided making suggestions for sweeping changes like that, which is why I'm confused by your responses. I don't see what convention is being broken by a system specific library (e.g. neslib) making system-specific decisions for the benefit of their users.

"I have more RAM available without any extra effort" is a much more tangible benefit than "all of the writable data in my program is in the same region".

Every C compiler I'm aware of places the results of compilation into sections with fixed, ABI-defined names

Well, none of the neslib/nesdoug/famitone buffers are currently "the results of compilation" by a C compiler, at least until #222. Either way, they already have explicitly specified, custom section names beyond just the default ".noinit". To me, they have already opted into more than just the normal section management of the C compiler.

As you've noted, neslib/nesdoug/famitone are included in the SDK for the convenience of getting people on board with the existing tutorials. If we imagine similar libraries maintained by 3rd parties, would it really be so strange for them, knowing the memory layout llvm-mos uses and the memory layout of the NES, to place their large uninitialized buffers in the .ram section for the convenience of their users?

To hard code this into the library [...] would make it more difficult to manipulate the library's sections in a linker script once we're able to properly express that.

I'm all for reducing the need for future work, but wouldn't this be as simple as changing any explicit ".ram" declarations back to ".noinit"?

Accordingly, I'm closing this one out too. We can still provide users an easy way to move the C enclave around or to override placement of certain parts of it, but I'm not hearing the overwhelming argument needed to break from convention this strongly.

If the discussion is over then I'll move along. I'm just sad that the conclusion of this discussion seems to be built on red herrings rather than the actual suggestions I'm making.

mysterymath commented 11 months ago

The discussion has closed because its tone has changed sufficiently to warrant locking it, and I have to issue some kind of judgement in that case, so I'd tend towards the status quo. I don't pretend to be impartial here; compiler-ey folk tend towards conservatism to a fault. There may be more ground here to cover, but increasingly emotional language has appeared in both your and my writing, and it's very unlikely that continuing in this fashion will be fruitful in the immediate future. If anything material changes or this becomes more important, we can pick this up at a later date.

mysterymath commented 11 months ago

Apologies, I should clarify. I shouldn't have used the word lock. I'm not locking this issue or anything; I'm just intending to bow out of it. Conversation can continue here, and if a broader consensus develops here that this is desirable, I'll revisit. But, for now, this is definitely Not Planned.

cogwheel commented 11 months ago

There may be more ground here to cover, but increasingly emotional language has appeared in both your and my writing, and it's very unlikely that continuing in this fashion will be fruitful in the immediate future.

I've been around the internet long enough that I can't blame you for that expectation. A lot of conversations like this just end up going in circles, and emotions continue to snowball. But personally I don't see that as inevitable here. I see a fairly limited surface area to the actual topics that I'm still confused/unclear about, though that might not be apparent from my verbosity -_-

I'm engaging in this conversation as much to learn about the principles and goals that go into the various decisions as anything. Understanding exactly why the suggestion is being rejected would help me look at different/future problems through a lens that better matches the goals of the project. I want to propose/implement ideas that please the compiler-ey folk.

mysterymath commented 11 months ago

I've been around the internet long enough that I can't blame you for that expectation. A lot of conversations like this just end up going in circles, and emotions continue to snowball. But personally I don't see that as inevitable here. I see a fairly limited surface area to the actual topics that I'm still confused/unclear about, though that might not be apparent from my verbosity -_-

I'll try to provide a summary of my mental model of the problem, maybe that will help.

In a library, placing code in .text is not a declaration that the code must be put in precisely the same location as the other C functions in the link. It's to say "I don't care where this is placed, so long as it's somewhere suitable for code." It's the default; if you don't have a strong reason to put code elsewhere, you put it in .text.

This convention serves a purpose: it allows the author of a linker script to treat your library abstractly. Tools can more easily analyze the contents of the library and report on its size. They can write a linker script that takes the things that you don't care about and places them somewhere specific, say, to make them fit. They can do this because you declared that you don't care precisely where they get put, by putting them in .text.

That's what strikes me as off about this proposal: this is an extremely hard statement by neslib that its buffers MUST be put in the ram memory region. There are no ifs or buts about it; that's where they have to go. Sure, the symbol is weak, but that just allows you to provide your own buffer. But if you want to use the one in neslib, by golly, it's gotta go in RAM. The express purpose of the .ram section is to place things in the ram memory region; it has no other purpose. But I've heard no neslib-related argument why these buffers intrinsically MUST go in RAM. It seems like it should be up to the user where they get placed, and conventionally, for C runtimes, that means placing them in .bss, or for us, .noinit.

Accordingly, another way to go here would be to separate these out into their own TUs, e.g., pal-buf.o. Then a linker script could place them wherever by moving the .noinit section of pal-buf.o, by name. Importantly, the symbol doesn't even need to be weak for this to work; this is generally the conventional way to do this.
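
A sketch of that separate-TU idea, assuming a hypothetical pal-buf.c; the point is that the buffer stays in a conventional section, and a linker script can then target that one object file's input sections by name:

```c
/* pal-buf.c -- a hypothetical stand-alone translation unit holding only
   the palette buffer. It stays in the conventional .noinit section; a
   linker script that wants it somewhere specific can address the .noinit
   input section of pal-buf.o by object-file name, with no weak-symbol
   tricks needed. Symbol name and 32-byte size are assumptions. */
#include <stdint.h>

__attribute__((section(".noinit")))
uint8_t PAL_BUF[32];
```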

That's also the model for what c-in-prg-ram means: take everything in the link that isn't pinned down, and move it to PRG-RAM. That means the contents of .bss, .noinit, and .data; those are the things that libraries didn't find a reason to pin down. Accordingly, what I've been trying to fish out is if there's some neslib-specific reason to pin these sections down. But, it sounds rather like it's just inconvenient to have c-in-prg-ram mean that.

cogwheel commented 11 months ago

That really helped tie together a lot of the things I didn't understand from earlier comments. Thank you.

I think part of where my mental model went off track was the other thread about the neslib buffers, where weak symbols were suggested. I didn't understand that it was just a workaround for limitations of linker scripts (or why linker scripts were relevant when you brought them up here).

That's what strikes me as off about this proposal: this is an extremely hard statement by neslib that its buffers MUST be put in the ram memory region. There are no ifs or buts about it; that's where they have to go. Sure, the symbol is weak, but that just allows you to provide your own buffer. But if you want to use the one in neslib, by golly, it's gotta go in RAM. The express purpose of the .ram section is to place things in the ram memory region; it has no other purpose. But I've heard no neslib-related argument why these buffers intrinsically MUST go in RAM. It seems like it should be up to the user where they get placed, and conventionally, for C runtimes, that means placing them in .bss, or for us, .noinit.

OK, so the semantics of .ram are entirely physical, but the concept I'm trying to embody, "put yourself in whatever region leaves the most available space for user code/data", is abstract. In my mind, this is similar to the HIMEM features in DOS. That's why I used the name ".low_ram" in #216. From my POV, it's not so much that the library is declaring exactly which part of ram it must be in; it's declaring that it wants to be out of the way of user space.

But I see that without generalized .data and .bss, a unified RAM model, or LTO fixes, implementing these semantics would necessarily be opinionated to an extent that isn't present in the existing architecture.

I think the only place where my opinion truly differs is on the target audience and set of principles applied for each component. To me there are three tiers:

  1. The compiler, standard library, and common functionality for all targets
  2. Target-specific functionality that is common for most or all applications
  3. Application-specific functionality

As far as I can tell, we completely agree about Tier 1. There is little more sacred in software development than our trust in compilers and their runtimes. Breaking that trust should be prevented at all costs. The target audience is literally everyone.

I don't think we're far off re: Tier 2 either. The decisions made here need to be rigorous and robust, much like Tier 1, but I might make some less idealistic choices here. E.g. given some set of limitations present in Tier 1, maybe there would be some workarounds presented here. The target audience for these workarounds is either everyone when the workarounds are transparent, or intermediate users who are willing to dive into linker scripts and such.

For Tier 3, I have a completely different set of standards. My target audience here are:

They get mad when they run out of resources. They get mad when they find out months later that more resources were available and they could've been using them the whole time. They get mad at the huge leap in knowledge required to get from "I compiled a project with my own C++" to "I have added a linker script to move the VRAM buffer into the system RAM region".

So for my own projects, I consider it not only appropriate but sometimes mandatory to make opinionated decisions in Tier 3 in order to provide the best out-of-the-box experience for this group of users. Anyone who knows enough to care about the convention being broken is certainly skilled enough to figure out how to adapt it to their needs. Bonus points for clear documentation comments wherever the opinionated decisions are expressed.

In the case of neslib, nesdoug, and famitone, I would place them solidly in Tier 3. I see them as dependent on, rather than components of, the platform code. I would consider it appropriate to find some kind of workaround that is tightly coupled to the current state of Tiers 1 & 2 in order to get closer to the desired, ideal outcome of having both RAM regions available transparently.

Is there some other approach besides the ones eliminated/postponed so far that is as quick but maybe not as dirty as replacing .noinit with .ram? Maybe there's a name we can use for a section that embodies "the part of ram that is not usually available to programs" in a way that is not platform-specific?

mysterymath commented 11 months ago

I think providing an excellent end to end pipeline for NES development to serve your "tier 3 users" is ultimately out of scope for the SDK. neslib and nesdoug were provided for convenience and as a proof of concept; I debated on whether to include them at all, and it may actually have been a mistake. They're rather opinionated, and broadly, we've tried to keep opinionated code out of the SDK as far as possible. The main reason they're there is that we don't have anything resembling ca65 support at the moment, and that would make the NES targets extremely bare-metal, and I considered that too much of a risk to adoption.

Ideally, someone would build nice, beginner-friendly content pipelines on top of the SDK, but with the resources available, I don't think the SDK can or ever should be that. There's just too many platforms we support, too many ways of working, and the scope is too broad for this motley crue to support it well. CC65 suffers from this scope creep quite a bit; many of its libraries are destitute and barely functional.

Accordingly, I value simplicity, consistency, and breadth of capability much, much more than user-friendliness. Admittedly, this is the attitude that tends to make modern toolchains a bit rough to use, but it's the only approach I know of that scales well.

johnwbyrd commented 11 months ago

They're rather opinionated, and broadly, we've tried to keep opinionated code out of the SDK as far as possible.

I wonder if it would be reasonable to include a non-default linker script that demonstrates @cogwheel 's method, or otherwise document in the current linker script how it might be modified to follow @cogwheel 's proposal. Failing that, I'd love to see a page on llvm-mos.org describing these different approaches for the NES.

cogwheel commented 11 months ago

I think providing an excellent end to end pipeline for NES development to serve your "tier 3 users" is ultimately out of scope for the SDK. [...] I debated on whether to include them at all, and it may actually have been a mistake.

There's just too many platforms we support, too many ways of working, and the scope is too broad for this motley crue to support it well. [...] CC65 suffers from this scope creep quite a bit; many of its libraries are destitute and barely functional.

I wonder if it's about forking time then... Moving these libs to a separate project (or 3) would allow them to be more opinionated, reduce the surface area of core llvm-mos maintenance, and side-step any licensing worries from their inclusion.

mysterymath commented 10 months ago

As surprised as I am to say this, I do think we're going to end up doing this one. It came up in #229 that all of these buffers have high alignment, which risks wasting a ton of space on the NES relative to available RAM. That's a good enough intrinsic reason to reserve a page-aligned place to put high-alignment buffers on the NES. So, I'm making an .aligned noinit section, and I have to decide whether to allocate it to c_writeable or to ram. The notion of a place for high-alignment noinit buffers definitely isn't something that belongs to the C runtime, and NES code very conventionally places these things at 0x200 (according to @jroweboy ). So, I'm going to place this in ram, which actually will complete this issue.
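
For illustration of the alignment concern, a minimal sketch; the attribute-based spelling, the buffer name, and the placement mechanics are assumptions, and the SDK's actual aligned-noinit section may be wired up differently:

```c
#include <stdint.h>

/* A page-aligned buffer: dropped at an arbitrary offset inside a
   general-purpose section, the linker may need up to 255 bytes of
   padding to satisfy the alignment -- a large cost out of 2 KiB of NES
   system RAM. Gathering such buffers at a page boundary (e.g. $200)
   avoids the padding. Name and size here are illustrative only. */
__attribute__((aligned(256)))
uint8_t oam_shadow[256];
```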