habit-lang / alb

39 stars 5 forks source link

External area declarations #13

Open jgbm opened 5 years ago

jgbm commented 5 years ago

Consider adding a new form of "external" area declaration, or some other way to introduce an area in a Habit program whose storage is provided elsewhere. This would provide support for something comparable to the following construct in LC:

external vram  = 0xb8000 :: Ref Screen

(The LC code actually uses the external f=g :: t mechanism to support this.)

fxdpntthm commented 5 years ago

Q: How do I set up a minimal test instance for this? A (1): http://web.cecs.pdx.edu/~mpj/pubs/bytedata.html A (2): https://github.com/zipwith/lc-baremetal

zipwith commented 5 years ago

It looks like the focus here has been on adding an external mechanism that is specific to area declarations. But the mechanism in LC is more general, and I think we'll need something comparable to the LC approach in Habit for the applications that we're starting to develop. Habit and mil-tools currently use primitive to mean different things; that's why the term external is used in LC, and is probably also why we're using that label here. But I think that the right place to add the more general version of this feature in Habit might well be in primitive declarations rather than area declarations. The example at the top of this issue might then be written as:

    primitive vram = 0xb8000 :: Ref Screen

More generally, we might have:

    primitive id = e :: ty

where id is used as an identifier of type ty within the Habit code, but will ultimately be implemented by the expression e. A key point here is that e does not actually need to have type ty (despite what the e :: ty portion of the syntax above might suggest). Instead, the type of e should have the same representation as ty. The vram example in LC works because Ref a values are ultimately represented as Word values, which matches the type of 0xb8000.

But the more general mechanism has also proved to be useful in other ways. For example, we might implement a toBits function for converting references into Word values by using something along the following lines:

   primitive refToBits = (\w -> w) :: Ref a -> Word

Two closing thoughts:

1) the syntax used here is awkward, not just because of the e :: ty issue mentioned previously, but also because we'd need to make sure e is an atomic expression to avoid parsing ambiguities. I'd be very happy to consider better notations. Maybe primitive id {e} :: ty or implement id :: ty using e, or ...

2) The examples above illustrate that this mechanism is inherently unsafe; we are only prepared to use 0xb8000 as a valid reference because the platform designers told us that was ok. But if we mis-type the constant value, for example, nothing will alert us to the error until we actually try to run the code. This is an unsettling feature for a language that values type safety guarantees. But it also seems unavoidable when you consider that what we are really working with here is a "foreign function interface". The only difference is that the "foreign" language is just a variant of the source language that exposes more details about low-level representations.

jgbm commented 5 years ago

I'm afraid I don't entirely follow.

My impression of the external areas was that, other than having a particular location in memory, they would be in all other ways like normal areas---i.e., that they might have initializers (depending on their types), would require ByteSize instances, and so forth. In this sense, assuming the back-end were clever enough not to use the same region of memory for two different things (say, by allowing an external area to live in the same space that it intended to use for the heap) there shouldn't even be any unsoundness. (Of course, if the user specified an impossible address---0, say, or something outside the valid range---that would probably go poorly...)

The refToBits example you've given seems a different thing to me. If our goal is simply to break type safety, we don't need new language features, we simply need to add

primitive typesWhoNeeds'Em :: a -> b

and have the back-end implement that as the identity function [1]. Of course, one could imagine using the latter to achieve the former; why not have

vram :: Ref Screen
vram = typesWhoNeeds'Em 0xb8000

but this seems like a very different thing; in particular, it wouldn't play nicely with any of the other area-specific features. (That said, it still wouldn't let you readRef a Ref (Stored (Int -> Int)), so there's some safety left... but it certainly would let you read garbage out of, say, a Ref (Stored (Ix 14)).) Perhaps it's sufficient for the few cases where we really need areas with specific locations?

[1] One could imagine calling this function something like unsafeCoerce instead, but that seems unfortunately anodyne.

zipwith commented 5 years ago

My comments may have pushed us into a language design discussion. Perhaps an "issue" like this isn't the best place for that? The "most significant bit" of my previous comment was to argue that we will need something more general than just support for external areas alone. I believe that the external construct mentioned in the initial message is a good candidate that also covers our current application for external areas, but I am open to other options.

Of course it is not our goal is to break type safety. But any program that includes either a primitive or an external area declaration could potentially be doing just that. (This includes, for example, anything that imports the prelude, and hence that potentially relies on the primitives defined there.) Each such declaration represents a promise that a named entity, defined elsewhere, will behave in a manner that is consistent with the type that has been declared for it. Bad things can/will happen if even one of those promises is broken. Declaring an external area for video RAM at address 0xB8000 is a reasonable thing to do on a traditional PC because that is the address that the hardware has been configured to use for interfacing to video. Or is it? Maybe I misread the data sheet. Maybe I mistyped the address as 0xB800, or 0xB8000, or 0x68000, or 0xd8000 (or maybe one of those was the correct address and I entered something else by mistake). Maybe the machine I'm running on doesn't actually have any video RAM hardware at any address. It's not just a matter of avoiding special addresses like 0 or the ranges covered by other memory areas: there may only be one valid address that can truly be considered as a value of type Ref Screen, and perhaps not even that. And if anyone writes code using a different address, then (a) there is no way for the compiler to detect the problem, and (b) there is a possibility that the resulting program might now have the ability to read/write from some supposedly protected memory region or to perform unintended memory-mapped I/O, simply by using the "video RAM".

We could have a similar problem with a declaration like primitive primNegate :: Unsigned -> Unsigned. When we write that, we need to trust that the associated implementation will be a pure function of the declared type. (It would be even better if it implements some kind of negation operation on Unsigned values...) But the compiler can't be expected to check this, and if the program is used with an implementation that doesn't satisfy this "contract", then we have no guarantees about what will happen because we have broken a key premise of the proof that "well-typed programs will not go wrong".

On the other hand, there is nothing inherently wrong with defining the coercion function that you've described as a primitive ... so long as you ensure that you will only run programs using that definition in environments where the coerce function is implemented in a way that honors its declared type. I can only think of one way to do that right now—(\x -> undefined)—and that clearly isn't going to be very useful in practice. But at least it is consistent with the declared type.

There is legitimate discomfort about the potential for misusing features like this in ways that could compromise fundamental properties of the language. But I do not know how to accomplish our practical goals without providing some way to access external functions. The external areas construct addresses one specific need, but although it is already dangerous, it does not provide all the functionality that we need.

The external id = e :: ty construct that I described has been implemented in LC and does seem to provide the necessary functionality. However, I am open to discussing other ways to meet those needs in Habit. If there is a better approach, I'll be all in. That said, it is probably worth identifying some of the features of the LC approach that might encourage us to consider it for Habit (or that might be useful in evaluating other options):

Again, I'm happy to explore other options, especially if they share some or all of the features above while also providing additional benefits. I'm less certain if this is an appropriate place to conduct that conversation; maybe we should transition to email?