orc-lang / orc

Orc programming language implementation
https://orc.csres.utexas.edu/
BSD 3-Clause "New" or "Revised" License

Design and implement an Invoker/Accessor style compile time metadata API #208

Open arthurp opened 7 years ago

arthurp commented 7 years ago

This API would replace the ad-hoc API in orc.values.sites.SiteMetadata with something a bit more principled and powerful. The new API needs to have the following features:

  1. Provide the following information about a method:
    1. How many times it will publish, as a Range.
    2. Whether it has side-effects.
    3. Whether it observes mutable state and hence is affected by other methods' effects. Any site that returns false for this is "pure" in that it will always return the same value given the same arguments.
    4. The maximum delay between the call starting and when it will publish for the first time.
    5. The maximum delay between the call starting and when it will halt.
    6. Can it be "direct called".
    7. Can it be called at compile time.
    8. For the published values, a full set of metadata information.
  2. Provide the following information about objects that provide field accessors:
    1. Which fields the object is guaranteed to have.
    2. For each field, a full set of metadata information.
    3. Can the object have additional fields beyond those which are specified in the metadata.
  3. Provide the following information about any data value (returned from a method or anything else):
    1. Is the value mutable or immutable? (for distribution purposes mostly)

All values need to have safe conservative values. For instance, it is safe to assume that a site will have side-effects even if it does not. Sites that do not provide this information will use those values by default.
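To make the "safe conservative values" idea concrete, here is a minimal Java sketch of the per-method metadata from item 1. Every name in it (`MethodMetadata`, `PublicationRange`, `Delay`, `PureAddMetadata`) is hypothetical and not part of the existing Orc code base, and the three-level `Delay` enum is only a placeholder for a delay type that still needs careful design. Metadata for published values (item 1.8) is left out. Each query has a default that is always safe to assume, so a site that provides nothing gets the conservative answer.

```java
import java.util.OptionalInt;

/** Hypothetical range of publication counts; an empty max means "unbounded". */
final class PublicationRange {
    final int min;
    final OptionalInt max;

    PublicationRange(int min, OptionalInt max) {
        if (min < 0 || (max.isPresent() && max.getAsInt() < min)) {
            throw new IllegalArgumentException("invalid publication range");
        }
        this.min = min;
        this.max = max;
    }

    /** Safe default: the call may publish any number of times, including zero. */
    static PublicationRange unknown() {
        return new PublicationRange(0, OptionalInt.empty());
    }

    static PublicationRange exactly(int n) {
        return new PublicationRange(n, OptionalInt.of(n));
    }
}

/** Placeholder delay classes (an assumption), strongest guarantee first. */
enum Delay { NON_BLOCKING, BLOCKING, FOREVER }

/** Hypothetical per-method metadata; every default is the conservative answer. */
interface MethodMetadata {
    /** Metadata used for sites that provide no information at all. */
    MethodMetadata CONSERVATIVE = new MethodMetadata() {};

    default PublicationRange publications() { return PublicationRange.unknown(); }
    default boolean hasSideEffects() { return true; }
    default boolean observesMutableState() { return true; }
    default Delay timeToPublish() { return Delay.FOREVER; }
    default Delay timeToHalt() { return Delay.FOREVER; }
    default boolean isDirectCallable() { return false; }
    default boolean isCompileTimeCallable() { return false; }
}

/** Example: metadata for an imagined pure, immediate addition site. */
final class PureAddMetadata implements MethodMetadata {
    @Override public PublicationRange publications() { return PublicationRange.exactly(1); }
    @Override public boolean hasSideEffects() { return false; }
    @Override public boolean observesMutableState() { return false; }
    @Override public Delay timeToPublish() { return Delay.NON_BLOCKING; }
    @Override public Delay timeToHalt() { return Delay.NON_BLOCKING; }
    @Override public boolean isDirectCallable() { return true; }
    @Override public boolean isCompileTimeCallable() { return true; }
}
```

With this shape a site only overrides what it can actually guarantee, and forgetting to override something can only cost optimizations, never correctness.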

Many of these values will need data types defined for them. Some of those (delay and effect information in particular) need to be designed carefully and have well-defined semantics.

Once this API is implemented it should immediately replace the old version. The SiteMetadata API was never documented or officially made public.

arthurp commented 7 years ago

I have been thinking about this a little and I suspect that we should support both detached and attached metadata. The reason is that requiring detached metadata is a recipe for divergence between the metadata and the implementation if the methods are being co-developed with the Orc program. However, we could make the two metadata languages one and the same.

If we define a metadata language independently of the attachment to classes, we can either put it in an annotation on the Java/Scala class, or in a separate metadata file which provides a little extra syntax to provide class names (or another import identifier if we are importing from a Polyglot VM language). We could also allow the metadata language to be specified at the import site in Orc, though I'm not sure this is a good idea since it would make updating the metadata for a new version of the external library more error-prone (though if all the imports were in an include file it would actually be pretty reasonable).
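As a rough illustration of sharing one metadata language between the attached and detached forms, the Java sketch below carries the metadata as an opaque string. The annotation `OrcMetadata`, the helper `MetadataLookup`, and the text `"pure; publishes 1"` are all invented for this example; the actual metadata syntax is not defined anywhere in this issue. The detached table stands in for a parsed metadata file keyed by class name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Map;
import java.util.Optional;

/** Hypothetical annotation carrying attached metadata in the shared language. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface OrcMetadata {
    String value();
}

/** A site class co-developed with the Orc program, so its metadata is attached. */
@OrcMetadata("pure; publishes 1") // invented syntax, for illustration only
final class Adder {
    int add(int a, int b) { return a + b; }
}

final class MetadataLookup {
    /**
     * Returns the metadata text for a class: the attached annotation if present,
     * otherwise the entry from the detached table (class name to metadata text).
     * An empty result means "use the conservative defaults".
     */
    static Optional<String> lookup(Class<?> cls, Map<String, String> detached) {
        OrcMetadata attached = cls.getAnnotation(OrcMetadata.class);
        if (attached != null) {
            return Optional.of(attached.value());
        }
        return Optional.ofNullable(detached.get(cls.getName()));
    }
}
```

Here the attached form wins when both exist; whether that is the right precedence is an open design question. Either way, the same parser would handle both sources because the text is the same language.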

arthurp commented 7 years ago

Throughout the development of our metadata we should think about how we can allow metadata to be changed for specific objects or call sites. This is particularly important for distribution since many Java libraries return mutable data (like an array) but never change it later. This means that the specific array is in fact immutable as long as the Orc program doesn't change it. Another use case is annotating calls to say they are pure or will publish immediately even if the call target does not guarantee that.

Both of these cases could be handled using a wrapper of some sort which overrides the metadata of the underlying value, but just forwards to it at runtime. For compile-time metadata (like side-effects and delay) these wrappers could probably be eliminated by the compiler before the code runs. But for any metadata that distribution uses, the wrapper will need to persist at runtime to enable the distribution engine to examine it. Truffle can probably eliminate most of the runtime overhead (memory overhead and a pointer dereference would be unavoidable), but it will take some care to make sure this is possible.
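A minimal Java sketch of the wrapper idea, under the assumption that metadata is queried through ordinary methods. The names `Site`, `AssumePure`, and `AssumeImmutable` are hypothetical. Each wrapper overrides one metadata answer and forwards everything else to the value it wraps.

```java
/** Hypothetical callable site; by default nothing is known, so it is not pure. */
interface Site {
    Object call(Object... args);

    default boolean isPure() { return false; }
}

/** Call-site override: claims purity, forwards the actual call unchanged. */
final class AssumePure implements Site {
    private final Site target;

    AssumePure(Site target) { this.target = target; }

    @Override public Object call(Object... args) { return target.call(args); }

    @Override public boolean isPure() { return true; }
}

/**
 * Value override: marks one specific value (for example an array that a Java
 * library returned and never changes) as immutable for distribution purposes.
 * This wrapper has to survive until runtime so that a distribution engine
 * could inspect it.
 */
final class AssumeImmutable<T> {
    private final T value;

    AssumeImmutable(T value) { this.value = value; }

    /** The one unavoidable pointer dereference to reach the real value. */
    T get() { return value; }

    boolean isMutable() { return false; }
}
```

For example, `new AssumePure(args -> args.length)` behaves exactly like the lambda it wraps when called, but reports `isPure()` as true. The promise is the programmer's responsibility: nothing here checks that the wrapped site really is pure or that the wrapped array is never written.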