rgonomic / rgo

R/Go integration
BSD 3-Clause "New" or "Revised" License

proposal: support R attributes #1

Open kortschak opened 4 years ago

kortschak commented 4 years ago

Background

R types use attributes in a number of ways that rgo cannot currently replicate without prior work in R (see the matrix handling example). Go does not have dynamic attributes for values, so it would be helpful for rgo to be able to interface between the approaches taken by the two languages.

Proposal

I propose that the rgo: struct tag be extended with a ,attribute (or ,attr) suffix to allow attributes to be transferred between the languages.

The simplest example is how a matrix is passed between R and Go: in R, a matrix is a vector with a dims integer vector attribute. Under the proposal this would be written in Go as

type Matrix struct {
    Dims [2]int `rgo:"dims,attribute"`
}
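
As a rough illustration of the tag syntax, the sketch below splits an rgo: tag value into its R-side name and a flag for the ,attribute (or ,attr) suffix. The helper parseRgoTag is hypothetical and not part of rgo, and reflect is used here only to keep the example short; rgo's real analysis happens at build time and may look quite different.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Matrix is the single-field struct from the proposal.
type Matrix struct {
	Dims [2]int `rgo:"dims,attribute"`
}

// parseRgoTag is a hypothetical helper (not part of rgo). It splits an
// rgo tag value at the first comma and reports whether the suffix is
// "attribute" or its short form "attr".
func parseRgoTag(tag string) (string, bool) {
	name, opt, _ := strings.Cut(tag, ",")
	return name, opt == "attribute" || opt == "attr"
}

func main() {
	t := reflect.TypeOf(Matrix{})
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name, isAttr := parseRgoTag(f.Tag.Get("rgo"))
		fmt.Printf("%s: R name %q, attribute=%t\n", f.Name, name, isAttr)
	}
}
```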

But the Matrix struct above does not capture the vector of matrix elements, so we need to special-case R vectors with attributes as structs. To do this we can make a rule: a struct with a single field with no ,attribute tag and at least one ,attribute-tagged field will place the R value in the untagged (or marked with another suffix?) field and the attributes in the tagged fields, so an R matrix would then be

type Matrix struct {
    Data     []float64   `rgo:""`
    Dims     [2]int      `rgo:"dims,attribute"`
    DimNames [2][]string `rgo:"dimnames,attribute"`
}

The correct handling of dimnames in this example depends on presaged changes to type correspondences outlined in 8d0061b9d92c1d98bfc3d5f6212df5a983c25a40 and described in #2, which need to happen anyway to ensure that R's odd approach to nested data structures is handled correctly by Go code.

The following struct, which has two fields without an ,attribute tag alongside its ,attribute-tagged fields, would result in an rgo error and refusal to perform the wrapping.

type Matrix struct {
    Name     string      `rgo:"name"`
    Data     []float64   `rgo:""`
    Dims     [2]int      `rgo:"dims,attribute"`
    DimNames [2][]string `rgo:"dimnames,attribute"`
}

Potential impact of proposal

This will fairly significantly increase the complexity of the initial analysis of structs. However, the benefit outweighs this, even if only for the ability to use matrices more easily.

kortschak commented 4 years ago

Thinking more about this, I think the rule will have to use a specific rgo: tag for the R value; "-" seems the best choice. So an attributed value would be a struct with exactly one `rgo:"-"` tag and at least one `rgo:",attribute"` tag.

For matrices:

type Matrix struct {
    Data     []float64   `rgo:"-"`
    Dims     [2]int      `rgo:"dims,attribute"`
    DimNames [2][]string `rgo:"dimnames,attribute"`
}

A struct with more than one `rgo:"-"` is a build time failure. Whether an absence of ",attribute" tags should also be a build time failure I'm not sure, though I lean towards it, since otherwise struct { S []A `rgo:"-"` } just becomes a verbose way of writing []A.