Structure and format for `airtable` class objects

elipousson commented 1 year ago

This is a bit of a brain dump so I'm hoping it makes sense but here is a quick summary of the existing class structure and potential new directions/alternate approaches.

Background

The new implementation of airtable class objects in the pull request for migrating the package to {httr2} #14 includes four S3 object types:

airtable
airtable_base_schema
airtable_table_schema
airtable_fields_schema

An airtable object is a list with a base ID, table ID or name, user-facing base url, and API request url. A table url is included if table is specified as a table ID (but not as a table name). A base name/description, user permissions level, and table name can optionally be included. The base name (called description for the airtable object to avoid a conflict with the table name value) is pulled from the metadata if description is NULL. The table value is assumed to also be the name name if the supplied table is not a table ID. One or more views can be included (although multiple views will break other functions).

airtable_base_schema is a list that includes airtable_table_schema and airtable_fields_schema as components.

These were all part of the existing development branch but the pull request converted them from environment objects to list objects to use the S3 vectors classes functions available through vctrs.

Possible challenges with existing airtable class objects

Currently, the airtable class objects represent a single table but the values are not validated when the object is created so it is possible to create an airtable object that has a table and view that does not exist within the specified base. Would we want to add validation so this is no longer possible? If so, is an airtable object effectively just a subset of the data included within an airtable_table_schema?

I'd also prefer a clearer hierarchy where there is an object that represents a single base, an object that represents a single table, and an object that represents a single view (potentially using sub-classes, e.g. airtable, airtable_tbl, airtable_view).

Correspondence between API and class structure

I one goal for the class structure of the package should be a clear correspondence between the package class structure and the data model built in to the Airtable Web API.

Right now I think the airtable and airtable_table_schema are both close equivalents to the table model: https://airtable.com/developers/web/api/model/table-model We could add a similar object to serve as an equivalent for a table config object: https://airtable.com/developers/web/api/model/table-config

airtable_fields_schema should exist as (or be convertible to) an array of field config objects: https://airtable.com/developers/web/api/field-model

Ideally, when create_table() is implemented, it could take an airtable_table_schema and convert it into a create an identically structured table. Similarly, a airtable_fields_schema could be use to add a set of fields using create_field() (or some additional function, e.g. create_fields()).

The view metadata endpoint is part of the Enterprise API but it effectively includes an implicit view object model that we could also use as a base for an additional class or sub-class: https://airtable.com/developers/web/api/get-view-metadata

Object type for airtable class objects

While the option to return to an environment object as a base might make it easier to set the active view, I think I prefer the idea of sticking with the infrastructure that vctrs offers for vector/list-style S3 objects.

elipousson commented 1 year ago

As some food for thought on alternate structures for representing Airtable base data in R, I put together a function that converts a base into a dm object (created with the dm package).

library(rairtable)
library(rlang)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following objects are masked from 'package:rlang':
#> 
#>     %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
#>     flatten_raw, invoke, splice

options("rairtable.pat_default" = "TEST_AIRTABLE_PAT")

dm_airtable <- function(url = NULL, ..., cell_format = "json") {
  base <- rairtable:::get_base_id(
    url = url, ...
  )

  table_list <- list_base_tables(base = base)

  table_fields <-
    map(
      seq_along(table_list[["name"]]),
      function(i) {
        tbl_fields <- table_list[["fields"]][[i]]

        tbl_options <- tbl_fields[["options"]]

        vctrs::new_data_frame(
          list(
            "name" = tbl_fields[["name"]],
            "id" = tbl_fields[["id"]],
            "is_primaryFieldId" = tbl_fields[["id"]] %in% table_list[["primaryFieldId"]],
            "linkedTableId" = tbl_options[["linkedTableId"]],
            "prefersSingleRecordLink" = tbl_options[["prefersSingleRecordLink"]],
            "inverseLinkFieldId" = tbl_options[["inverseLinkFieldId"]]
          )
        )
      }
    )

  table_fields <- set_names(
    table_fields,
    table_list[["name"]]
  )

  table_pk_cols <-
    map(
      table_fields,
      function(x) {
        x[x[["is_primaryFieldId"]], ][["name"]]
      }
    )

  table_pk_cols <- as.character(table_pk_cols)

  table_records <- map(
    table_list[["id"]],
    function(id) {
      list_records(
        base = base,
        table = id,
        cell_format = cell_format
      )
    }
  )

  id_col <- getOption("rairtable.id_col", "airtable_record_id")

  base_dm <- dm::new_dm(
    tables = set_names(
      table_records,
      table_list[["name"]]
    )
  )

  table_names <- set_names(table_list[["name"]], table_list[["id"]])

  for (i in seq_along(table_names)) {
    tbl_name <- table_names[[i]]
    tbl_pk_col <- table_pk_cols[[i]]

    base_dm <-
      dm::dm_add_pk(
        dm = base_dm,
        table = !!sym(tbl_name),
        columns = !!sym(tbl_pk_col)
      )

    linked_fields <-
      table_fields[[i]][!is.na(table_fields[[i]][["linkedTableId"]]), ]

    if (nrow(linked_fields) > 0) {
      for (r in c(1:nrow(linked_fields))) {
        ref_table <- table_names[[linked_fields[["linkedTableId"]][[r]]]]

        # ref_col <- id_col

        # if (cell_format != "json") {
          # Compare table_names and table_pk_cols to identify the primaryfieldID for the ref_table
          ref_table_fields <-
            table_fields[[ref_table]]

          inverse_link_field <-
            linked_fields[["inverseLinkFieldId"]][[r]]

          if (!any(is.na(inverse_link_field))) {
            ref_col <-
              ref_table_fields[ref_table_fields[["id"]] == inverse_link_field, ][["name"]]
          } else {
            ref_col <- tbl_pk_col
          }
        # }

        base_dm <-
          dm::dm_add_fk(
            dm = base_dm,
            table = !!sym(tbl_name),
            columns = !!sym(linked_fields[["name"]][[r]]),
            ref_table = !!sym(ref_table),
            ref_columns = !!sym(ref_col)
          )
      }
    }
  }

  base_dm
}

# https://airtable.com/shrJ4mMhUfh2hD5Ew
rairtable_dm <- dm_airtable(
  url = "https://airtable.com/appWAFKLoOMO4HzOD/tblzdIsyVI8TrAgca/viwcKRa10BYSBRee4?blocks=hide"
  )

rairtable_dm |> 
  dm::dm_draw(view_type = "all")

^{Created on 2023-06-09 with reprex v2.0.2}

elipousson commented 1 year ago

A few more ideas based on my continued work with table configurations and models for the pull request:

airtable_field_schema should be renamed airtable_field_model and should include field names, ids, types, and options (for consistency with the Airtable Web API documentation)
airtable_table_schema should be renamed airtable_table_model (again for consistency)
The airtable function should be able to take a airtable_table_schema/airtable_table_model object and create an airtable object. This would allow airtable_base to simply return an airtable_base_schema object rather than the current arrangement where it returns a list with a base ID string, an airtable_base_schema, and a list of airtable objects.

matthewjrogers / rairtable