jonthegeek / wapir

Web APIs with R
https://wapir.io/

Task: Find all APIs on apis.guru that are categorized as "open_data" #65

Open jonthegeek opened 4 months ago

jonthegeek commented 4 months ago

While not strictly NECESSARY, it's easiest to do this with rectangled data. Those are the LOs I'm expecting here.

This tibblifies poorly. Don't go into tibblify here yet; save it for a later discussion of the pros and cons of tibblify.
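For comparison, here is a minimal sketch of the non-rectangled route, walking the list directly. `toy_apis` is a hypothetical stand-in for `all_apis` (the API names and values are invented), and the sketch assumes each entry has `preferred` and `versions` fields, with categories stored under `x-apisguru-categories` inside `info`.

```r
# Hypothetical stand-in for `all_apis`; names and values are invented.
toy_apis <- list(
  "example.com" = list(
    preferred = "v2",
    versions = list(
      v1 = list(info = list(`x-apisguru-categories` = list("media"))),
      v2 = list(info = list(`x-apisguru-categories` = list("open_data", "media")))
    )
  ),
  "other.org" = list(
    preferred = "1.0",
    # This version has no categories field at all.
    versions = list(`1.0` = list(info = list(title = "Other")))
  )
)

# For each API, look up the categories of its preferred version and check
# whether "open_data" is among them. Missing categories count as FALSE.
is_open_data <- vapply(
  toy_apis,
  function(api) {
    categories <- api$versions[[api$preferred]]$info[["x-apisguru-categories"]]
    "open_data" %in% unlist(categories)
  },
  logical(1)
)
open_data_names <- names(toy_apis)[is_open_data]
open_data_names
```

This works, but every step has to guard against missing fields by hand, which is part of why the rectangled version is easier to teach.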

jonthegeek commented 4 months ago
# Named list --> nested tibble
all_apis_df <- all_apis |>
  tibble::enframe(name = "api_name")
all_apis_df
all_apis_df$value |> lengths() |> unique()
all_apis_df$value[[1]] |> names()
setdiff(
  names(all_apis_df$value[[1]]),
  names(all_apis_df$value[[11]])
)

# all_apis_df$value contains length-3 named lists. Each value looks like a
# column.
all_apis_versions <- all_apis_df |>
  tidyr::unnest_wider(value)
all_apis_versions
all_apis_versions$versions |> lengths() |> unique()
all_apis_versions$versions |> lengths() |> head(10)
all_apis_versions$versions[[10]] |> names()
setdiff(
  names(all_apis_versions$versions[[10]]),
  names(all_apis_versions$versions[[1]])
)
# Each `versions` value is a separate API version, with no standardization.
# Prime case for unnesting longer.
all_apis_preferred <- all_apis_versions |>
  tidyr::unnest_longer(versions, indices_to = "version") |>
  # We only care about the "preferred" versions.
  dplyr::filter(preferred == version) |>
  # "preferred" and "version" now contain the same info by definition. In this
  # case "added" is duplicated in versions, so let's get rid of it, too. We also
  # want to reorder, so we'll select the columns we care about.
  dplyr::select(api_name, version, versions)
all_apis_preferred
all_apis_preferred$versions |> lengths() |> unique()
setdiff(
  names(all_apis_preferred$versions[[7]]),
  names(all_apis_preferred$versions[[1]])
)
# It looks like there's an optional field, but otherwise these are
# rectangle-able.
all_apis_preferred_wide <- all_apis_preferred |>
  tidyr::unnest_wider(versions)
all_apis_preferred_wide
all_apis_preferred_wide$info |> lengths() |> unique()
all_apis_preferred_wide$info |> lengths() |> head()
setdiff(
  names(all_apis_preferred_wide$info[[4]]),
  names(all_apis_preferred_wide$info[[1]])
)
# all_apis_preferred_wide$info is a list of many possible columns. We don't want
# all of them; we just want the categories.
all_apis_preferred_wide |>
  tidyr::hoist(info, categories = "x-apisguru-categories") |>
  tidyr::unnest_longer(categories) |>
  dplyr::filter(categories == "open_data")
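The steps above can also be tried without downloading anything. Below is a hedged sketch that runs the same pipeline, collapsed into one chain, on `toy_apis`, a hypothetical list that only mimics the shape assumed for `all_apis` (the API names, dates, and categories are invented).

```r
# Hypothetical stand-in for `all_apis`; names and values are invented.
toy_apis <- list(
  "example.com" = list(
    added = "2020-01-01",
    preferred = "v2",
    versions = list(
      v1 = list(
        added = "2020-01-01",
        info = list(title = "Example", `x-apisguru-categories` = list("media"))
      ),
      v2 = list(
        added = "2020-01-01",
        info = list(
          title = "Example",
          `x-apisguru-categories` = list("open_data", "media")
        )
      )
    )
  ),
  "other.org" = list(
    added = "2021-06-01",
    preferred = "1.0",
    versions = list(
      # No categories here, to mimic an optional field.
      `1.0` = list(added = "2021-06-01", info = list(title = "Other"))
    )
  )
)

toy_open_data <- toy_apis |>
  # Named list --> nested tibble.
  tibble::enframe(name = "api_name") |>
  # Spread added / preferred / versions into columns.
  tidyr::unnest_wider(value) |>
  # One row per version, keeping the version name.
  tidyr::unnest_longer(versions, indices_to = "version") |>
  dplyr::filter(preferred == version) |>
  dplyr::select(api_name, version, versions) |>
  tidyr::unnest_wider(versions) |>
  # Pull only the categories out of info.
  tidyr::hoist(info, categories = "x-apisguru-categories") |>
  # One row per category; APIs without categories drop out here.
  tidyr::unnest_longer(categories) |>
  dplyr::filter(categories == "open_data")
toy_open_data
```

With this toy input, only `example.com` should remain, since its preferred version is the only one tagged `open_data`.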