ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Prune edges in oa_snowball #133

Closed. yjunechoe closed this pull request 1 year ago.

yjunechoe commented 1 year ago

This removes edges to or from nodes that are missing from the node list (i.e., it ensures that both endpoints of every edge are present as nodes), to satisfy the constraints of as_tbl_graph().
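The pruning step amounts to a filter on the edge list. Below is a minimal base-R sketch, assuming the snowball result holds a nodes data frame with an id column and an edges data frame whose from and to columns contain work IDs. The helper name prune_edges() and the toy IDs are hypothetical and are not the code merged in this PR.

```r
# Hypothetical helper (not the openalexR implementation): keep only edges
# whose endpoints both appear in the node table.
prune_edges <- function(nodes, edges) {
  keep <- edges$from %in% nodes$id & edges$to %in% nodes$id
  edges[keep, , drop = FALSE]
}

# Toy data with made-up IDs: "W4" appears in an edge but has no node row.
nodes <- data.frame(id = c("W1", "W2", "W3"))
edges <- data.frame(
  from = c("W2", "W3", "W4"),
  to   = c("W1", "W1", "W1")
)

pruned <- prune_edges(nodes, edges)
print(pruned)  # the edge from "W4" is dropped
```

Filtering with %in% on both columns keeps the node table untouched and only discards dangling edges, which matches the behaviour described above.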

Resolves #126:

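devtools::load_all()  # load the development version of openalexR from this branch
library(tidygraph)    # provides as_tbl_graph()

# Seed works to snowball from
ids <- c("W1896013598", "W312683970", "W2084630927")

# Collect the seed works plus the works that cite them and the works they cite
ilk_snowball <- oa_snowball(
  identifier = ids,
  verbose = FALSE
)

as_tbl_graph(ilk_snowball)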
#> # A tbl_graph: 301 nodes and 298 edges
#> #
#> # A rooted forest with 3 trees
#> #
#> # A tibble: 301 × 31
#>   id          display_name author ab    publication_date so    so_id host_organization issn_l url   pdf_url license version first_page last_page volume issue is_oa
#>   <chr>       <chr>        <list> <chr> <chr>            <chr> <chr> <chr>             <chr>  <chr> <chr>   <chr>   <chr>   <chr>      <chr>     <chr>  <chr> <lgl>
#> 1 W1896013598 Reforesting… <df>   ""    2009-09-01       AMBI… http… Springer Science… 0044-… http… <NA>    <NA>    <NA>    325        333       38     6     FALSE
#> 2 W312683970  Does outmig… <df>   "In … 2015-08-01       Appl… http… Elsevier BV       0143-… http… <NA>    <NA>    <NA>    157        170       62     <NA>  FALSE
#> 3 W2084630927 Lake victor… <df>   "The… 2004-05-01       Limn… http… Elsevier BV       0075-… http… <NA>    <NA>    <NA>    105        109       34     1-2   FALSE
#> 4 W2104256261 Classical b… <df>   "Of … 2010-08-01       Biol… http… Elsevier BV       1049-… http… <NA>    <NA>    <NA>    S2         S33       54     <NA>  FALSE
#> 5 W1980017473 Payments fo… <df>   "Rec… 2012-05-01       Geof… http… Elsevier BV       0016-… http… <NA>    <NA>    <NA>    412        426       43     3     FALSE
#> 6 W2883837127 The role of… <df>   "Inv… 2019-01-01       Jour… http… Elsevier BV       0301-… http… <NA>    <NA>    <NA>    145        157       229    <NA>  FALSE
#> # ℹ 295 more rows
#> # ℹ 13 more variables: cited_by_count <int>, counts_by_year <list>, publication_year <int>, cited_by_api_url <chr>, ids <list>, doi <chr>, type <chr>,
#> #   referenced_works <list>, related_works <list>, is_paratext <lgl>, is_retracted <lgl>, concepts <list>, oa_input <lgl>
#> #
#> # A tibble: 298 × 2
#>    from    to
#>   <int> <int>
#> 1     4     3
#> 2     5     1
#> 3     6     3
#> # ℹ 295 more rows
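In the printed tbl_graph, the from and to columns of the edge table are integer row positions into the node table rather than work IDs, which is one way to see why every edge endpoint must correspond to an existing node. The base-R sketch below translates the three printed edges back to OpenAlex IDs, using only the six node rows shown in the output above.

```r
# The first six node IDs, in the row order printed above.
node_ids <- c("W1896013598", "W312683970", "W2084630927",
              "W2104256261", "W1980017473", "W2883837127")

# The first three edges as printed: integer row positions into the node table.
edges <- data.frame(from = c(4L, 5L, 6L), to = c(3L, 1L, 3L))

# Translate row positions back to OpenAlex work IDs.
edges$from_id <- node_ids[edges$from]
edges$to_id   <- node_ids[edges$to]
print(edges)
```

For example, the first edge (4 to 3) links W2104256261 to the seed work W2084630927.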