ropensci / openalexR

Getting bibliographic records from OpenAlex
https://docs.ropensci.org/openalexR/

Unhelpful rownames from oa2bibliometrix() #14

Closed: trangdata closed this issue 2 years ago

trangdata commented 2 years ago

I'm not sure whether the rownames are needed for downstream analyses in bibliometrix, but perhaps we should remove them, since the SR column already has this information. What do you think, @massimoaria?

library(openalexR)
dat <- oa2bibliometrix(oa_fetch(
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-01-31"
))

head(rownames(dat))
#> [1] "NA, , V99313352"   "NA, , V4210226067" "NA, , V201530359" 
#> [4] "NA, , V59624048"   "NA, , V121203305"  "NA, , V68497187"

Created on 2022-09-10 with reprex v2.0.2

massimoaria commented 2 years ago

The file as is will not work in bibliometrix. Bibliometrix needs both the rownames and the SR field to work properly: they are used as short labels for papers in many analyses.
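To make the point about short labels concrete, here is a minimal base R sketch on a toy data frame. It is not how oa2bibliometrix() builds these fields. The "AUTHOR, YEAR, SOURCE" pattern is an assumption inferred from labels such as "NA, , V99313352" above, and the AU, PY and SO values are made up for illustration.

```r
# Illustrative sketch only: toy data, not real OpenAlex output.
# Assumption: a short label has the form "AUTHOR, YEAR, SOURCE".
dat <- data.frame(
  AU = c("PAOLO BIANCONE", "YONG QIN"),
  PY = c(2022, 2022),
  SO = c("BRITISH FOOD JOURNAL", "RENEWABLE & SUSTAINABLE ENERGY REVIEWS"),
  stringsAsFactors = FALSE
)

# Build one short reference label per document
dat$SR <- paste(dat$AU, dat$PY, dat$SO, sep = ", ")

# Row names must be unique, so make.unique() disambiguates repeated labels
rownames(dat) <- make.unique(dat$SR, sep = "-")

head(rownames(dat))
```

With labels like these, a missing author or year shows up directly in the row name, which is the symptom reported at the top of this issue.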

trangdata commented 2 years ago

Got it. We'll leave this for now.

massimoaria commented 2 years ago

I am working on this issue. Give me a few minutes...

massimoaria commented 2 years ago

The last commit should have solved the issue.

library(openalexR)
library(bibliometrix)
library(dplyr)

dat <- oa2bibliometrix(oa_fetch(
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-01-31"
))

dat %>%
  biblioAnalysis() %>%
  summary()

MAIN INFORMATION ABOUT DATA

 Timespan                              2022 : 2022
 Sources (Journals, Books, etc)        98
 Documents                             112
 Annual Growth Rate %                  0
 Document Average Age                  0
 Average citations per doc             0.9107
 Average citations per year per doc    0.9107
 References                            7367

DOCUMENT TYPES
 book-chapter             19
 journal-article          93

DOCUMENT CONTENTS
 Keywords Plus (ID)                    382
 Author's Keywords (DE)                0

AUTHORS
 Authors                               413
 Author Appearances                    426
 Authors of single-authored docs       11

AUTHORS COLLABORATION
 Single-authored docs                  11
 Documents per Author                  0.271
 Co-Authors per Doc                    3.8
 International co-authorships %        40.18


Annual Scientific Production

 Year    Articles
    2022      112

Annual Percentage Growth Rate 0


Most Productive Authors

    Authors                             Articles  Authors                      Articles Fractionalized
1   ALHAMZAH ALNOOR                            2  VIJAY PEREIRA                                    1.5
2   CARLOS FRANCISCO SIMÕES GOMES              2  ANTON PEEZ                                       1.0
3   IGOR PINHEIRO DE ARAÚJO COSTA              2  ARAS BOZKURT                                     1.0
4   KANWALJEET GARG                            2  ATIK KULAKLI                                     1.0
5   MARCOS DOS SANTOS                          2  DIEUDONNÉ TCHUENTE                               1.0
6   MATTI TEDRE                                2  EVANDRO EDUARDO BRODAY                           1.0
7   MAYARA LUSTOSA DE OLIVEIRA BARBOSA         2  JULIÁN DAVID CORTÉS SÁNCHEZ                      1.0
8   MOHAMMED SAQR                              2  MARCO SCHIRONE                                   1.0
9   STEFANO ZA                                 2  METEHAN FERIDUN SORKUN                           1.0
10  VIJAY PEREIRA                              2  RAYEES FAROOQ                                    1.0


Top manuscripts per citations

                                                            Paper          DOI TC TCperYear  NTC
1  PAOLO BIANCONE, 2022, BRITISH FOOD JOURNAL                                   6         6 6.59
2  SANDIP MUKHOPADHYAY, 2022, JOURNAL OF HOSPITALITY AND TOURISM INSIGHTS       6         6 6.59
3  GUSTAVO FÜHR HARTMANN, 2022, ENVIRONMENTAL POLLUTION                         5         5 5.49
4  KANWALJEET GARG, 2022, WORLD NEUROSURGERY                                    5         5 5.49
5  CHRISTOPHER MARTIN MIKKELSEN COX, 2022, INFANCY                              4         4 4.39
6  DOMINIK FRANJO DOMINKOVIĆ, 2022, RENEWABLE & SUSTAINABLE ENERGY REVIEWS      4         4 4.39
7  GIANVITO SCARINGI, 2022, GEOMORPHOLOGY                                       4         4 4.39
8  JING WANG, 2022, TECHNOLOGY ANALYSIS & STRATEGIC MANAGEMENT                  4         4 4.39
9  YONG QIN, 2022, RENEWABLE & SUSTAINABLE ENERGY REVIEWS                       3         3 3.29
10 VIMMY BAJAJ, 2022, RESEARCH IN INTERNATIONAL BUSINESS AND FINANCE            3         3 3.29


Corresponding Author's Countries

        Country Articles   Freq SCP MCP MCP_Ratio
1  CHINA              10 0.1282   6   4     0.400
2  INDIA              10 0.1282   7   3     0.300
3  BRAZIL              6 0.0769   5   1     0.167
4  ITALY               5 0.0641   2   3     0.600
5  SOUTH AFRICA        5 0.0641   2   3     0.600
6  MALAYSIA            4 0.0513   0   4     1.000
7  AUSTRALIA           3 0.0385   1   2     0.667
8  COLOMBIA            3 0.0385   3   0     0.000
9  USA                 3 0.0385   2   1     0.333
10 CANADA              2 0.0256   1   1     0.500


SCP: Single Country Publications

MCP: Multiple Country Publications


Total Citations per Country

     Country      Total Citations Average Article Citations
1  INDIA                       18                     1.800
2  CHINA                       11                     1.100
3  BRAZIL                       9                     1.500
4  MALAYSIA                     8                     2.000
5  ITALY                        6                     1.200
6  CZECH REPUBLIC               4                     4.000
7  DENMARK                      4                     4.000
8  USA                          4                     1.333
9  AUSTRALIA                    3                     1.000
10 ISRAEL                       2                     2.000


Most Relevant Sources

                                              Sources        Articles
1  IEEE ACCESS                                                      5
2  ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH                     3
3  SUSTAINABILITY                                                   3
4  FRONTIERS IN PUBLIC HEALTH                                       2
5  LANDSCAPE ECOLOGY                                                2
6  LECTURE NOTES IN INFORMATION SYSTEMS AND ORGANISATION            2
7  LECTURE NOTES IN NETWORKS AND SYSTEMS                            2
8  PROCEDIA COMPUTER SCIENCE                                        2
9  RENEWABLE & SUSTAINABLE ENERGY REVIEWS                           2
10 ADMINISTRATIVE SCIENCES                                          1

trangdata commented 2 years ago

Hmm, I installed the latest version of bibliometrix from CRAN and pulled the latest commit of openalexR, but I still get an error:

library(openalexR)
library(bibliometrix)
#> To cite bibliometrix in publications, please use:
#> 
#> Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, 
#>                                  Journal of Informetrics, 11(4), pp 959-975, Elsevier.
#>                         
#> 
#> https://www.bibliometrix.org
#> 
#>                         
#> For information and bug reports:
#>                         - Send an email to info@bibliometrix.org   
#>                         - Write a post on https://github.com/massimoaria/bibliometrix/issues
#>                         
#> Help us to keep Bibliometrix free to download and use by contributing with a small donation to support our research team (https://bibliometrix.org/donate.html)
#> 
#>                         
#> To start with the shiny web-interface, please digit:
#> biblioshiny()
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
packageVersion("bibliometrix")
#> [1] '4.0.0'

dat <- oa2bibliometrix(oa_fetch(
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-01-31"
))

dat %>%
  biblioAnalysis() 
#> Error in aggregate.data.frame(as.data.frame(x), ...): no rows to aggregate

Created on 2022-09-11 with reprex v2.0.2

massimoaria commented 2 years ago

Please use the GitHub version of bibliometrix:

library(openalexR)
remotes::install_github("massimoaria/bibliometrix")
library(bibliometrix)

trangdata commented 2 years ago

Hmm, I installed the bibliometrix package (v4.0.1) from GitHub but still get the same error...

massimoaria commented 2 years ago

I don't know why. I also tried the latest version of bibliometrix from CRAN (4.0.1), and the example seems to work fine. Here are the code and output, generated with reprex:

library(openalexR)
library(bibliometrix)
#> To cite bibliometrix in publications, please use:
#> 
#> Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, 
#>                                  Journal of Informetrics, 11(4), pp 959-975, Elsevier.
#>                         
#> 
#> https://www.bibliometrix.org
#> 
#>                         
#> For information and bug reports:
#>                         - Send an email to info@bibliometrix.org   
#>                         - Write a post on https://github.com/massimoaria/bibliometrix/issues
#>                         
#> Help us to keep Bibliometrix free to download and use by contributing with a small donation to support our research team (https://bibliometrix.org/donate.html)
#> 
#>                         
#> To start with the shiny web-interface, please digit:
#> biblioshiny()
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
dat <- oa_fetch(
  entity = "works",
  cites = "W2755950973",
  from_publication_date = "2022-01-01",
  to_publication_date = "2022-01-31"
) %>% oa2bibliometrix()
res <- biblioAnalysis(dat)
summary(res)
#> 
#> 
#> MAIN INFORMATION ABOUT DATA
#> 
#>  Timespan                              2022 : 2022 
#>  Sources (Journals, Books, etc)        104 
#>  Documents                             120 
#>  Annual Growth Rate %                  0 
#>  Document Average Age                  0 
#>  Average citations per doc             1.042 
#>  Average citations per year per doc    1.042 
#>  References                            7530 
#>  
#> DOCUMENT TYPES                     
#>  book-chapter             24 
#>  journal-article          94 
#>  proceedings-article      2 
#>  
#> DOCUMENT CONTENTS
#>  Keywords Plus (ID)                    418 
#>  Author's Keywords (DE)                0 
#>  
#> AUTHORS
#>  Authors                               437 
#>  Author Appearances                    455 
#>  Authors of single-authored docs       12 
#>  
#> AUTHORS COLLABORATION
#>  Single-authored docs                  12 
#>  Documents per Author                  0.275 
#>  Co-Authors per Doc                    3.79 
#>  International co-authorships %        40.83 
#>  
#> 
#> Annual Scientific Production
#> 
#>  Year    Articles
#>     2022      120
#> 
#> Annual Percentage Growth Rate 0 
#> 
#> 
#> Most Productive Authors
#> 
#>                        Authors        Articles         Authors        Articles Fractionalized
#> 1  ALHAMZAH ALNOOR                           2 VIJAY PEREIRA                              1.5
#> 2  B. B. ZAIDAN                              2 ANTON PEEZ                                 1.0
#> 3  BOJAN LALIC                               2 ARAS BOZKURT                               1.0
#> 4  CARLOS FRANCISCO SIMÕES GOMES             2 ATIK KULAKLI                               1.0
#> 5  DANIJELA CIRIC                            2 DIEUDONNÉ TCHUENTE                         1.0
#> 6  IGOR PINHEIRO DE ARAÚJO COSTA             2 EVANDRO EDUARDO BRODAY                     1.0
#> 7  KANWALJEET GARG                           2 JULIAN D. CORTES                           1.0
#> 8  MARCOS DOS SANTOS                         2 MARCO SCHIRONE                             1.0
#> 9  MATTI TEDRE                               2 METEHAN FERIDUN SORKUN                     1.0
#> 10 MAYARA LUSTOSA DE OLIVEIRA BARBOSA        2 NEERAJ CHUGH                               1.0
#> 
#> 
#> Top manuscripts per citations
#> 
#>                                                             Paper          DOI TC TCperYear  NTC
#> 1  YONG QIN, 2022, RENEWABLE & SUSTAINABLE ENERGY REVIEWS                       7         7 6.72
#> 2  DIEGO A. ESQUIVEL-HERNÁNDEZ, 2022, JOURNAL OF ENVIRONMENTAL MANAGEMENT       6         6 5.76
#> 3  PAOLO BIANCONE, 2022, BRITISH FOOD JOURNAL                                   6         6 5.76
#> 4  SANDIP MUKHOPADHYAY, 2022, JOURNAL OF HOSPITALITY AND TOURISM INSIGHTS       6         6 5.76
#> 5  GUSTAVO FÜHR HARTMANN, 2022, ENVIRONMENTAL POLLUTION                         5         5 4.80
#> 6  KANWALJEET GARG, 2022, WORLD NEUROSURGERY                                    5         5 4.80
#> 7  GIANVITO SCARINGI, 2022, GEOMORPHOLOGY                                       5         5 4.80
#> 8  CHRISTOPHER MARTIN MIKKELSEN COX, 2022, INFANCY                              4         4 3.84
#> 9  DOMINIK FRANJO DOMINKOVIĆ, 2022, RENEWABLE & SUSTAINABLE ENERGY REVIEWS      4         4 3.84
#> 10 JING WANG, 2022, TECHNOLOGY ANALYSIS & STRATEGIC MANAGEMENT                  4         4 3.84
#> 
#> 
#> Corresponding Author's Countries
#> 
#>         Country Articles   Freq SCP MCP MCP_Ratio
#> 1  CHINA              10 0.1205   6   4     0.400
#> 2  INDIA              10 0.1205   7   3     0.300
#> 3  BRAZIL              6 0.0723   5   1     0.167
#> 4  ITALY               6 0.0723   2   4     0.667
#> 5  MALAYSIA            5 0.0602   0   5     1.000
#> 6  SOUTH AFRICA        5 0.0602   2   3     0.600
#> 7  AUSTRALIA           3 0.0361   1   2     0.667
#> 8  COLOMBIA            3 0.0361   3   0     0.000
#> 9  USA                 3 0.0361   2   1     0.333
#> 10 CANADA              2 0.0241   1   1     0.500
#> 
#> 
#> SCP: Single Country Publications
#> 
#> MCP: Multiple Country Publications
#> 
#> 
#> Total Citations per Country
#> 
#>      Country      Total Citations Average Article Citations
#> 1  INDIA                       20                     2.000
#> 2  CHINA                       15                     1.500
#> 3  BRAZIL                      10                     1.667
#> 4  MALAYSIA                     8                     1.600
#> 5  ITALY                        6                     1.000
#> 6  MEXICO                       6                     6.000
#> 7  CZECH REPUBLIC               5                     5.000
#> 8  AUSTRALIA                    4                     1.333
#> 9  DENMARK                      4                     4.000
#> 10 USA                          4                     1.333
#> 
#> 
#> Most Relevant Sources
#> 
#>                                               Sources        Articles
#> 1  IEEE ACCESS                                                      5
#> 2  ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH                     3
#> 3  SUSTAINABILITY                                                   3
#> 4  FRONTIERS IN PUBLIC HEALTH                                       2
#> 5  IFIP ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGY        2
#> 6  LANDSCAPE ECOLOGY                                                2
#> 7  LECTURE NOTES IN INFORMATION SYSTEMS AND ORGANISATION            2
#> 8  LECTURE NOTES IN NETWORKS AND SYSTEMS                            2
#> 9  NUCLEATION AND ATMOSPHERIC AEROSOLS                              2
#> 10 PROCEDIA COMPUTER SCIENCE                                        2

Created on 2022-09-21 with reprex v2.0.2

trangdata commented 2 years ago

@massimoaria that is completely my fault!!! I pulled the changes on openalexR but hadn't rebuilt the package locally! I now get the same result as you. Please feel free to close the issue.
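For anyone who hits the same symptom: pulling new commits only updates the source tree, and R keeps loading the previously installed build until the package is reinstalled. A minimal sketch of the rebuild step, assuming the working directory is a local clone of openalexR and that devtools is available (the `is_clone` helper variable is only for illustration):

```r
# Sketch, assuming the working directory is a local clone of openalexR
# that has already been updated with `git pull`.
is_clone <- file.exists("DESCRIPTION") &&
  isTRUE(read.dcf("DESCRIPTION", fields = "Package")[1, 1] == "openalexR")

if (is_clone && requireNamespace("devtools", quietly = TRUE)) {
  # Rebuild and install from the local source without touching dependencies
  devtools::install(upgrade = "never")
}

# Alternative that skips the local clone and installs straight from GitHub:
# remotes::install_github("ropensci/openalexR")

# After restarting R, check which version the session actually finds
if (requireNamespace("openalexR", quietly = TRUE)) {
  print(packageVersion("openalexR"))
}
```

Restarting the R session after installing matters, because an already loaded namespace is not replaced by the new build.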