massimoaria / bibliometrix

An R-tool for comprehensive science mapping analysis. A package for quantitative research in scientometrics and bibliometrics.
https://www.bibliometrix.org

Filtering cocMatrix to top-cited in citations #105

Closed: storopoli closed this issue 4 years ago

storopoli commented 4 years ago

Dear Massimo,

I am trying to subset the co-citation matrix built from cocMatrix(M, Field = "CR", sep = ";", binary = F) so that it keeps only the top-N references listed in citations(M, field = "article", sep = ";")$Cited.

There seems to be an inconsistency in how the functions citations() and cocMatrix() manipulate the cited-reference strings.

My goal is to build a co-citation matrix containing only the 58 most-cited references (those with at least 4 citations).

I subset the matrix returned by cocMatrix using top_cited$CR %in% colnames(cocit). I expected a vector full of TRUE, but every element is FALSE.

I am attaching the .rds file I used to generate the bibliometrix data frame (M); my script is below.

all_records.rds.zip

```r
library(bibliometrix)
library(dplyr)
library(Matrix)

M <- readRDS("data/all_records.rds")

# Most-cited references as reported by citations() (raw reference strings)
CR <- as_tibble(citations(M, field = "article", sep = ";")$Cited)
top_cited <- CR %>% # 58 top-cited (min 4 citations)
  filter(n >= 4)

# Document x reference matrix, then reference x reference co-citation matrix
cocit <- cocMatrix(M, Field = "CR", sep = ";", binary = FALSE)
cocit <- crossprod(cocit, cocit)

# this is the problem \/
cocit <- cocit[colnames(cocit) %in% top_cited$CR,
               colnames(cocit) %in% top_cited$CR]
```
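For reference, the crossprod() step turns a document x reference matrix A into a reference x reference co-citation matrix t(A) %*% A, where entry [i, j] counts the documents that cite both reference i and reference j. A minimal sketch on a hypothetical toy matrix (not the attached data) shows this:

```r
# Hypothetical toy document x reference matrix (rows = citing documents,
# columns = cited references); 1 means the document cites the reference.
A <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 0,
              1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("refA", "refB", "refC")))

# Co-citation matrix t(A) %*% A, computed by crossprod()
cocit <- crossprod(A)
print(cocit)
#      refA refB refC
# refA    4    2    1
# refB    2    2    0
# refC    1    0    1
```

With a 0/1 matrix the diagonal equals each reference's citation count (the column sums of A); if A contains counts greater than 1, the diagonal holds sums of squares instead.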
massimoaria commented 4 years ago

Yes, you are right, but this difference is not an inconsistency; it is a deliberate choice. In citations(), reference items are left unmodified (we have no reason to change them), whereas cocMatrix() uses a simplified form of each reference item so that the labels fit in the network plot.

To identify the most cited references you don't need the CR strings from citations(). Simply compute the column sums (colSums function) of the document x reference matrix returned by cocMatrix() and keep the 58 columns with the largest sums, i.e. the first 58 after sorting in descending order.

In a document x reference matrix, the sum of each column is the number of citations received by that reference.
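A minimal sketch of that suggestion: rank the columns of the document x reference matrix by their sums, keep the top n, and only then take crossprod(). The helper name top_cocitation() is hypothetical (it is not a bibliometrix function), and the toy matrix below stands in for the output of cocMatrix(M, Field = "CR", sep = ";").

```r
# Hypothetical helper (not part of bibliometrix): co-citation matrix
# restricted to the n most-cited references.
# A: document x reference matrix (base matrix or a Matrix sparse matrix).
top_cocitation <- function(A, n) {
  cited <- colSums(A)                         # citations per reference
  keep  <- order(cited, decreasing = TRUE)[seq_len(min(n, ncol(A)))]
  A_top <- A[, keep, drop = FALSE]            # top-n columns only
  crossprod(A_top)                            # reference x reference co-citations
}

# Hypothetical toy data standing in for cocMatrix() output
A <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 0,
              1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4), c("refA", "refB", "refC")))

top2 <- top_cocitation(A, 2)
print(top2)
#      refA refB
# refA    4    2
# refB    2    2
```

With the data from this issue the call would be along the lines of top_cocitation(cocMatrix(M, Field = "CR", sep = ";", binary = FALSE), 58). To apply the threshold from the question instead of a fixed count, subset with A[, colSums(A) >= 4, drop = FALSE] before calling crossprod(). Because the column names come from the same matrix, no string matching against citations() is needed.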