massimoaria / bibliometrix

An R-tool for comprehensive science mapping analysis. A package for quantitative research in scientometrics and bibliometrics.
https://www.bibliometrix.org
Other

AB_TM not appearing in data frame #3

Closed swood-ecology closed 7 years ago

swood-ecology commented 7 years ago

Hi,

I downloaded three .bib files from ISI, converted them to a data frame, and stitched them together with the following command:

data <- rbind(isibib2df(readFiles("results_1to500.bib")),
              isibib2df(readFiles("results_501to1000.bib")),
              isibib2df(readFiles("results_1001to1461.bib")))

When I tried to run a co-occurrence analysis on abstracts, I got the following error:

biblioNetwork(data, analysis = "co-occurrences", network = "abstracts")
[1] "Field AB_TM is not a column name of input data frame"
Error in crossprod(x, y) : 
  requires numeric/complex matrix/vector arguments

data$AB is present in the data frame, but data$AB_TM is not. Is there any way to choose the specific variable to use, overriding "abstracts"?

massimoaria commented 7 years ago

First, you have to create the AB_TM field with the function metaTagExtraction. Then you can apply biblioNetwork.


massimoaria commented 7 years ago

Sorry, the function that creates AB_TM is termExtraction, not metaTagExtraction.
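
As a minimal sketch of that order of operations: the helper below is hypothetical (it is not part of bibliometrix) and only calls termExtraction when the AB_TM column is missing. It assumes, as the error message in the question suggests, that termExtraction with Field = "AB" adds a column named AB_TM, and that the network is then built with sep = ";".

```r
# Hypothetical helper (not part of bibliometrix): make sure the AB_TM
# column exists before building an abstract co-occurrence network.
ensure_abstract_terms <- function(M) {
  if (!"AB_TM" %in% names(M)) {
    # Assumption: termExtraction() adds a column named <Field>_TM.
    M <- bibliometrix::termExtraction(M, Field = "AB")
  }
  M
}

# Usage with the data frame from the question (requires bibliometrix):
# data <- ensure_abstract_terms(data)
# NetMatrix <- bibliometrix::biblioNetwork(data,
#                                          analysis = "co-occurrences",
#                                          network = "abstracts",
#                                          sep = ";")
```

A data frame that already has AB_TM passes through unchanged, so the helper is safe to call more than once.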


swood-ecology commented 7 years ago

I had been going off of the intro tutorial, which didn't specify this. Thanks for the clarification.

massimoaria commented 7 years ago

Dear Stephen,

This is from the HTML bibliometrix tutorial (http://htmlpreview.github.io/?https://github.com/massimoaria/bibliometrix/master/vignettes/bibliometrix-vignette.html):

"Co-Word Analysis: Conceptual structure of a field

The aim of the co-word analysis is to map the conceptual structure of a framework using the word co-occurrences in a bibliographic collection. The analysis can be performed through dimensionality reduction techniques such as Multidimensional Scaling (MDS) or Multiple Correspondence Analysis (MCA). Here, we show an example using the function conceptualStructure that performs a MCA to draw a conceptual structure of the field and K-means clustering to identify clusters of documents which express common concepts. Results are plotted on a two-dimensional map. conceptualStructure includes natural language processing (NLP) routines (see the function termExtraction) to extract terms from titles and abstracts. In addition, it implements the Porter’s stemming algorithm to reduce inflected (or sometimes derived) words to their word stem, base or root form."



swood-ecology commented 7 years ago

Thanks. I've been having a little trouble with the name formatting of the matrix that biblioNetwork produces. What I'd like to do is extract the entries of a co-citation reference analysis that are also present in the original data set. The problem is that the row names of the biblioNetwork matrix are not formatted the same way as the fields in the original data set.

If I look at

rownames(biblioNetwork(data, analysis = "co-citation", network = "references", sep=". "))

I get an entry like: "DORAN JW 2002 AGR ECOSYST ENVIRON "

I tried to re-create that format from the original data set with:

paste(gsub(","," ",biblioAnalysis(data, sep = ";")$FirstAuthors),data$PY,data$JI)

But that gives me an entry like: "DORAN JW 2002 AGRIC. ECOSYST. ENVIRON."

Dropping the periods would be easy, but notice that the journal abbreviation is different (AGR vs. AGRIC). Do you have any suggestions for restricting this co-citation analysis to only the papers included in the original data set?
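
One possible workaround, sketched here in base R and not taken from bibliometrix, is to ignore the journal abbreviation and match on first author plus year only. The helper author_year_key is hypothetical, and the small matrix stands in for the object returned by biblioNetwork. The caveat is that two different papers by the same first author in the same year would get the same key.

```r
# Hypothetical helper (not part of bibliometrix): reduce a label to
# "AUTHOR INITIALS YEAR" so differing journal abbreviations do not matter.
author_year_key <- function(x) {
  x <- toupper(x)
  x <- gsub("[.,]", " ", x)           # drop periods and commas
  x <- gsub("\\s+", " ", trimws(x))   # collapse repeated whitespace
  # Keep everything up to and including the first four-digit number.
  sub("^(.*?\\b[0-9]{4}\\b).*$", "\\1", x, perl = TRUE)
}

# The two labels from the question.
cited <- "DORAN JW 2002 AGR ECOSYST ENVIRON "
own   <- "DORAN JW 2002 AGRIC. ECOSYST. ENVIRON."

# Toy stand-in for a co-citation matrix; the second label is made up.
labels <- c(cited, "SMITH A 1999 NATURE")
net <- matrix(1, nrow = 2, ncol = 2, dimnames = list(labels, labels))

# Keep only rows and columns whose key also occurs in the original data set.
keep <- author_year_key(rownames(net)) %in% author_year_key(own)
sub_net <- net[keep, keep, drop = FALSE]
```

With real data, the vector passed in place of own would be the pasted first author and data$PY values, without data$JI.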