massimoaria / bibliometrix

An R-tool for comprehensive science mapping analysis. A package for quantitative research in scientometrics and bibliometrics.
https://www.bibliometrix.org

When merging the WoS and Scopus data frames, which Total Citations (TC) value is selected? #475

Closed: nakiamo closed this issue 4 days ago

nakiamo commented 2 months ago

Hello. First of all, thanks a lot for this helpful package.

My question: when merging WoS and Scopus data, which TC (total citations) value is kept in the merged data if an article appears in both WoS and Scopus?

If I merge in this order (WoS first):

combined_wos_scopus <- mergeDbSources(wos_data, scopus_data, remove.duplicated = TRUE)

and sort the articles in descending order of TC, this is the result:

Author(s) TC
NORBERG PA, 2007, J CONSUM AFF 764
PHELPS J, 2000, J PUBLIC POLICY MARK 554
ACQUISTI A, 2016, J ECON LIT 409
TUCKER CE, 2014, J MARKETING RES 395
GOLDFARB A, 2011, MARKET SCI 383
AGUIRRE E, 2015, J RETAILING 332
GOLDFARB A, 2011, MANAGE SCI 329
SHEEHAN KB, 1999, J INTERACT MARK 305
SHEEHAN KB, 2000, J PUBLIC POLICY MARK 296
BAEK TH, 2012, J ADVERTISING 275

If I merge in this order (Scopus first):

combined_wos_scopus_alternative <- mergeDbSources(scopus_data, wos_data, remove.duplicated = TRUE)

this is the result:

Author(s) TC
NORBERG PA, 2007, J CONSUM AFF 764
PHELPS J, 2000, J PUBLIC POLICY MARK 554
TUCKER CE, 2014, J MARK RES 509
GOLDFARB A, 2011, MARK SCI 469
ACQUISTI A, 2016, J ECON LIT 409
BAEK T, 2012, J ADVERT 353
AGUIRRE E, 2015, J RETAILING 332
GOLDFARB A, 2011, MANAGE SCI 329
SHEEHAN KB, 1999, J INTERACT MARK 305
SHEEHAN KB, 2000, J PUBLIC POLICY MARK 296

So the results change depending on which database comes first in the mergeDbSources call. The function prioritizes the first database's TC values when merging.

This is not unexpected or wrong, but analysis results (e.g. the list of most cited articles) change depending on which TC value is used.

Apart from these two options, another method would be to manually choose the highest TC value if an article is indexed in both databases.

I think the researcher can choose one of these three alternatives (prioritize WoS, prioritize Scopus, prioritize the highest TC value) if an article is indexed in both WoS and Scopus, and I'm not sure which of them is ideal.

It would be nice to have an option in the mergeDbSources function for choosing the merging method (that is, deciding which TC value to keep).
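
The third alternative (keep the highest TC) can be approximated by hand after merging. The following is only a minimal sketch, not a feature of the package: it assumes duplicates can be matched on the DOI column (DI), the helper use_max_tc is hypothetical, the DOIs are placeholders, and the small data frames stand in for real converted exports. The first two TC pairs are taken from the tables above; the third record is purely illustrative.

```r
# Toy stand-ins for the converted WoS and Scopus data frames.
wos_data <- data.frame(
  DI = c("10.1000/a", "10.1000/b"),   # placeholder DOIs
  TC = c(395, 383),
  stringsAsFactors = FALSE
)
scopus_data <- data.frame(
  DI = c("10.1000/a", "10.1000/b", "10.1000/c"),
  TC = c(509, 469, 120),              # the 120 is illustrative only
  stringsAsFactors = FALSE
)

# Stack the DOI/TC pairs from both sources and take the highest TC per DOI.
all_tc <- rbind(wos_data[, c("DI", "TC")], scopus_data[, c("DI", "TC")])
all_tc <- all_tc[!is.na(all_tc$DI) & all_tc$DI != "", ]
max_tc <- tapply(all_tc$TC, all_tc$DI, max, na.rm = TRUE)

# Hypothetical helper: overwrite TC in a merged data frame wherever a DOI matches.
use_max_tc <- function(merged, max_tc) {
  idx <- match(merged$DI, names(max_tc))
  hit <- !is.na(idx)
  merged$TC[hit] <- as.numeric(max_tc[idx[hit]])
  merged
}

# Stand-in for the output of mergeDbSources(), where the WoS values were kept.
merged <- data.frame(
  DI = c("10.1000/a", "10.1000/b", "10.1000/c"),
  TC = c(395, 383, 120),
  stringsAsFactors = FALSE
)
merged <- use_max_tc(merged, max_tc)
```

Records without a DOI are left unchanged by this sketch, so in practice a fallback key (for example a normalized title) would probably be needed.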

I'm new to this package and I may have overlooked something that's already there, and I'm sorry if that's the case.

massimoaria commented 2 months ago

Hi, we have just published the new version 4.3.0 on CRAN. One of its main features concerns merging between databases. It is now possible to merge data from all supported databases, and for duplicate records the metadata that is kept follows a hierarchy that no longer depends on the order of the data frames passed to the mergeDbSources function.

The hierarchy is based on data quality and is as follows (from the highest to lowest priority): 1) WoS 2) Scopus 3) OpenAlex 4) Lens 5) Dimensions 6) PubMed 7) Cochrane
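
To illustrate the idea of a source hierarchy, here is a rough sketch of priority-based deduplication. It is not the package's actual implementation: the column names, source labels, DOIs, and the PubMed/Scopus citation counts are assumptions made for the example, and duplicates are matched on DOI only.

```r
# Priority order as listed above (first = highest priority); labels are illustrative.
priority <- c("WoS", "Scopus", "OpenAlex", "Lens", "Dimensions", "PubMed", "Cochrane")

# Toy records from several sources, with two duplicated DOIs (placeholders).
records <- data.frame(
  source = c("Scopus", "WoS", "PubMed", "Scopus"),
  DI     = c("10.1000/a", "10.1000/a", "10.1000/b", "10.1000/b"),
  TC     = c(509, 395, 10, 12),
  stringsAsFactors = FALSE
)

# Rank each record by its source, sort by rank, then keep the first record per DOI.
records$rank <- match(records$source, priority)
records <- records[order(records$rank), ]
deduped <- records[!duplicated(records$DI), ]
```

Because the rows are sorted by source rank before duplicates are dropped, the record that survives does not depend on the order in which the sources were supplied.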

Your idea about deciding which TC value to choose is interesting. We will think about it.

nakiamo commented 1 month ago

Thank you for providing such a clear answer.