muschellij2 / rscopus

Scopus Database API Interface to R

bibtex scopus #43

Closed joni73 closed 2 years ago

joni73 commented 2 years ago

if (!is.null(api_key)){
  x = article_retrieval(fetch_file[[1]][2], identifier = "scopus_id", verbose = FALSE, headers = inr)

  save(x)
  load(x)

  bib=bibtex_core_data(x)
  if (!is.null(bib)){
    fileW=sprintf("%s.bib",a)

files=list.files(path="LUT",pattern="*.bib",full.names=TRUE)
M <- convert2df(files, dbsource = "scopus", format = "bibtex")

Converting your scopus collection into a bibliographic dataframe

Warning: In your file, some mandatory metadata are missing. Bibliometrix functions may not work properly!

Please, take a look at the vignettes:

Missing fields: AU DE ID C1 CR

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 56
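The final error is a generic base R failure rather than anything specific to Scopus: `data.frame()` raises it when one column has zero elements and another has 56. The following is a minimal sketch in base R that reproduces the same message. The column names `AU` and `TI` are made up for illustration, and the idea that an empty field is what triggers the error inside `convert2df` is only an assumption, not something confirmed in this thread.

```r
# Reproduce the message with base R only: one column is empty, the other has 56 values.
# AU and TI are illustrative names, not taken from the bibliometrix source.
msg <- tryCatch(
  data.frame(AU = character(0), TI = rep("title", 56), check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is the cause, checking which parsed fields have length zero before the data frame is built would point to the field that the .bib file lacks.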

joni73 commented 2 years ago

The valid view options are now META, META_ABS, META_ABS_REF, FULL, and ENTITLED, but the example says to get "content"? I have tried FULL and also tried without stating a view.

joni73 commented 2 years ago

Hi, I have also tried this with the abstract function. view=COMPLETE does not exist, and I have tried FULL, so I guess Scopus now returns something new in content and the parsing in the bibtex function no longer works. What I am actually planning is to forward the information to bibliometrix to get bibliometrics easily, and to quanteda or kc for text analysis. For now I have to save the abstract or article as BibTeX and then try to import it into bibliometrix with M <- convert2df(files, dbsource = "scopus", format = "bibtex"). A function that does this directly could be a good idea. With the abstract, bibtex_core_data(x) does not complain, but convert2df complains about missing data:

bibtex_core_data(x)
fileW=sprintf("%s.bib",a)
write(bib, file=fileW )
fileW
[1] "1.bib"
M <- convert2df(fileW, dbsource = "scopus", format = "bibtex")

Converting your scopus collection into a bibliographic dataframe

Warning: In your file, some mandatory metadata are missing. Bibliometrix functions may not work properly!

Please, take a look at the vignettes:

Missing fields: AU DE ID C1 CR

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1

muschellij2 commented 2 years ago

I have to be honest, I don't really know what you're doing. Can you submit a minimal example, please follow the guide at https://stackoverflow.com/help/minimal-reproducible-example.

joni73 commented 2 years ago

Hi,

Master's thesis: "Future Trends At Academic research - a view from manufacturing, production engineering"

For importing, I guess I will use the quanteda package for basic textual analysis.

To calculate metrics related to trends, I use the bibliometrix package. I use it to evaluate the material before the text analysis.

Problem: most likely the bibtex_core_data function does not write all the information needed by the bibliometrix function that reads the .bib file.

So far, search and abstract/article retrieval work fine. I have used save(x) to store the full search results and load to get them back, which is not a nice way.

Idea: Scopus, rscopus ->> bibliometrix, quanteda.

rscopus: search --> article_retrieval --> bibtex_core_data ->> bibliometrix: convert2df

There should be a way to generate the data structures directly as data frames, but that needs programming. A starting point could be hacking bibtex_core_data for more fields and convert2df to use a BibTeX object instead of reading a file.

yours, Joni

PS. Written from my phone; I can send code from my computer. I am open to all ideas, this is just how I figured it out.
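As a rough illustration of the "data frame directly" idea, here is a base R sketch that checks a data frame for the field tags named in the convert2df warning (AU, DE, ID, C1, CR) and adds any absent ones as empty columns. The helper names `missing_tags` and `fill_missing_tags` are hypothetical, the toy data frame is not real Scopus output, and whether bibliometrix functions accept a frame padded this way is untested. Filling with NA only fixes the structure; it does not supply the missing metadata.

```r
# Field tags reported as missing in the convert2df warning.
required_tags <- c("AU", "DE", "ID", "C1", "CR")

# Hypothetical helper: which required tags are absent from a data frame?
missing_tags <- function(df, required = required_tags) {
  setdiff(required, names(df))
}

# Hypothetical helper: add each absent tag as an all-NA character column.
fill_missing_tags <- function(df, required = required_tags) {
  for (tag in missing_tags(df, required)) {
    df[[tag]] <- rep(NA_character_, nrow(df))
  }
  df
}

# Toy data standing in for one parsed record; not real Scopus output.
M <- data.frame(TI = "EXAMPLE TITLE", PY = 2021, stringsAsFactors = FALSE)
before <- missing_tags(M)
M <- fill_missing_tags(M)
after <- missing_tags(M)
print(before)
print(names(M))
```

Running the check before and after a conversion step makes it easy to see which tags a given export actually carries.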


muschellij2 commented 2 years ago

Can you submit a minimal example, please follow the guide at https://stackoverflow.com/help/minimal-reproducible-example.

joni73 commented 2 years ago

OK, I am not too familiar with R parameter passing, but I got it working with small modifications to bibliometrix:

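# Modified copy of bibliometrix's retrievalByAuthorID that passes extra
# request headers (inr) through to the rscopus calls.
library(rscopus)       # author_df_orig, affiliation_retrieval, get_complete_author_info
library(bibliometrix)  # metaTagExtraction

retrievalByAuthorID2 <- function (id, apikey, inr, remove.duplicated = TRUE, country = TRUE) 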
{
    id = id[!is.na(id)]
    M_list = list()
    n = length(id)
    nomi = c("au_id", "name", "affil_id", "affilname", "n_auth", 
        "n_affils", "citations", "journal", "description", "title", 
        "pii", "doi", "eid", "cover_date", "cover_display_date", 
        "prism_url", "dc_identifier", "dc_creator", "prism_issn", 
        "prism_eIssn", "prism_pageRange", "dc_description", "prism_aggregationType", 
        "subtype", "authkeywords", "source_id")
    M = data.frame(matrix(NA, 1, length(nomi)))
    names(M) = nomi
    for (j in seq_len(n)) {
        AU_ID = id[j]
        cat("\n Query n. ", j, "   Author ID: ", AU_ID)
        # On failure, tryCatch returns the numeric sentinel 1 instead of a data frame.
        AU_S <- tryCatch(author_df_orig(au_id = AU_ID, api_key = apikey, headers = inr,
            all_author_info = TRUE, verbose = FALSE), error = function(e) err = 1)
        if (!is.numeric(AU_S)) {
            AU_S$cover_date = substr(as.character(AU_S$cover_date), 
                1, 4)
            for (i in 1:dim(AU_S)[2]) {
                if (is.factor(AU_S[[i]])) {
                  AU_S[[i]] = as.character(AU_S[[i]])
                }
            }
            M_AU = data.frame(AU_S, stringsAsFactors = FALSE)
            if (dim(M_AU)[2] <= dim(M)[2]) {
                M_AU[setdiff(names(M), names(M_AU))] = NA
            }
            M = rbind(M, M_AU[names(M)])
            M_list[[j]] = M_AU
            names(M_list)[j] = id[j]
        }
        else {
            cat("\n Error in id:", AU_ID, "retrieval\n")
        }
    }
    M = M[-1, ]
    names(M) = c("AU_ID", "AU", "C1_ID", "C1", "nAU", "nC1", 
        "TC", "SO", "DT", "TI", "PII", "DI", "EID", "PY", "CDD", 
        "URL", "UT", "AU1", "ISSN", "EISSN", "PAG", "AB", "PT", 
        "SUBTYPE", "DE", "SO_ID")
    if (isTRUE(remove.duplicated)) {
        d = duplicated(gsub("[^[:alnum:] ]", "", M$UT))
        cat("\n", sum(d), "duplicated documents have been removed\n")
        M = M[!d, ]
    }
    M$CR = NA
    M$DB = "SCOPUS"
    M$DE = gsub("\\| ", ";", M$DE)
    M$ID = M$DE
    if (isTRUE(country)) {
        M$AU_CO = paste(M$C1_ID, ";", sep = "")
        cat("\nAuthors' country retrieval\n\n")
        aff_id = sort(unique(unlist(strsplit(M$C1_ID, ";"))))
        aff_id = aff_id[nchar(aff_id) > 1]
        AFF = data.frame(ID = NA, NAME = NA, CO = NA)
        for (i in seq_along(aff_id)) {
            a = affiliation_retrieval(aff_id[i], api_key = apikey, headers = inr,
                verbose = FALSE)
            AFF[i, 1] = aff_id[i]
            if (length(a$content$`affiliation-retrieval-response`$`affiliation-name`) > 
                0) {
                AFF[i, 2] = a$content$`affiliation-retrieval-response`$`affiliation-name`
            }
            if (length(a$content$`affiliation-retrieval-response`$country) > 
                0) {
                AFF[i, 3] = a$content$`affiliation-retrieval-response`$country
            }
            cat("\nAffiliation ID: ", AFF[i, 1], "   Name: ", 
                AFF[i, 2], ",", AFF[i, 3])
            M$AU_CO = gsub(paste(aff_id[i], ";", sep = ""), paste(AFF[i, 
                3], ";", sep = ""), M$AU_CO)
        }
        M$AU_CO = gsub(";;", ";", M$AU_CO)
        M$AU_CO[nchar(M$AU_CO) < 3] = NA
        M$AU1_CO = unlist(lapply(strsplit(M$AU_CO, ";"), function(l) {
            l = l[1]
        }))
        UN = strsplit(M$C1, ";")
        CO = strsplit(M$AU_CO, ";")
        for (i in seq_along(UN)) {
            M$C1[i] = paste(paste(UN[[i]], ", ", CO[[i]], sep = ""), 
                collapse = ";")
        }
    }
    M <- data.frame(lapply(M, toupper), stringsAsFactors = FALSE)
    M$TC = as.numeric(M$TC)
    M$PY = as.numeric(M$PY)
    M$DB = "SCOPUS"
    M$RP = unlist(lapply(strsplit(M$C1, ";"), function(l) {
        l = l[1]
    }))
    M$CR <- NA
    M$J9 <- M$JI <- M$SO
    suppressWarnings(M <- metaTagExtraction(M, Field = "SR"))
    SR = M$SR
    tab = table(SR)
    tab2 = table(tab)
    ind = as.numeric(names(tab2))
    ind = ind[which(ind > 1)]
    if (length(ind) > 0) {
        for (i in ind) {
            indice = names(which(tab == i))
            for (j in indice) {
                indice2 = which(SR == j)
                SR[indice2] = paste(SR[indice2], as.character(1:length(indice2)), 
                  sep = " ")
            }
        }
    }
    row.names(M) <- SR
    results <- list(M = M, authorDocuments = M_list)
    return(results)
}
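
The nested loops near the end of the function append a running number to every repeated SR label so that the row names are unique. A compact sketch of the same idea with base R's `ave()` follows. The helper name `number_duplicates` is hypothetical and the labels are toy values, so treat it as an illustration of the technique rather than a drop-in replacement.

```r
# Hypothetical helper: suffix repeated labels with " 1", " 2", ...;
# labels that occur only once are left unchanged.
number_duplicates <- function(labels) {
  ave(labels, labels, FUN = function(x) {
    if (length(x) > 1) paste(x, seq_along(x), sep = " ") else x
  })
}

# Toy labels, not real bibliometrix SR values.
SR <- c("A 2020", "B 2021", "A 2020")
SR_unique <- number_duplicates(SR)
print(SR_unique)
```

With the toy input the two "A 2020" labels become "A 2020 1" and "A 2020 2", while "B 2021" stays as it is.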

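# Modified copy of bibliometrix's idByAuthor that passes extra request
# headers (inr) to rscopus and prints intermediate results for debugging.
idByAuthor2 <- function (df,apikey, inr) 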
{
    n = dim(df)[1]
    AU_ID = NA
    AU_AFF = NA
    AU_count = NA
    for (j in 1:n) {
        lastname = tolower(df[j, 1])
        firstname = tolower(df[j, 2])
        if (!is.na(df[j, 3])) {
            query1 = paste("affil(", df[j, 3], ")", sep = "")
        }
        else {
            query1 = NULL
        }
        cat("\nSearching author's info: ", toupper(df[j, 1]), 
            toupper(df[j, 2]))

            print( firstname )

        AU_info = get_complete_author_info(last_name = lastname, 
            first_name = firstname, api_key = apikey, query = query1, headers=inr, verbose=TRUE)

            print(AU_info)

        if (AU_info$content$`search-results`$`opensearch:totalResults` != 
            0) {
            # Use the full element name "entry" instead of relying on partial matching of "entr".
            AU_ID[j] = AU_info[[2]]$`search-results`$entry[[1]]$`dc:identifier`
            AU_ID[j] = gsub("AUTHOR_ID:", "", AU_ID[j])
            AU_info2 = AU_info[[2]]
            aff = AU_info2$`search-results`$entry[[1]]$`affiliation-current`
            AU_AFF[j] = paste(aff$`affiliation-name`, ", ", aff$`affiliation-city`, 
                ", ", aff$`affiliation-country`, sep = "")
            AU_count[j] = AU_info[[2]]$`search-results`$entry[[1]]$`document-count`
        }
        else {
            AU_ID[j] = NA
            AU_AFF[j] = NA
            AU_count[j] = NA
        }
    }
        print("DONE")
    authorsID = data.frame(lastname = df[, 1], firstname = df[, 
        2], id = AU_ID, affiliation = AU_AFF, count = AU_count, 
        stringsAsFactors = FALSE)

        print(authorsID)

    return(authorsID)
}