muschellij2 / rscopus

Scopus Database API Interface to R

Extracting Abstracts #34

Open sgalla32 opened 4 years ago

sgalla32 commented 4 years ago

My goal is a script that extracts just the abstracts from an rscopus search into a vector of character strings. To start, I wrote an if statement that runs the search and extracts a list of just the Scopus ID numbers:

library(rscopus)
library(stringr)  # for str_remove()

if (have_api_key()) {
  # First, the Scopus search. count is the number of records per request and
  # max_count caps the total; if I increase count, I usually get an error.
  res = scopus_search(query = "conservation AND translocation",
                      max_count = 20, count = 10)

  # Then, convert these entries to data frames
  df = gen_entries_to_df(res$entries)
  head(df$df)

  # Extract just the Scopus identifier
  Scopus_IDs <- df$df$`dc:identifier`
  head(Scopus_IDs)

  # Strip the "SCOPUS_ID:" prefix so only the ID numbers remain
  Scopus_IDs_Clean <- str_remove(Scopus_IDs, "SCOPUS_ID:")
  head(Scopus_IDs_Clean)
}
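If you would rather not depend on stringr for this one step, base R's `sub()` strips the prefix just as well. A minimal sketch, assuming the identifiers always look like `SCOPUS_ID:<digits>` (the second ID below is a made-up placeholder, not a real record):

```r
# Identifiers in the "SCOPUS_ID:<digits>" form found in dc:identifier.
# The first is from this thread; the second is a hypothetical placeholder.
scopus_ids <- c("SCOPUS_ID:85081719894", "SCOPUS_ID:00000000000")

# "^" anchors the pattern at the start, so only the leading prefix is removed.
scopus_ids_clean <- sub("^SCOPUS_ID:", "", scopus_ids)
print(scopus_ids_clean)
```

Anchoring the pattern is a small safety choice: it guarantees nothing in the middle of a string is touched.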

Before writing a loop that takes each Scopus ID and extracts its abstract, I want to test the approach on a single entry to make sure it returns one abstract:

# Retrieve just one abstract:
x = abstract_retrieval("85081719894", identifier = "scopus_id")
data = jsonlite::fromJSON(httr::content(x$get_statement, as = "text"), flatten = TRUE)

data = data$`abstracts-retrieval-response`
names(data)
data$coredata
data$coredata$`dc:description`

However, when I run this code, the abstract comes back NULL:

> x = abstract_retrieval("85081719894", identifier = "scopus_id")
HTTP specified is:https://api.elsevier.com/content/abstract/scopus_id/85081719894

>   data = jsonlite::fromJSON(httr::content(x$get_statement, as = "text"), flatten=TRUE)
>   
>   data = data$`abstracts-retrieval-response`
>   names(data)
[1] "affiliation" "coredata"   
>   data$coredata
$srctype
[1] "j"

$`prism:issueIdentifier`
[1] "1"

$eid
[1] "2-s2.0-85081719894"

$`pubmed-id`
[1] "32165659"

$`prism:coverDate`
[1] "2020-12-01"

$`prism:aggregationType`
[1] "Journal"

$`prism:url`
[1] "https://api.elsevier.com/content/abstract/scopus_id/85081719894"

$subtypeDescription
[1] "Article"

$`dc:creator`
$`dc:creator`$author
  ce:given-name @seq ce:initials @_fa ce:surname      @auid
1        Wricha    1          W. true      Tyagi 8675093800
                                                    author-url ce:indexed-name
1 https://api.elsevier.com/content/author/author_id/8675093800        Tyagi W.
  preferred-name.ce:given-name preferred-name.ce:initials preferred-name.ce:surname
1                       Wricha                         W.                     Tyagi
  preferred-name.ce:indexed-name affiliation.@id
1                       Tyagi W.       109874624
                                                      affiliation.@href
1 https://api.elsevier.com/content/affiliation/affiliation_id/109874624

$link
  @_fa           @rel
1 true           self
2 true         scopus
3 true scopus-citedby
                                                                                       @href
1                            https://api.elsevier.com/content/abstract/scopus_id/85081719894
2  https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85081719894&origin=inward
3 https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85081719894&origin=inward

$`prism:publicationName`
[1] "Scientific Reports"

$`source-id`
[1] "21100200805"

$`citedby-count`
[1] "0"

$`prism:volume`
[1] "10"

$subtype
[1] "ar"

$`dc:title`
[1] "Root transcriptome reveals efficient cell signaling and energy conservation key to aluminum toxicity tolerance in acidic soil adapted rice genotype"

$openaccess
[1] "1"

$openaccessFlag
[1] "true"

$`prism:doi`
[1] "10.1038/s41598-020-61305-7"

$`prism:issn`
[1] "20452322"

$`article-number`
[1] "4580"

$`dc:identifier`
[1] "SCOPUS_ID:85081719894"

$`dc:publisher`
[1] "Nature Research"

>   data$coredata$`dc:description`
NULL

Note that names(data) returns only the fields 'affiliation' and 'coredata'. Is there something I have missed? I have attached Session_Info_9June2020.txt for more detail.
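In the output above, NULL means the `dc:description` element is absent from the response, not that the abstract is empty. Once this runs in a loop, it helps to make that explicit by returning NA instead of NULL. A minimal sketch: `extract_abstract()` is a hypothetical helper (not part of rscopus) that works on the already-parsed `abstracts-retrieval-response` list, and the two mock lists only imitate the shapes seen in this thread; they are not real API data.

```r
# Hypothetical helper: pull the abstract out of an already-parsed
# `abstracts-retrieval-response` list. Returns NA_character_ instead of NULL
# when `dc:description` is absent, so results can be collected in a vector.
extract_abstract <- function(response) {
  desc <- response[["coredata"]][["dc:description"]]
  if (is.null(desc) || length(desc) == 0) {
    return(NA_character_)
  }
  as.character(desc[[1]])
}

# Mock responses (not real API data): one with only affiliation/coredata
# and no abstract, one whose coredata includes an abstract.
restricted <- list(
  affiliation = list(),
  coredata = list(`dc:title` = "example title")
)
full <- list(
  coredata = list(`dc:title` = "example title",
                  `dc:description` = "example abstract text")
)

extract_abstract(restricted)  # NA
extract_abstract(full)        # "example abstract text"
```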

sgalla32 commented 4 years ago

Note: I also updated rscopus to version 0.6.7, but still get the same NULL result.

muschellij2 commented 4 years ago

I get the abstracts, so please check with Elsevier what your API key is entitled to access. Also, please use the reprex package for reproducible examples.

library(rscopus)
x = abstract_retrieval("85081719894", identifier = "scopus_id")
#> HTTP specified is:https://api.elsevier.com/content/abstract/scopus_id/85081719894
data = jsonlite::fromJSON(httr::content(x$get_statement, as = "text"), flatten=TRUE)

data = data$`abstracts-retrieval-response`
names(data)
#> [1] "item"          "affiliation"   "coredata"      "idxterms"     
#> [5] "language"      "authkeywords"  "subject-areas" "authors"
data$coredata
#> $srctype
#> [1] "j"
#> 
#> $`prism:issueIdentifier`
#> [1] "1"
#> 
#> $eid
#> [1] "2-s2.0-85081719894"
#> 
#> $`dc:description`
#> [1] "© 2020, The Author(s).Aluminium (Al) toxicity is the single most important contributing factor constraining crop productivity in acidic soils. Hydroponics based screening of three rice genotypes, a tolerant (ARR09, AR), a susceptible (IR 1552, IR) and an acid soil adapted landrace (Theruvii, TH) revealed that AR accumulates less Al and shows minimum decrease in shoot and root biomass under Al toxicity conditions when compared with IR. Transcriptome data generated on roots (grown in presence or absence of Al) led to identification of ~1500 transcripts per genotype with percentage annotation ranging from 21.94% (AR) to 29.94% (TH). A total of 511, 804 and 912 DEGs were identified in genotypes AR, IR and TH, respectively. IR showed upregulation of transcripts involved in exergonic processes. AR appears to conserve energy by downregulating key genes of glycolysis pathway and maintaining transcript levels of key exergonic step enzymes under Al stress. The tolerance in AR appears to be as a result of novel mechanism as none of the reported Al toxicity genes or QTLs overlap with significant DEGs. Components of signal transduction and regulatory machinery like transcripts encoding zinc finger protein, calcieurin binding protein and cell wall associated transcripts are among the highly upregulated DEGs in AR, suggesting increased and better signal transduction in response to Al stress in tolerant rice. Sequencing of NRAT1 and glycine-rich protein A3 revealed distinct haplotype for indica type AR. The newly identified components of Al tolerance will help in designing molecular breeding tools to enhance rice productivity in acidic soils."
#> 
#> $`pubmed-id`
#> [1] "32165659"
#> 
#> $`prism:coverDate`
#> [1] "2020-12-01"
#> 
#> $`prism:aggregationType`
#> [1] "Journal"
#> 
#> $`prism:url`
#> [1] "https://api.elsevier.com/content/abstract/scopus_id/85081719894"
#> 
#> $subtypeDescription
#> [1] "Article"
#> 
#> $`dc:creator`
#> $`dc:creator`$author
#>   ce:given-name @seq ce:initials @_fa ce:surname      @auid
#> 1        Wricha    1          W. true      Tyagi 8675093800
#>                                                     author-url ce:indexed-name
#> 1 https://api.elsevier.com/content/author/author_id/8675093800        Tyagi W.
#>   preferred-name.ce:given-name preferred-name.ce:initials
#> 1                       Wricha                         W.
#>   preferred-name.ce:surname preferred-name.ce:indexed-name affiliation.@id
#> 1                     Tyagi                       Tyagi W.       109874624
#>                                                       affiliation.@href
#> 1 https://api.elsevier.com/content/affiliation/affiliation_id/109874624
#> 
#> 
#> $link
#>   @_fa           @rel
#> 1 true           self
#> 2 true         scopus
#> 3 true scopus-citedby
#>                                                                                        @href
#> 1                            https://api.elsevier.com/content/abstract/scopus_id/85081719894
#> 2  https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85081719894&origin=inward
#> 3 https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85081719894&origin=inward
#> 
#> $`prism:publicationName`
#> [1] "Scientific Reports"
#> 
#> $`source-id`
#> [1] "21100200805"
#> 
#> $`citedby-count`
#> [1] "0"
#> 
#> $`prism:volume`
#> [1] "10"
#> 
#> $subtype
#> [1] "ar"
#> 
#> $`dc:title`
#> [1] "Root transcriptome reveals efficient cell signaling and energy conservation key to aluminum toxicity tolerance in acidic soil adapted rice genotype"
#> 
#> $openaccess
#> [1] "1"
#> 
#> $openaccessFlag
#> [1] "true"
#> 
#> $`prism:doi`
#> [1] "10.1038/s41598-020-61305-7"
#> 
#> $`prism:issn`
#> [1] "20452322"
#> 
#> $`article-number`
#> [1] "4580"
#> 
#> $`dc:identifier`
#> [1] "SCOPUS_ID:85081719894"
#> 
#> $`dc:publisher`
#> [1] "Nature Research"
data$coredata$`dc:description`
#> [1] "© 2020, The Author(s).Aluminium (Al) toxicity is the single most important contributing factor constraining crop productivity in acidic soils. Hydroponics based screening of three rice genotypes, a tolerant (ARR09, AR), a susceptible (IR 1552, IR) and an acid soil adapted landrace (Theruvii, TH) revealed that AR accumulates less Al and shows minimum decrease in shoot and root biomass under Al toxicity conditions when compared with IR. Transcriptome data generated on roots (grown in presence or absence of Al) led to identification of ~1500 transcripts per genotype with percentage annotation ranging from 21.94% (AR) to 29.94% (TH). A total of 511, 804 and 912 DEGs were identified in genotypes AR, IR and TH, respectively. IR showed upregulation of transcripts involved in exergonic processes. AR appears to conserve energy by downregulating key genes of glycolysis pathway and maintaining transcript levels of key exergonic step enzymes under Al stress. The tolerance in AR appears to be as a result of novel mechanism as none of the reported Al toxicity genes or QTLs overlap with significant DEGs. Components of signal transduction and regulatory machinery like transcripts encoding zinc finger protein, calcieurin binding protein and cell wall associated transcripts are among the highly upregulated DEGs in AR, suggesting increased and better signal transduction in response to Al stress in tolerant rice. Sequencing of NRAT1 and glycine-rich protein A3 revealed distinct haplotype for indica type AR. The newly identified components of Al tolerance will help in designing molecular breeding tools to enhance rice productivity in acidic soils."

Created on 2020-06-09 by the reprex package (v0.3.0)
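With a key that does return `dc:description`, the loop the original poster had in mind could be sketched with `vapply()`, which guarantees exactly one string (or NA) per ID. This is a minimal sketch under stated assumptions, not part of rscopus: `fetch_abstract()` and `get_abstracts()` are hypothetical names, and the sketch reuses only the `abstract_retrieval()` / `jsonlite::fromJSON()` calls shown in this thread. The live call runs only when rscopus is installed and `have_api_key()` is TRUE; for many IDs you may also need to pause between requests to stay within Elsevier's request limits.

```r
# Hypothetical helper: retrieve one record and return its abstract, or
# NA_character_ when the response has no dc:description (for example, when
# the API key is not entitled to abstracts).
fetch_abstract <- function(scopus_id) {
  x <- rscopus::abstract_retrieval(scopus_id, identifier = "scopus_id")
  parsed <- jsonlite::fromJSON(
    httr::content(x$get_statement, as = "text"),
    flatten = TRUE
  )
  core <- parsed[["abstracts-retrieval-response"]][["coredata"]]
  desc <- core[["dc:description"]]
  if (is.null(desc)) NA_character_ else as.character(desc[[1]])
}

# Apply a fetcher to every ID; vapply() enforces exactly one string per ID.
# The `fetch` argument exists so the loop can be exercised without the API.
get_abstracts <- function(scopus_ids, fetch = fetch_abstract) {
  vapply(scopus_ids, fetch, character(1), USE.NAMES = TRUE)
}

# Live call: only attempted when rscopus is installed and a key is configured.
if (requireNamespace("rscopus", quietly = TRUE) && rscopus::have_api_key()) {
  abstracts <- get_abstracts("85081719894")
  print(sum(is.na(abstracts)))  # IDs that came back without an abstract
}
```

Returning NA rather than dropping missing entries keeps the result aligned with the input IDs, so a key without abstract access shows up as a column of NAs instead of a silently shorter vector.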

sgalla32 commented 4 years ago

Thank you for the speedy response. I have reached out to Elsevier via email to see if they can provide me more details on my API capabilities.