ropensci / webchem

Chemical Information from the Web
https://docs.ropensci.org/webchem

Error in ci_query with best match for some chemicals #265

Closed raiphilibert closed 4 years ago

raiphilibert commented 4 years ago

Hello

I have been having an issue downloading data using the ci_query function with the match option "best". This error only happens for some chemicals (mostly metals, e.g. Barium, Aluminum).

 ci_query("Barium",match = "best")

returns an error:

https://chem.nlm.nih.gov/chemidplus/name/startswith/Barium?DT_START_ROW=0&DT_ROWS_PER_PAGE=50
More then one Link found. 

Returning best match. 

https://chem.nlm.nih.gov/chemidplus/rn/1304-28-5
Error in UseMethod("xml_find_first") : 
  no applicable method for 'xml_find_first' applied to an object of class "try-error"

But if I run the ci_query and manually select the same match, the download is successful.

ci_query("Barium",match = "ask")
https://chem.nlm.nih.gov/chemidplus/name/startswith/Barium?DT_START_ROW=0&DT_ROWS_PER_PAGE=50
More then one Link found. 

                                                                                       name       cas
15                                           Octadecanoic acid, barium cadmium salt (4:1:1) 1191-79-3
16                                                     Barium bis(hydroxybenzenesulphonate) 1300-37-4
17                                                          Barium anilinobenzenesulphonate 1300-92-1
18                                                                             Barium oxide 1304-28-5

Enter rownumber of compounds (other inputs will return 'NA'):
1: 18

https://chem.nlm.nih.gov/chemidplus/rn/1304-28-5
$Barium
$name
 [1] "Barium monoxide"          "Barium oxide"             "Barium protoxide"        
 [4] "Baryta"                   "Calcined baryta"          "EC 215-127-9"            
 [7] "EINECS 215-127-9"         "Oxyde de baryum"          "Oxyde de baryum [French]"
[10] "UNII-77603K202B"         

$synonyms
 [1] "Barium monoxide"          "Barium oxide"             "Barium protoxide"        
 [4] "Baryta"                   "Calcined baryta"          "EC 215-127-9"            
 [7] "EINECS 215-127-9"         "Oxyde de baryum"          "Oxyde de baryum [French]"
[10] "UNII-77603K202B"         

$cas
[1] "1304-28-5"

$inchi
[1] "InChI=1S/Ba.O"

$inchikey
[1] "QVQLCTNNEUAWMS-UHFFFAOYSA-N"

$smiles
[1] "O=[Ba]"

$toxicity
  Organism Test Type           Route Reported Dose (Normalized Dose) Effect
1    mouse      LD50 intraperitoneal             146mg/kg (146mg/kg)     NA
2    mouse      LD50    subcutaneous               50mg/kg (50mg/kg)     NA
                                                                                                                                                    Source
1                                                                                                               Current Toxicology.  Vol. 1, Pg. 39, 1993.
2 Zhurnal Vsesoyuznogo Khimicheskogo Obshchestva im. D.I. Mendeleeva.  Journal of the D.I. Mendeleeva All-Union Chemical Society.  Vol. 19, Pg. 186, 1974.

$physprop
[1] NA

$source_url
[1] "https://chem.nlm.nih.gov/chemidplus/rn/1304-28-5"

attr(,"matched")
[1] "Barium oxide"
attr(,"distance")
[1] "interactive"
attr(,"class")
[1] "chemid"

attr(,"class")
[1] "ci_query" "list" 

stitam commented 4 years ago

Thanks @raiphilibert for opening this issue!

Unfortunately I could not replicate the error (tried both barium and aluminium). CAS 1304-28-5 was queried internally in both cases, so the function works up to that point. However, querying the exact same CAS failed in one case and succeeded in the other, which is strange because the function should execute the same piece of code either way. The error was probably produced by xml2::read_html(), so maybe this is a curl issue?
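
For anyone hitting the same message: the wording of the error suggests that a failed page download was wrapped in try() and the resulting "try-error" object was then passed on to xml_find_first(). Below is a minimal sketch of that failure mode and one way to guard against it. It is not webchem's actual code; fetch_page() and its reader, attempts and wait arguments are hypothetical names made up for illustration, and the outage is simulated so the example runs without network access.

```r
library(xml2)

# Reproduce the message: a failed download wrapped in try() yields an object
# of class "try-error", which xml_find_first() has no method for.
bad <- try(stop("connection timed out"), silent = TRUE)
msg <- tryCatch(
  xml_find_first(bad, "//h1"),
  error = function(e) conditionMessage(e)
)

# Hypothetical helper (not part of webchem): fetch a page, retry on failure,
# and return NULL instead of a "try-error" object if every attempt fails.
fetch_page <- function(url, reader = xml2::read_html, attempts = 3, wait = 0) {
  for (i in seq_len(attempts)) {
    page <- try(reader(url), silent = TRUE)
    if (!inherits(page, "try-error")) {
      return(page)
    }
    Sys.sleep(wait)  # pause before the next attempt
  }
  NULL
}

# Simulated outage: a reader that always fails.
failing_reader <- function(url) stop("connection timed out")
missing_page <- fetch_page("https://chem.nlm.nih.gov/chemidplus/rn/1304-28-5",
                           reader = failing_reader, attempts = 2)

# Simulated success: parse a literal HTML string instead of a live page.
ok_reader <- function(url) {
  xml2::read_html("<html><body><h1>Barium oxide</h1></body></html>")
}
page <- fetch_page("ignored", reader = ok_reader)
heading <- xml_text(xml_find_first(page, "//h1"))
```

With a guard like this, the caller can test is.null() on the result and return NA for that query rather than failing inside the parsing step.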

Aariq commented 4 years ago

@raiphilibert are you using the most recent version of webchem, version 1.0.0? I also can't reproduce this error on my system, but I remember having seen this xml_find_first() error somewhere...
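
A quick way to check the installed version from the R console is sketched below. has_min_version() is a hypothetical helper written for this example, not a webchem function; it only relies on requireNamespace() and utils::packageVersion() from base R.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer,
# FALSE if it is older or not installed at all.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

has_min_version("webchem", "1.0.0")
```

Posting the output of sessionInfo() alongside this also shows the versions of xml2 and curl that are loaded.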

raiphilibert commented 4 years ago

Hi

I do think it's the latest version. I installed it very recently. I'll try again over the next couple of days and let you know!

Thanks!

On Fri., Jun. 19, 2020, 8:30 a.m. Eric R Scott wrote:

> @raiphilibert are you using the most recent version of webchem, version 1.0.0? I also can't reproduce this error on my system, but I remember having seen this xml_find_first() error somewhere...


raiphilibert commented 4 years ago

I tried again today and it seemed to work fine. I'll close the issue for now and will let you know if it comes up again.