ropensci / webchem

Chemical Information from the Web
https://docs.ropensci.org/webchem

alanwood() multiplies entries in the case of isomer mixtures #39

Closed: jranke closed this issue 9 years ago

jranke commented 9 years ago

For example, in the case of S-Metolachlor

http://www.alanwood.net/pesticides/s-metolachlor.html

the HTML snippet for the InChIKey is

<tr valign="baseline">
<th id="r11">InChIKey:</th>
<td headers="r11">major component (<i>S</i>)-isomer:<br>WVQBLGZPHOPPFO-LBPRGKRZSA-N<br>minor component (<i>R</i>)-isomer:<br>WVQBLGZPHOPPFO-GFCCVEGCSA-N</td>
</tr>

and we get

test <- webchem::alanwood("S-Metolachlor", type = "commonname")
## Querying s-metolachlor.htmls-metolachlor.html
test$inchikey
## [1] "major component (S)-isomer:WVQBLGZPHOPPFO-LBPRGKRZSA-Nminor component (R)-isomer:WVQBLGZPHOPPFO-GFCCVEGCSA-N"
## [2] "major component (S)-isomer:WVQBLGZPHOPPFO-LBPRGKRZSA-Nminor component (R)-isomer:WVQBLGZPHOPPFO-GFCCVEGCSA-N"

which is duplicated (apparently already at the query stage, judging by the "Querying" message).

eduardszoecs commented 9 years ago

Thanks for reporting this! Isomers are never easy to handle... Will investigate.

eduardszoecs commented 9 years ago

This happens when there are multiple links on the web page. It should be fixed now. Please check/use/test the current version on the master branch (the next CRAN release is planned for next year):

install.packages("devtools")
library("devtools")
install_github("ropensci/webchem")

With the new version we get:

> require(webchem)
> alanwood("S-Metolachlor", type = "commonname")$inchikey

# More then one link found! Returning first.
# 
# Querying s-metolachlor.html
# [1] "major component (S)-isomer:WVQBLGZPHOPPFO-LBPRGKRZSA-Nminor component (R)-isomer:WVQBLGZPHOPPFO-GFCCVEGCSA-N"

jranke commented 9 years ago

@EDiLD I can confirm this is better. What do you think about returning a vector of two InChIKeys, named by the isomer description (e.g. "major component (S)-isomer")?

eduardszoecs commented 9 years ago

Good idea, I opened a new issue #42.
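The named-vector idea discussed above could look roughly like the following minimal base-R sketch. It is not webchem's implementation: `parse_isomer_inchikeys()` is a hypothetical helper, and it assumes its input is the raw inner HTML of the InChIKey `<td>` cell quoted in the report, where each key sits on its own `<br>`-separated line directly after a line describing the isomer.

```r
# Hypothetical helper (not part of webchem): split the inner HTML of the
# InChIKey <td> cell on <br> tags and return the keys as a character vector
# named by the isomer description on the line preceding each key.
parse_isomer_inchikeys <- function(td_html) {
  # Split on <br>, <br/> or <br /> (case-insensitive via (?i), PCRE syntax)
  parts <- strsplit(td_html, "(?i)<br\\s*/?>", perl = TRUE)[[1]]
  # Strip remaining tags such as <i>S</i>, trim whitespace, drop empty lines
  parts <- trimws(gsub("<[^>]+>", "", parts))
  parts <- parts[nzchar(parts)]
  # Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter
  is_key <- grepl("^[A-Z]{14}-[A-Z]{10}-[A-Z]$", parts)
  if (!any(is_key)) return(character(0))
  idx <- which(is_key)
  # Label each key with the preceding non-key line (minus its trailing colon);
  # keys without such a line get an NA name
  labels <- vapply(idx, function(i) {
    if (i > 1 && !is_key[i - 1]) sub(":\\s*$", "", parts[i - 1]) else NA_character_
  }, character(1))
  stats::setNames(parts[idx], labels)
}

# Usage with the S-Metolachlor cell from the report
td <- paste0(
  "major component (<i>S</i>)-isomer:<br>WVQBLGZPHOPPFO-LBPRGKRZSA-N<br>",
  "minor component (<i>R</i>)-isomer:<br>WVQBLGZPHOPPFO-GFCCVEGCSA-N"
)
keys <- parse_isomer_inchikeys(td)
print(keys)
```

Splitting on the `<br>` tags before extracting text keeps the line structure that a plain text extraction loses. The loss of that structure is what produces the run-together string seen in the outputs above.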