nanxstats / Rcpi

💊 Molecular informatics toolkit with integration of bioinformatics and cheminformatics tools for drug discovery
https://nanx.me/Rcpi/
Artistic License 2.0

Returns empty string when using getSmiFromPubChem #8

Closed futsuunohito closed 4 years ago

futsuunohito commented 4 years ago

Hi,

The function getSmiFromPubChem returns an empty string ("") when the id is a single string, but it works when the id is a list of strings.

nanxstats commented 4 years ago

Hmm, that's probably related to how it was implemented. Usually one would expect to get SMILES for a batch of IDs in a single call, instead of feeding a single ID and looping over.

It's true that this edge case needs a fix though.

futsuunohito commented 4 years ago

In my case, I called the function while iterating over a data frame, once per iteration. I did this because the dataset combines multiple sources (DrugBank, KEGG, etc.). Would you mind fixing it? Thanks in advance.
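One way to keep a single call per source in a mixed data frame is to group the IDs by source first. The sketch below is only an illustration: the column names `source` and `id`, the helper `smiles_by_source()`, and the `fetchers` list are assumptions, and the stand-in `demo_fetch()` just pastes a prefix so the example runs offline. Apart from "7847562", the IDs are placeholders.

```r
# Hypothetical helper: call each source's fetcher once with all of its IDs.
# `df` needs character columns `source` and `id` (assumed names).
# `fetchers` is a named list of functions, one per source; each takes a
# character vector of IDs and returns one result per ID.
smiles_by_source <- function(df, fetchers) {
  out <- rep(NA_character_, nrow(df))
  for (src in unique(df$source)) {
    rows <- which(df$source == src)
    fetch <- fetchers[[src]]
    if (is.null(fetch)) next  # no fetcher for this source: leave NA
    res <- fetch(df$id[rows])
    out[rows] <- as.character(unlist(res))
  }
  out
}

# Offline stand-in for a real fetcher (assumption, for illustration only).
demo_fetch <- function(ids) paste0("SMILES_for_", ids)

df <- data.frame(
  source = c("pubchem", "drugbank", "pubchem"),
  id = c("7847562", "DB00859", "2244"),
  stringsAsFactors = FALSE
)
df$smiles <- smiles_by_source(
  df,
  list(pubchem = demo_fetch, drugbank = demo_fetch)
)
```

In real use the list entries would presumably be functions such as `getSmiFromPubChem` and `getSmiFromDrugBank`, so each source is queried once with its whole batch of IDs.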

futsuunohito commented 4 years ago

@nanxstats I found another bug in getSmiFromDrugBank. Some DrugBank IDs don't have an accessible SDF file. For example, accessing the ID "DB00729" through "https://www.drugbank.ca/drugs/DB00729" redirects to "https://www.drugbank.ca/salts/DBSALT002847" instead. This means that although the ID correctly redirects the user to the actual page, your code fails to access its SDF at "https://www.drugbank.ca/structures/small_molecule_drugs/DB00729.sdf" because that file does not exist.

Please correct me if I'm wrong, thanks.
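On the caller's side, each lookup could be guarded so that one missing SDF does not abort a whole run. This is a minimal sketch, assuming the fetcher signals an error for an ID without an SDF (the thread does not say how `getSmiFromDrugBank` actually fails); `safe_fetch()` and the stand-in `fake_drugbank_fetch()` are hypothetical names.

```r
# Hypothetical wrapper: try each ID on its own and record NA on failure.
safe_fetch <- function(ids, fetch) {
  vapply(ids, function(id) {
    tryCatch(
      as.character(fetch(id))[1],          # keep the first result as a string
      error = function(e) NA_character_    # any error becomes a missing value
    )
  }, character(1), USE.NAMES = FALSE)
}

# Offline stand-in that fails for the ID discussed above (assumption).
fake_drugbank_fetch <- function(id) {
  if (id == "DB00729") stop("no SDF available for ", id)
  paste0("SMILES_for_", id)
}

res <- safe_fetch(c("DB13720", "DB00729"), fake_drugbank_fetch)
```

The IDs that come back as `NA` can then be inspected separately instead of stopping the loop.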

nanxstats commented 4 years ago

OK, I will look into them.

nanxstats commented 4 years ago

Hi @futsuunohito, the first issue (the single-ID edge case) has been fixed. You can install the GitHub version, and here's an example:

library("Rcpi")
id = "7847562"
getSmiFromPubChem(id)

I also looked into the second "bug", but it seems any "fix" would make the logic more complicated than it should be, because the ID of a salt (product ingredient) should first be converted locally to the ID of the actual drug (active moiety). For the above example, DB00729 is no longer an active drug ID, and DBSALT002847 should point to DB13720. It is much easier to do this on the user's end. Other packages dedicated to DrugBank data access, such as drugbankR or dbparser, might be useful for such purposes.
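The user-side conversion could look roughly like the sketch below. The lookup table is hand-built from the single example in this thread and is an assumption; a real table would have to be derived from DrugBank data. `resolve_ids()` and `retired_to_active` are hypothetical names.

```r
# Hand-built lookup from the example in this thread (assumption):
# retired or salt IDs on the left, active-moiety drug ID on the right.
retired_to_active <- c(
  DB00729 = "DB13720",
  DBSALT002847 = "DB13720"
)

# Replace any ID found in the lookup; leave all other IDs unchanged.
resolve_ids <- function(ids, lookup) {
  hit <- ids %in% names(lookup)
  ids[hit] <- unname(lookup[ids[hit]])
  ids
}

ids <- resolve_ids(c("DB00729", "DB13720"), retired_to_active)
```

The resolved `ids` can then be passed to `getSmiFromDrugBank` in place of the original vector.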