Serofah opened this issue 4 years ago.
Indeed, the NP-likeness scorer applies some filtering that can remove molecules at different steps:
I can help identify the problem if you send me the molecules that worked and those that didn't.
Attached is a test file with 14 related molecules. Only 1 worked.
It would be great to know what I am doing wrong. =)
Did you have a chance to look at the dataset?
I would really like some feedback so I can use the calculator for my research.
Thanks!
Hi, sorry about this. With the current situation I had a lot on my plate and completely forgot to answer.
I checked the file you provided, and indeed only one molecule in it is read by the scorer. This is due to the exact format of your SDF file: it is not canonical, so the CDK's SDF reader does not recognize it.
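A quick way to compare the number of records in an SDF file with the number of scores the tool returns is to split the file on its `$$$$` record delimiters and read the version tag (`V2000` or `V3000`) at the end of each record's counts line. The sketch below uses only the Python standard library; the function names are hypothetical, and it does not reproduce the checks performed by the CDK reader, so a file that looks fine here can still be partly rejected by the scorer.

```python
from typing import Dict, List


def split_sdf_records(text: str) -> List[str]:
    """Split SDF text into records on the '$$$$' delimiter lines."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a closing '$$$$' so it can be inspected too.
    if any(line.strip() for line in current):
        records.append("\n".join(current))
    return records


def molfile_version(record: str) -> str:
    """Return 'V2000', 'V3000' or 'unknown' from the counts line (4th line of a molfile)."""
    lines = record.splitlines()
    if len(lines) < 4:
        return "unknown"
    counts_line = lines[3].rstrip()
    for version in ("V2000", "V3000"):
        if counts_line.endswith(version):
            return version
    return "unknown"


def summarize_sdf(text: str) -> Dict[str, int]:
    """Count the records in an SDF text and how many use each molfile version."""
    summary = {"records": 0, "V2000": 0, "V3000": 0, "unknown": 0}
    for record in split_sdf_records(text):
        summary["records"] += 1
        summary[molfile_version(record)] += 1
    return summary
```

For example, `summarize_sdf(open("test.sdf").read())` on a hypothetical `test.sdf` gives a per-version record count that can be set against the number of NP-scores reported for the same file.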
What are you using to generate the SDF file? Could you use a SMILES file instead?
I used ChemDraw to draw the scaffold and DataWarrior for enumeration and to generate the SDF file. I believe it produces extended SDF (V3000), but I am not an expert.
I loaded these files back into ChemDraw and saved them as "standard" SDF and also as Mol files. Unfortunately, these still didn't work.
However, I could convert the "standard" SDF into a SMILES file with an online converter (https://cactus.nci.nih.gov/translate/). This file worked well with the calculator!
Do you have any suggestions for generating SMILES files more directly? (I am a bit worried about how ChemDraw will react when I load 10k structures at once.)
Thanks!
I don't know DataWarrior; the format problem might come from there. Extended SDF files are also accepted as long as they are canonical, so the problem comes from the non-canonical SDF being generated. To convert SDF to SMILES I generally use OpenBabel (https://openbabel.org/docs/dev/Command-line_tools/babel.html).
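From the command line, the conversion is `obabel input.sdf -O output.smi`, with Open Babel picking the input and output formats from the file extensions. For many enumerated files, a small Python wrapper around that call avoids opening anything in ChemDraw. This is a sketch assuming `obabel` is installed and on the PATH; the function names are hypothetical, and Open Babel may itself skip records it cannot parse, so it is worth checking the line count of each `.smi` file against the number of input structures.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def obabel_command(sdf_path: Path, smi_path: Path) -> List[str]:
    """Build an Open Babel call that converts an SDF file to a SMILES file.

    obabel infers the formats from the extensions (.sdf in, .smi out).
    """
    return ["obabel", str(sdf_path), "-O", str(smi_path)]


def convert_sdf_to_smiles(sdf_path: Path) -> Path:
    """Run obabel on one SDF file and return the path of the .smi file it writes."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel not found on PATH; install Open Babel first")
    smi_path = sdf_path.with_suffix(".smi")
    # check=True raises CalledProcessError if obabel exits with a non-zero status.
    subprocess.run(obabel_command(sdf_path, smi_path), check=True)
    return smi_path
```

To process a whole batch, loop over something like `Path("enumerated").glob("*.sdf")` (a hypothetical folder of DataWarrior exports) and call `convert_sdf_to_smiles` on each file.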
Hope this helps!
Thanks. I will check it out.
I just tried loading 6k structures in ChemDraw. It was not a success.
Hi,
I am experiencing some issues uploading molecular files (SDF). I tested different files, but only part of their content is used to calculate NP-scores:
- 6000 compounds → 45 NP-scores (locally hosted website)
- 180 compounds → 119 NP-scores visible (naples.naturalproducts.net)
- 14 compounds → 1 NP-score visible (naples.naturalproducts.net)
- 4 compounds → 3 NP-scores visible (naples.naturalproducts.net)
All files were created the same way, using DataWarrior.