ifyoungnet / ADMETlab

A platform for systematic ADME evaluation of drug molecules, thereby accelerating the drug discovery process.
GNU Affero General Public License v3.0

Which Python 2 version, scikit-learn, pickle version? #7

Closed misterbrandonwalker closed 3 years ago

misterbrandonwalker commented 3 years ago

Can someone post all Python package versions? I am having issues with unpickling the data, and I suspect a version incompatibility (newer vs. older packages). It would also be nice if the models could be re-pickled in a Python 3 compatible format, since Python 2 will eventually lose support.
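For readers hitting the same unpickling problem, here is a minimal sketch of reading a Python 2 pickle from Python 3 with the standard library. `load_py2_pickle` is a hypothetical helper name, and `encoding="latin1"` is the setting the Python documentation recommends for NumPy arrays and byte strings pickled by Python 2. Whether this is sufficient for the ADMETlab files is an assumption: scikit-learn does not guarantee that a model pickled under one version loads under another, so the original package versions may still be needed.

```python
import pickle


def load_py2_pickle(path, encoding="latin1"):
    """Load a pickle that was written by Python 2 from Python 3.

    The default ASCII decoding fails on Python 2 byte strings that contain
    non-ASCII bytes; latin1 maps every byte to a code point, so it cannot fail.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh, encoding=encoding)
```

If the load still fails with an import or attribute error, the cause is more likely a library version mismatch than the pickle format itself.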

misterbrandonwalker commented 3 years ago

Interesting: I am able to use numpy.load(pklfile, allow_pickle=True) on your CYP1A2 pkl file, but not on the pkl file in your example folder. I also notice that the bit vector in the example has length 1024, while the CYP1A2 model seems to require 2048. That is not a problem in itself, since I can tell the descriptor package to use that size, but I guess the only way to find out the required size is to look at the shape of each of your classification models individually. Finally, how can I tell which descriptors each classifier model uses? I read the page listed in the documentation, but it only lists all the descriptor types used overall, not which ones go with each model (unless I am missing something).
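One way to avoid guessing the fingerprint length is to read it off the loaded model. The sketch below is a hypothetical helper (`expected_input_width` is not part of ADMETlab). It probes attributes that scikit-learn estimators commonly expose: `n_features_in_` in recent releases, `n_features_` on older tree ensembles, `coef_` on linear models, `support_vectors_` on SVMs, and `coefs_` on MLPs. Which of these exist depends on the estimator type and on the scikit-learn version used for pickling, so treat the result as a hint.

```python
def expected_input_width(model):
    """Return the number of input features a fitted model expects, or None."""
    # Direct feature counts (availability depends on the scikit-learn version).
    for attr in ("n_features_in_", "n_features_"):
        value = getattr(model, attr, None)
        if value is not None:
            return int(value)
    # Array-valued attributes whose last axis is the feature axis.
    for attr in ("coef_", "support_vectors_"):
        arr = getattr(model, attr, None)
        if arr is not None and hasattr(arr, "shape"):
            return int(arr.shape[-1])
    # MLP-style models: the first weight matrix has shape (n_features, n_hidden).
    coefs = getattr(model, "coefs_", None)
    if coefs:
        return int(coefs[0].shape[0])
    return None
```

If this returns 2048 for the CYP1A2 model, the fingerprint would need to be generated with 2048 bits rather than 1024.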

misterbrandonwalker commented 3 years ago

It also looks like the extra .npy files are related to output from each NN layer? input->->->output?

ifyoungnet commented 3 years ago

@bdw2292 Dear sir,

Thank you for your feedback!

  1. We currently have only Python 2 models.

  2. Actually, the documentation (http://admet.scbdd.com/home/modeling_process/) lists the types of descriptors used for each model. Please take another look.

  3. The good news is that ADMETlab 2.0 is now online, with more powerful functions such as batch computation.

http://admet.scbdd.com/

https://admetmesh.scbdd.com/

misterbrandonwalker commented 3 years ago

Ah okay, so the line "The fingerprint descriptor includes FP2, MACCS, ECFP2, ECFP4, ECFP6" means we can choose which one to use as input; the NN can basically handle any of them. This was what I was confused about.

Thanks! I am more interested in the python models, rather than the server :)


ifyoungnet commented 3 years ago

@bdw2292

Please, I have to repeat this: read the instructions carefully and move the mouse to each specific model introduction. You will find that we have specified the best method and the corresponding descriptors for each endpoint. If I remember correctly, there is a label folder in the unzipped files. After reading it, you can get the descriptors used by each model. Thank you!

misterbrandonwalker commented 3 years ago

I hope you will agree with me that, judging from the single example in your example folder, it really is not clear at all.

Here is an example from Table 25 at http://admet.scbdd.com/home/modeling_process/#_Toc469587644 .

nsulph, VSAEstate8, QNmin, IDET, ndb, slogPVSA2, MATSv5, S32, QCss, bcutm4, S9, bcutp8, Tnc, nsb, Geto, bcutp11, S7, MATSm2, GMTIV, nhet, MATSe1, CIC0, bcutp3, Gravto, EstateVSA9, MATSe3, MATSe5, UI, S53, J, bcute1, MRVSA9, PEOEVSA0, MATSv2, IDE, AWeight, IC0, S16, bcutp1, PEOEVSA12

I notice that your example contains an input CSV file with only one fingerprint/descriptor, ECFP4, and I see that your website's table lists it as the feature for CYP3A4. For the example above, however, what is the protocol? Compute all the descriptors and then put them together in one array? Is there a different bit vector length requirement for different NNs (as I noticed in one of my first questions)? It would be nice to see an example for a model computed with many descriptors, since the given example uses only one fingerprint. I hope you can see how this can leave people confused about the exact steps for reproducing the results.


misterbrandonwalker commented 3 years ago

For example, what happens if we use all these descriptors above, but put them in the wrong order? I expect the NN requires a specific order for many descriptors?


ifyoungnet commented 3 years ago

@bdw2292 nsulph, VSAEstate8, QNmin, IDET, ndb, slogPVSA2, MATSv5, S32, QCss, bcutm4, S9, bcutp8, Tnc, nsb, Geto, bcutp11, S7, MATSm2, GMTIV, nhet, MATSe1, CIC0, bcutp3, Gravto, EstateVSA9, MATSe3, MATSe5, UI, S53, J, bcute1, MRVSA9, PEOEVSA0, MATSv2, IDE, AWeight, IC0, S16, bcutp1, PEOEVSA12

You are right, the order must be declared, and the above is the order. We have provided a lable.pkl for each model; it is a Python list object that contains the order information. Fingerprints do not need an order; just use the tool to calculate them.

ADMETlab is mainly intended for online prediction, and it is a big project that has been developed by a team. I can only try to answer based on what I know.

Thank you!
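The ordering requirement described in this reply can be sketched as follows. This is an illustration under assumptions, not ADMETlab code: `build_feature_row` is a hypothetical helper, `labels` stands for the ordered list stored in a model's label pickle, and `computed` stands for a name-to-value mapping produced by whatever descriptor tool is used. The values in the usage example are made up.

```python
def build_feature_row(computed, labels):
    """Arrange descriptor values in the order the model was trained with."""
    missing = [name for name in labels if name not in computed]
    if missing:
        raise KeyError("descriptors not computed: " + ", ".join(missing))
    # Extra descriptors in `computed` are ignored; only `labels` defines the row.
    return [float(computed[name]) for name in labels]


# Usage with made-up values; the names are the first three from the list above.
labels = ["nsulph", "VSAEstate8", "QNmin"]
computed = {"QNmin": -0.5, "nsulph": 0, "VSAEstate8": 1.25, "IDET": 3.0}
row = build_feature_row(computed, labels)
```

Driving the row from the label list rather than from the descriptor tool's own output order means a reordered or larger descriptor table cannot silently shift values into the wrong input columns.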

misterbrandonwalker commented 3 years ago

Ah thank you very much, I now see a Label folder under regression model!
