rdk / p2rank

P2Rank: Protein-ligand binding site prediction tool based on machine learning. Stand-alone command line program / Java library for predicting ligand binding pockets from protein structure.
https://rdk.github.io/p2rank/
MIT License

data input rescore #69

Open ClaraVanM opened 3 months ago

ClaraVanM commented 3 months ago

I get this error when trying to run prank rescore: "Dataset must contain 'protein' and 'prediction' columns!" What is the format that the input needs to have?

rdk commented 3 months ago

You can find example dataset files in test_data, in particular fpocket3.ds and concavity.ds. Each has one column (protein) for the original structure file and another (prediction) for the file with pocket predictions computed by some other algorithm.

Which binding site prediction tool's output are you trying to rescore: Fpocket, Concavity, or something else entirely?
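The two-column layout described above can be checked before running rescore. The sketch below is not part of P2Rank. It assumes that columns are declared on a line starting with `HEADER:` and that the prediction method is set with a `PARAM.PREDICTION_METHOD=` line. Both are assumptions to verify against fpocket3.ds and concavity.ds in test_data. The file paths in the example are placeholders, and the function names are hypothetical.

```python
# Minimal sketch: check that a dataset file declares the columns rescoring needs.
# Assumption: columns are declared on a line beginning with "HEADER:".
# Verify this against the example .ds files in test_data before relying on it.

REQUIRED_COLUMNS = {"protein", "prediction"}


def read_dataset_columns(text):
    """Return the column names from the first HEADER: line, or [] if there is none."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("HEADER:"):
            return line[len("HEADER:"):].split()
    return []


def missing_columns(text):
    """Return the required column names that the dataset header does not declare."""
    return REQUIRED_COLUMNS - set(read_dataset_columns(text))


# Hypothetical dataset content; the paths are placeholders, not real files.
example = """\
# rescoring dataset (illustrative only)
PARAM.PREDICTION_METHOD=fpocket
HEADER: prediction protein
fpocket/1a28_out/1a28_out.pdb  proteins/1a28.pdb
"""

if __name__ == "__main__":
    problems = missing_columns(example)
    if problems:
        print("missing columns:", sorted(problems))
    else:
        print("header declares both required columns")
```

A dataset file with no header line at all would report both columns as missing, which matches the situation the error message describes.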

ClaraVanM commented 3 months ago

fpocket and ghecom, thank you for the fast response.

rdk commented 3 months ago

fpocket should work, although I haven't tested it with 4.x versions. Are you using the latest version of fpocket? If you try it, please let me know whether it works. If there is any problem with rescoring fpocket 4.x predictions, I will implement a fix.

Support for rescoring ghecom was never implemented. However, I was thinking about implementing a universal input format for rescoring: a simple CSV file with pocket centers. Would that help you?

ClaraVanM commented 3 months ago

I used fpocket 4.1.4 and I get this error:

P2Rank 2.5.0-dev.3
[INFO] Console - P2Rank 2.5.0-dev.3

[INFO] Console -
[INFO] Main - loading default config from [/vsc-hard-mounts/leuven-data/351/vsc35111/thesis/P2Rank/p2rank/distro/config/default.groovy]
[INFO] Main - loading default config from [/vsc-hard-mounts/leuven-data/351/vsc35111/thesis/P2Rank/p2rank/distro/config/default_rescore.groovy]
[INFO] Main - looking for dataset in dataset_base_dir [/vsc-hard-mounts/leuven-data/351/vsc35111/thesis/P2Rank/p2rank/fpocket_prank.ds]...
[INFO] Dataset - loading dataset [/vsc-hard-mounts/leuven-data/351/vsc35111/thesis/P2Rank/p2rank/fpocket_prank.ds]
[INFO] Futils - deleting /vsc-hard-mounts/leuven-data/351/vsc35111/thesis/P2Rank/p2rank/distro/test_output/rescore_fpocket_prank/run.log
rescoring pockets on proteins from dataset [fpocket_prank.ds]
[INFO] Console - rescoring pockets on proteins from dataset [fpocket_prank.ds]
[INFO] RescorePocketsRoutine - outdir: /vsc-hard-mounts/leuven-data/351/vsc35111/thesis/P2Rank/p2rank/distro/test_output/rescore_fpocket_prank
[INFO] FeatureSetup - enabledFeatures: [chem, volsite, protrusion, bfactor, atom_table]
[INFO] Dataset - processing dataset [fpocket_prank.ds] using 0 threads
[INFO] Dataset -

processing [1a28_deposited_refined_prot_out.pdb] (1/1)

processing [1a28_deposited_refined_prot_out.pdb] (1/1)
[INFO] Console - processing [1a28_deposited_refined_prot_out.pdb] (1/1)
[INFO] Protein - loading protein [/data/leuven/351/vsc35111/thesis/proteins/1a28_deposited_refined_prot.pdb]
[INFO] PdbUtils - loading file [/data/leuven/351/vsc35111/thesis/proteins/1a28_deposited_refined_prot.pdb]
[INFO] Struct - groups in chain A: 251
[INFO] Struct - groups in chain B: 249
[INFO] Struct - groups in chain A: 251
[INFO] Struct - 251 groups in chain A
[INFO] Struct - groups in chain B: 249
[INFO] Struct - 249 groups in chain B
[INFO] Protein - structure atoms: 4216
[INFO] Protein - protein atoms: 3972
[INFO] Protein - loading ligands
[INFO] Ligands - loading 0 ligands
[INFO] Ligands - loading 0 ligands
[INFO] Ligands - Loaded 0 relevant ligands: []
[ERROR] Dataset - error processing dataset item [1a28_deposited_refined_prot_out.pdb]
java.lang.NullPointerException: null
    at cz.siret.prank.domain.loaders.DatasetItemLoader.loadPredictionPair(DatasetItemLoader.groovy:62) ~[p2rank.jar:?]
    at cz.siret.prank.domain.Dataset$Item.loadPredictionPair(Dataset.groovy:841) ~[p2rank.jar:?]
    at cz.siret.prank.domain.Dataset$Item.getPredictionPair(Dataset.groovy:823) ~[p2rank.jar:?]
    at cz.siret.prank.program.routines.predict.RescorePocketsRoutine$_execute_closure1.doCall(RescorePocketsRoutine.groovy:63) ~[p2rank.jar:?]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343) ~[groovy-4.0.18.jar:4.0.18]
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328) ~[groovy-4.0.18.jar:4.0.18]
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279) ~[groovy-4.0.18.jar:4.0.18]
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007) ~[groovy-4.0.18.jar:4.0.18]
    at groovy.lang.Closure.call(Closure.java:433) ~[groovy-4.0.18.jar:4.0.18]
    at groovy.lang.Closure.call(Closure.java:422) ~[groovy-4.0.18.jar:4.0.18]
    at cz.siret.prank.domain.Dataset$1.processItem(Dataset.groovy:145) ~[p2rank.jar:?]
    at cz.siret.prank.domain.Dataset.processssItem(Dataset.groovy:228) [p2rank.jar:?]
    at cz.siret.prank.domain.Dataset.access$0(Dataset.groovy) [p2rank.jar:?]
    at cz.siret.prank.domain.Dataset$2.call(Dataset.groovy:192) [p2rank.jar:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
rescoring finished in 0 hours 0 minutes 3.441 seconds

Yes, I think that would be useful! But then I would need to calculate pocket centers from the ghecom output, as they are not given in the output.

rdk commented 3 months ago

Looks like there is a bug that needs to be fixed. Could you attach the fpocket output files that are giving you the error, at least for 1a28?

"But then I would need to calculate pocket centers from the ghecom output, as they are not given in the output."

Yes, exactly. How are you generating the ghecom predictions: using the web server or locally?

ClaraVanM commented 3 months ago

Attached: 1a28_deposited_refined_prot_out_pdb.txt. I renamed the file because GitHub would not let me upload the .pdb version. I am running ghecom locally.

rdk commented 1 week ago

@ClaraVanM sorry for the delay.

If it is still relevant, you can check whether release 2.4.2 (https://github.com/rdk/p2rank/releases/tag/2.4.2) works in your case. It contains updated support for loading fpocket predictions. Also, please check that your dataset file has the right format (example: https://github.com/rdk/p2rank/blob/develop/distro/test_data/fpocket.ds).