Question about data process

SharplessSword commented 2 years ago

Hello, I want to know how to use ./POINTNET. It isn't recognized when I type it in PowerShell (Windows) or in the Terminal (Linux). Alternatively, could you give me some details about the algorithm you use to select the 1024/2048 points, so that I could write a script in Python to process the raw_data?

Another question, merely out of curiosity: why do you specifically use openbabel to add polar hydrogens? Maybe PyMOL can do it as well?

wyji001 commented 2 years ago

To generate the point cloud, you can use the following command (on Linux). All input files must be in PDB format:

./POINTNET ./protein_path ./ligand_path ./pointcloud_outfile_path
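Because the tool expects PDB input, a script-based alternative first has to read atom coordinates from the PDB files. The following is a minimal sketch assuming well-formed, fixed-column ATOM/HETATM records; `read_pdb_atoms` is a hypothetical helper, not part of POINTNET, and its element fallback (first letter of the atom name) is deliberately crude.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)


def read_pdb_atoms(pdb_text: str) -> List[Atom]:
    """Collect element symbols and coordinates from ATOM/HETATM records.

    Relies on the fixed-column PDB layout (1-based columns): x = 31-38,
    y = 39-46, z = 47-54, element = 77-78, atom name = 13-16.
    """
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        # Record names occupy columns 1-6 ("ATOM  " is padded with spaces)
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        element = line[76:78].strip()
        if not element:
            # Crude fallback when the element field is blank: first letter of the atom name
            letters = [c for c in line[12:16] if c.isalpha()]
            element = letters[0] if letters else ""
        # Normalize case, e.g. "CL" -> "Cl"
        atoms.append((element.capitalize(), x, y, z))
    return atoms
```

Typical use would be `with open("protein.pdb") as f: atoms = read_pdb_atoms(f.read())`, where the file name is only a placeholder.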

Details of the algorithm are given in the Methods section. The algorithm is simple: collect all ligand atoms, then add the (1024/2048 − number of ligand atoms) protein atoms closest to the ligand center.
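The selection step can be sketched in Python as follows. This assumes the "ligand center" is the unweighted centroid of the ligand coordinates (the thread does not define it precisely) and that atoms are plain (x, y, z) tuples; `select_points` is a hypothetical helper, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in angstrom


def select_points(
    ligand: Sequence[Point],
    protein: Sequence[Point],
    n_points: int = 1024,
) -> Tuple[List[Point], List[Point]]:
    """Keep every ligand atom, then add the protein atoms closest to the
    ligand center until n_points atoms are selected in total.

    Assumption: the ligand center is the unweighted centroid of the ligand.
    If the ligand alone has n_points atoms or more, no protein atoms are
    added (the thread does not say how that case is handled).
    """
    center = tuple(sum(p[i] for p in ligand) / len(ligand) for i in range(3))
    n_protein = max(n_points - len(ligand), 0)
    # Sort protein atoms by Euclidean distance to the ligand center; keep the nearest ones
    nearest = sorted(protein, key=lambda p: math.dist(p, center))[:n_protein]
    return list(ligand), nearest
```

If ligand plus protein together have fewer than `n_points` atoms, the remaining slots are filled with zero points, as the Methods excerpt quoted below describes.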

"Before incorporating the point clouds for training, we aligned the coordinates of protein–ligand complex to the center of ligand, which ensured that the model was not affected by translation. We collected all the atoms of ligand and then the closest atoms to ligand’s center from the corresponding protein, resulting in a total of 1024 points. To simplify computations, we only considered the atoms of protein or ligand as independent points separately, whereas eliminated the covalent-bond relationships in proteins or ligands. Each atom is represented by a single point consisting of six types of information, including x, y and z coordinates, van der Waals radius, atomic weight and their sources (1 for proteins and −1 for ligands). Atomic coordinates were normalized by the distance of the atom farthest from the center of ligand. All other parameters, including radius and atomic weight, were also normalized. If the total number of atoms is fewer than 1024, additional points with all parameters set to zero were created to compensate."
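The per-point encoding in the excerpt (center on the ligand, six features per atom, coordinate scaling, zero padding) can be sketched in Python as below. The excerpt does not say how radius and atomic weight were normalized, so the lookup tables and the division by each table's maximum are assumptions, as is the use of the ligand centroid as its center. `build_point_cloud` is a hypothetical helper that expects the protein atoms to have been selected already.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)

# Hypothetical lookup tables (extend for other elements): Bondi van der Waals
# radii in angstrom and standard atomic weights.
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "P": 1.80, "S": 1.80}
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.06}
# Assumption: radius and weight are scaled by the largest value in each table.
MAX_RADIUS = max(VDW_RADIUS.values())
MAX_WEIGHT = max(ATOMIC_WEIGHT.values())


def build_point_cloud(
    ligand: Sequence[Atom],
    protein: Sequence[Atom],
    n_points: int = 1024,
) -> List[List[float]]:
    """Return n_points rows of [x, y, z, radius, weight, source].

    `protein` is assumed to hold the already-selected nearest protein atoms.
    """
    # Ligand center (unweighted centroid, an assumption)
    cx, cy, cz = (sum(a[i] for a in ligand) / len(ligand) for i in (1, 2, 3))
    # Source flag: -1 for ligand atoms, 1 for protein atoms; ligand atoms come first
    tagged = [(a, -1.0) for a in ligand] + [(a, 1.0) for a in protein]
    kept = tagged[:n_points]
    # Translate every atom so that the ligand center becomes the origin
    shifted = [((a[1] - cx, a[2] - cy, a[3] - cz), a[0], src) for a, src in kept]
    # Scale coordinates by the distance of the farthest kept atom from the center
    max_dist = max((math.hypot(*xyz) for xyz, _, _ in shifted), default=0.0) or 1.0
    rows = [
        [x / max_dist, y / max_dist, z / max_dist,
         VDW_RADIUS[element] / MAX_RADIUS,
         ATOMIC_WEIGHT[element] / MAX_WEIGHT,
         src]
        for (x, y, z), element, src in shifted
    ]
    # Zero-pad when there are fewer than n_points atoms
    rows.extend([0.0] * 6 for _ in range(n_points - len(rows)))
    return rows
```

With real complexes the tables must cover every element present (halogens, metals and so on); otherwise the lookup raises a `KeyError`, which is preferable to silently inventing values.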

Using openbabel for preprocessing doesn't mean the data cannot be processed with other software.

wyji001 commented 2 years ago

We do not recommend using 2048 points for the PDBbind data; 2048 points are too many for protein–ligand affinity prediction.

SharplessSword commented 2 years ago

Thanks for your reply. Do you mean that you convert the ligand file to .pdb first?
I also have a question about the pocket. The dataset also seems to include a PDB file for the pocket, but in your article you only use the protein and the ligand and ignore the pocket information (maybe I missed it). You mentioned the concept of "the atom that has the greatest effect on the prediction outcome"; is there some similarity with the pocket? Forgive my ignorance, I just had an idea about using the pocket information.

wyji001 commented 2 years ago

We sample the atoms closest to the ligand center and do not focus on pocket information. "The atom that has the greatest effect on the prediction outcome" is a result of network training; we do not include any prior information.

SharplessSword commented 2 years ago

Thanks again! I'm still struggling with data_preprocess, though, and I'd like to know more about the normalization: how do you normalize the van der Waals radius and the atomic weight? In the example 5c2h_11.09, all van der Waals radii are positive and all atomic weights are negative.

wyji001 / Point-Cloud

Question about data process #3