help with data pre-processing

celalp commented 1 year ago

Hi,

I'm currently just interested in getting some reproducible protein surface representations, and I was wondering what order the PDB files are processed in. Starting from a PDB file, this isn't very clear from the git repo.

Seems like the order of pre-processing is:

-- in data_preprocessing: protonate the PDB, extract the protein chains we are interested in, and convert them to npy
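A generic sketch of the "extract chains and convert to npy" step, assuming NumPy and standard fixed-column PDB ATOM/HETATM records. The function names (`chains_to_arrays`, `save_arrays`) and the output file naming (`<prefix>_xyz.npy`, `<prefix>_elements.npy`) are hypothetical and are not taken from the data_preprocessing code the question refers to.

```python
import numpy as np

def chains_to_arrays(pdb_text, chains):
    """Return (xyz, elements) for ATOM/HETATM records whose chain ID is in `chains`.

    xyz is an (N, 3) float32 array; elements is an (N,) array of element symbols.
    Relies on fixed-width PDB columns: chain ID in column 22, x/y/z in
    columns 31-54, element symbol in columns 77-78.
    """
    coords, elements = [], []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[21:22] not in chains:
            continue
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
        # Files without an element column yield an empty string here.
        elements.append(line[76:78].strip())
    xyz = np.array(coords, dtype=np.float32).reshape(-1, 3)
    return xyz, np.array(elements, dtype=str)

def save_arrays(prefix, xyz, elements):
    """Write <prefix>_xyz.npy and <prefix>_elements.npy (hypothetical naming)."""
    np.save(f"{prefix}_xyz.npy", xyz)
    np.save(f"{prefix}_elements.npy", elements)
```

For example, `chains_to_arrays(text, {"A", "B"})` on the text of an already-protonated file keeps only chains A and B; the hydrogens added earlier are included because they are ordinary ATOM records.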

After that, I assume the outputs go to geometry_preprocessing, but I'm not sure what happens there. I'm not currently interested in training any kind of model; I just want a good representation of the protein surface (shape and chemical properties).

Thanks,

celalp commented 1 year ago

I'm sorry wrong repo

chrissciwilliams commented 12 months ago

Hi Alper,

Hope you find the right place to ask your question.

Generally, we do recommend adding hydrogens before splitting into individual chains (this agrees with your proposed workflow). Certain hydrogen positions and protonation states (including Asn, Gln, and especially His) are determined by contact networks that may extend across interfaces, and that contact network can differ between an isolated chain and the same chain in a complex.
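The recommended order (protonate the whole complex first, then split it into chains) can be sketched in Python as below. This assumes a `reduce` executable on PATH that writes the protonated structure to stdout, as in `reduce input.pdb > output.pdb`; the helper names `reduce_command`, `protonate_complex`, and `split_by_chain` are hypothetical. Because the split happens only after Reduce has seen the full complex, hydrogens placed using cross-interface contacts carry over into each single-chain file.

```python
import subprocess
from collections import defaultdict

def reduce_command(pdb_path, flip=True):
    """Build a Reduce command line (assumes `reduce` is on PATH).

    -FLIP asks Reduce to also evaluate Asn/Gln/His flips, as recommended in this thread.
    """
    cmd = ["reduce"]
    if flip:
        cmd.append("-FLIP")
    cmd.append(pdb_path)
    return cmd

def protonate_complex(pdb_path, flip=True):
    """Run Reduce on the whole complex and return the protonated PDB text from stdout."""
    result = subprocess.run(reduce_command(pdb_path, flip), capture_output=True, text=True)
    # Conservative check: look for output instead of relying on the exit status.
    if not result.stdout.strip():
        raise RuntimeError(f"reduce produced no output: {result.stderr}")
    return result.stdout

def split_by_chain(pdb_text):
    """Group ATOM/HETATM records by chain ID (column 22) into per-chain PDB text."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            chains[line[21:22]].append(line)
    return {chain: "\n".join(lines) + "\nEND\n" for chain, lines in chains.items()}
```

For example, `split_by_chain(protonate_complex("complex.pdb"))["A"]` (file name hypothetical) gives chain A with hydrogens placed in the context of the whole complex. Only ATOM/HETATM records are kept, so header records are dropped from the per-chain output.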

We also recommend doing an NQH flip analysis (Reduce will do this for you, given the -flip flag). It's an easy fix for some common contact errors. A quick guide to flips can be found here: http://molprobity.biochem.duke.edu/help/reduce_guide/reduce_guide.html
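To make the term concrete: an Asn or Gln flip rotates the terminal amide by 180 degrees about the CB-CG bond (Asn) or the CG-CD bond (Gln), which approximately swaps the O and N positions; a His flip rotates the ring about CB-CG in the same way. Because O and N are hard to tell apart in electron density, the deposited orientation is often the wrong one. The sketch below uses NumPy and Rodrigues' rotation formula with hypothetical helper names and made-up coordinates. It shows only the geometric move being evaluated; it does not reproduce Reduce's scoring of the two orientations or its hydrogen placement.

```python
import numpy as np

def rotate_about_axis(points, axis_start, axis_end, theta):
    """Rotate points by theta radians about the line from axis_start to axis_end
    (Rodrigues' rotation formula). Returns an (N, 3) array."""
    origin = np.asarray(axis_start, dtype=float)
    k = np.asarray(axis_end, dtype=float) - origin
    k /= np.linalg.norm(k)  # unit vector along the rotation axis
    v = np.atleast_2d(np.asarray(points, dtype=float)) - origin
    rotated = (v * np.cos(theta)
               + np.cross(k, v) * np.sin(theta)
               + np.outer(v @ k, k) * (1.0 - np.cos(theta)))
    return rotated + origin

def flip_asn_amide(cb, cg, od1, nd2):
    """Rotate Asn OD1 and ND2 by 180 degrees about the CB-CG bond (chi2).
    Returns the new (OD1, ND2) coordinates; hydrogens are not handled."""
    new_od1, new_nd2 = rotate_about_axis([od1, nd2], cb, cg, np.pi)
    return new_od1, new_nd2
```

In practice Reduce makes this decision for you when run with the flip flag; applying the rotation twice returns the original coordinates, which is a quick sanity check on any hand-rolled version.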

Good luck,
-Christopher Williams
---Richardson Lab, Duke University

On Mon, Nov 6, 2023 at 10:25 AM Alper Celik wrote:

I'm sorry wrong repo


rlabduke / reduce

help with data pre-processing #53