rlabduke / probe

Evaluate and visualize protein interatomic packing
http://kinemage.biochem.duke.edu/software/probe.php

Probe commandline converts chain ids to capitals in selection syntax #13

Status: Open. chrissciwilliams opened this issue 1 year ago.

chrissciwilliams commented 1 year ago

The (legacy) Probe command line allows selection of the types of contact partners. Whatever parses this selection converts parts of the string to uppercase, which makes it impossible to select chains whose IDs contain lowercase letters. (I have not tested the Probe2 command line.)

For example: phenix.probe -u -q -con -mc -het -once -ONLYBADOUT 'chainb ogt10 not water' 'ogt10' 8b0x_B_and_b_chains.pdb This command should show only the contacts for chain b (lowercase) in the attached file. Instead, it shows only the contacts for chain B (uppercase).

Compare to: phenix.probe -u -q -con -mc -het -once -ONLYBADOUT 'chainB ogt10 not water' 'ogt10' 8b0x_B_and_b_chains.pdb which selects the contacts for chain B (uppercase).

Also compare to: phenix.probe -u -q -con -mc -het -once -ONLYBADOUT 'ogt10 not water' 'ogt10' 8b0x_B_and_b_chains.pdb which does not specify a chain and shows the contacts for both B and b. This shows that Probe handles chain IDs case-sensitively internally, and that the issue is likely localized to the selection-syntax parser.

We would like to support lowercase characters in chain IDs (and alternate conformation IDs?). On the MolProbity site, this affects the interface contacts tool. In my personal work, it affects some of my database construction scripts.

Sample file containing chains B and b (zipped because GitHub doesn't permit .pdb attachments): 8b0x_B_and_b_chains.zip

russell-taylor commented 1 year ago

Running Probe2 using CCTBX selection language does allow for case-sensitive selection, as verified using the following command and visually inspecting the dots vs. the model file:

mmtbx.probe2 D:\data\Richardsons\8b0x_B_and_b_chains.pdb approach=once output.file_name=8b0x.kin output.add_kinemage_keyword=True source_selection="(chain b not water)" target_selection="(chain b not water)"

The Probe2 command line that produces visual output for the same selection as the original Probe command (using chain B rather than b) is: mmtbx.probe2 D:\data\Richardsons\8b0x_B_and_b_chains.pdb approach=once output.file_name=8b0xB.kin output.add_kinemage_keyword=True source_selection="(chain B and occupancy > 0.1 and not water)" target_selection="(occupancy > 0.1)" output.overwrite=True only_report_bad_clashes=True

The command that does the same thing but for chain b: mmtbx.probe2 D:\data\Richardsons\8b0x_B_and_b_chains.pdb approach=once output.file_name=8b0x.kin output.add_kinemage_keyword=True source_selection="(chain b and occupancy > 0.1 and not water)" target_selection="(occupancy > 0.1)" output.overwrite=True only_report_bad_clashes=True

The Probe2 command line corresponding to the desired behavior is: mmtbx.probe2 D:\data\Richardsons\8b0x_B_and_b_chains.pdb approach=once output.file_name=8b0x.txt source_selection="(chain b and occupancy > 0.1 and not water)" target_selection="(occupancy > 0.1)" output.overwrite=True only_report_bad_clashes=True output.format=raw output.condensed=True

russell-taylor commented 1 year ago

srcArg in probe.c:359 is all lowercase (as expected, since that is how it was given on the command line).

The parse.c idItem() function looks for the CHAIN token and then stores the following character as the chain ID, which by that point is already an uppercase B (parse.c:211+).

This appears to be because parse.c:lexan() calls toupper() on every character of the input string as it is read in, presumably to make token comparisons easier.

russell-taylor commented 1 year ago

One approach might be to remove all of the toupper() calls and replace the strncmp() calls with strncasecmp() and the strcmp() calls with strcasecmp(). Case sensitivity would then need to be restored just where it is needed, possibly in lookup(), but somehow scoped to chain lookups only. That could get messy, because lookup() is used throughout the parser to match tokens for different kinds of objects. The token code would get somewhat scrambled by mixing upper and lower case: the case-insensitive comparisons would keep it behaving the same way as the current uppercase approach, but it is not clear how to make it behave differently only for chains. To lookup(), b and B would look identical, so there would be no telling which one you'd get.

chrissciwilliams commented 1 year ago

Supporting case-sensitive selection in Probe2 is the most important thing, so I'm glad that works as expected.

Christopher Williams, Richardson Lab, Duke University
