schrodinger / pymol-open-source

Open-source foundation of the user-sponsored PyMOL molecular visualization system.
https://pymol.org/
Other
1.13k stars 270 forks source link

Support path separators in PYMOL_PATH (and derived variables) #377

Open netogallo opened 1 month ago

netogallo commented 1 month ago

Motivation

I wanted to automatically add more plugin paths to pymol without relying on the existence of a "pymolpluginrc" file. So I proceeded to add more directories to the PYMOL_PATH (in such a way that the subdirectory data/startup exits in it). To my surprise, instead of the expected result (plugins in that directory get loaded at startup), the Pymol user interface got messed up and no plugins (even the pre-bundled) ones were loaded.

Doing some inspection, we can see that modules/pymol/init.py:setup_environ has no regard for the path separator. Furthermore, usage of derived variables, such as PYMOL_DATA in modules/pymol/plugins/init.py, continue disregarding the path separators.

Doing a quick search, it is evident that this type of disregard is prevalent over the entire codebase. Lookup of resoruces, textures and more continue to disregard the path separator.

Proposed Solution

To properly address this issue, all these lockups should ideally be abstracted away in a class that takes care of handling of variabels with path separators. To be concrete, consider the code in modules/pmg_qt/pymol_qt_gui.py which loads and applies the default stylesheet (which is no longer found if PYMOL_PATH has path separators):

Rather than doing:

with open(cmd.exp_path('$PYMOL_DATA/pmg_qt/styles/pymol.sty')) as f:
                style = f.read()

It would be nice having a path_manager such that we can do:

from somewhere import path_manager
with path_manager.open_path_resource("PYMOL_DATA", "pmg_qt/styles/pymol.sty") as f:
    style = f.read()

In that case, the path manager will assemble multiple potential locations for "pymol.sty" based on the various paths that exist in the PYMOL_DATA variable (which is by default derived from PYMOL_PATH) and simply open the first match of the file. In a similar fashion, different apis can be added for different operations. Going back to my original requests for plugins. Once could have code like:

for plugin_directory in path_manager.iterate_paths("PYMOL_DATA", "startup"):
    load_plugins(plugin_directory)

Implementation Plan

Given that disregards for path variables happen all over the codebase (potentially even in C++), it would take a lot of time and effort to fix every instance. Ideally, an incremental approach should be adopted so the codebase can be gradually modified such that some places adopts the new behavior while other places fallback to the current behavior. Assuming the "path_manager" is implemented, there are to possibilities to use it without breaking existing code:

  1. Initialize the path_manager using environmental variables that cointain path separators, afterwards, update said environmental variables to remove all directories except the latest one. That way, code using the path manager will correctly load resoruces available in multiple paths and legacy code will continue to work with the leftmost directory in the path variables.
  2. Introduce new variables and have the path manager use those. So we can have a PYMOL_PATH_POLY variable which supports the use of path separators. Then the path manager will load the values in this variable plus the legacy PYMOL_PATH variable. That way, existing code will work as before and code can be gradually refactored to support the path manager.
JarrettSJohnson commented 1 month ago

Thanks for the writeup. As far as I understand, the major difference between the canonical PATH environment variable versus PYMOL_PATH, is that the former suggests that there can be multiple places to search for executables versus PYMOL_PATH which points to where PyMOL exists (which should be a single location). But as you've pointed out, and according to this wiki page though, it seems it also searches for the location of PyMOL plugins (which could be in multiple places). I wonder if users are better served with a separate environment variable (override?) that checks these plugin locations rather than messing with PYMOL_PATH itself.

Other than plugins, are there other instances where multiple paths could be considered but are derived from PYMOL_PATH? From your examples, files like pymol.sty should be in one definite location.

netogallo commented 1 month ago

Thanks for looking at that. I agree it is not a good idea to break a pre-existing assumption about the PYMOL_PATH variable. I personally have no other use cases other than plugins. To my particular case, I simply created a small patch to add an additional path (as you suggested) [1]. I would be satisfied just having one additional env variable.

  1. https://github.com/netogallo/pymol-open-source/commit/8cf28732dea7154db279c1d16704b7dfd65079f1