samplchallenges / SAMPL-league

SAMPL container based workflows

Figure out best practice way to take in smiles from standard in #67

Open mikemhenry opened 3 years ago

mikemhenry commented 3 years ago

We have issues when using containers that have a conda env as an entry point:

# works by adding white space to end of SMILES string
docker run -it --rm -v $(pwd):/data adv -r "inputs/4w51-cryo.pdb" -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "CC(C)Cc1ccccc1 "
# works, but SMILES can themselves contain backslashes (bond stereochemistry), so escaping this way is ambiguous
docker run -it --rm -v $(pwd):/data adv -r "inputs/4w51-cryo.pdb" -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "CC\(C\)Cc1ccccc1"
# doesn't work: the escaped quotes are passed through literally and cause a parse error in OpenEye
docker run -it --rm -v $(pwd):/data adv -r "inputs/4w51-cryo.pdb" -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "\"CC\(C\)Cc1ccccc1\""
Warning: Problem parsing SMILES:
Warning: "CC\(C\)Cc1ccccc1"
Warning: ^

We need to figure out a robust way to handle this.

List of test SMILES cases:

CSC1=CC=C(C=C1)[N+][O-]
CC1=CC=C(C=C1)N=[N+]=[N-]
CC#Cc1ccccc1
[N-]=[N+]=NC1=CC=CC=C1
c1noc2ccccc12
C1=CC(=C(C(=C1)F)CBr)F
C#Cc1ccccc1
[N-]=[N+]=NC1=CC=CC=C1
C1=CC=C(C(=C1)Cl)Cl
CCBr
CC1=CC(=CC=C1)CN=[N+]=[N-]
CC1=CC(=CC=C1)CN=[N+]=[N-]
SCCCc1ccccc1
CC/C=C\\CCOC=O

megosato commented 3 years ago

Here is the exact error that occurs:

$ docker run -it --rm -v $(pwd):/data adv -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14 -e 50 -s "CC(C)Cc1ccccc1"
ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['dock', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14', '-e', '50', '-s', 'CC(C)Cc1ccccc1']' command failed.  (See above for error)
/opt/conda/envs/ADVenv/.tmp776pf1ux: line 3: syntax error: unexpected "("

robbason commented 3 years ago

We should try calling this from python as opposed to the command line to see how it works.
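
One way to try this, sketched under the assumption that the entry point is an ordinary executable: when Python starts a child process from an argument list (without `shell=True`), no shell parses the arguments, so parentheses and backslashes in the SMILES should arrive unchanged. The child below is a stand-in that only echoes its argument; it is not the `dock` command, and `pass_through` is a hypothetical helper name. The last SMILES is written with a single backslash, assuming the doubled backslash in the test list is shell escaping.

```python
import subprocess
import sys

# Stand-in child process: writes its first argument back verbatim.
ECHO_CHILD = "import sys; sys.stdout.write(sys.argv[1])"


def pass_through(smiles: str) -> str:
    """Pass `smiles` to a child process as a list element and return what the child received."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_CHILD, smiles],  # list form: no shell is involved
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


for smiles in ["CC(C)Cc1ccccc1", "C1=CC=C(C(=C1)Cl)Cl", r"CC/C=C\CCOC=O"]:
    assert pass_through(smiles) == smiles
    print(smiles, "-> unchanged")
```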

mikemhenry commented 3 years ago

This is how they are doing it (I've linked the line throwing the error): https://github.com/conda/conda/blob/master/conda/cli/main_run.py#L33
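
The error above points at line 3 of a temporary script (`.tmp776pf1ux`), which suggests the arguments are written into a shell script and parsed a second time. The following is a minimal sketch of that failure mode and of the usual remedy, `shlex.quote`. It is an illustration, not conda's actual code, and `build_script_line` is a hypothetical helper.

```python
import shlex


def build_script_line(argv, quote):
    """Join argv into one line of shell script, with or without quoting each token."""
    if quote:
        return " ".join(shlex.quote(arg) for arg in argv)
    return " ".join(argv)  # naive join: "(" and "\" become shell syntax again


argv = ["dock", "-s", "CC(C)Cc1ccccc1", "-e", "50"]

print(build_script_line(argv, quote=False))  # dock -s CC(C)Cc1ccccc1 -e 50
print(build_script_line(argv, quote=True))   # dock -s 'CC(C)Cc1ccccc1' -e 50

# shlex.split applies POSIX word-splitting rules, so it can check the round trip.
assert shlex.split(build_script_line(argv, quote=True)) == argv

# A stereo SMILES with one backslash: unquoted, the backslash is consumed as an escape.
stereo = ["dock", "-s", r"CC/C=C\CCOC=O"]
assert shlex.split(build_script_line(stereo, quote=False)) != stereo
assert shlex.split(build_script_line(stereo, quote=True)) == stereo
```

If the wrapper only quotes arguments that contain whitespace, that would also be consistent with the trailing-space workaround succeeding; this is a guess and has not been verified here.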

megosato commented 3 years ago

> We should try calling this from python as opposed to the command line to see how it works.

@robbason it works fine when the script is run directly with python from the command line; it only fails when called through docker run:

# CALLED USING DOCKER RUN
oedock % docker run -it --rm -v $(pwd):/data oedock -r /data/sEH-apo.pdb -s "C1=CC=C(C(=C1)Cl)Cl" -b -5 -5 -5 5 5 5
ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['oedock', '-r', '/data/sEH-apo.pdb', '-s', 'C1=CC=C(C(=C1)Cl)Cl', '-b', '-5', '-5', '-5', '5', '5', '5']' command failed.  (See above for error)
/opt/conda/envs/oepy37/.tmpicnzha6z: line 3: syntax error: unexpected "("

# CALLED FROM COMMAND LINE
oedock % python oedock_process.py -r sEH-apo.pdb -s "C1=CC=C(C(=C1)Cl)Cl" -b -5 -5 -5 5 5 5 
score: 1.8279122114181519   type: <class 'float'>

megosato commented 3 years ago

Just so this information is consolidated here: the issue seems to affect only Docker containers that inherit from miniconda. The rdkit logp container I made, which inherits from mcs07/rdkit:latest rather than continuumio/miniconda3:4.9.2-alpine, handles quoted SMILES strings properly:

predict-rdkitlogp % docker run -it --rm rdlogp "C1=CC=C(C(=C1)Cl)Cl"
2.9934000000000003

Each command below was run with conda run inside the container and is followed by its tokenized argument list:

"CCc1ccc(C)cc1"  - No special treatment (error)
/opt/app # conda run -n ADVenv dock -s "CCc1ccc(C)cc1" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCc1ccc(C)cc1', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CCc1ccc\(C\)cc1" - escape character for () adds an extra escape character (works and doesn't seem to cause a change in structure)
conda run -n ADVenv dock -s "CCc1ccc\(C\)cc1" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCc1ccc\\(C\\)cc1', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"\"CCc1ccc(C)cc1\"" - escaped quotations not evaluated into string (error)
conda run -n ADVenv dock -s "\"CCc1ccc(C)cc1\"" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', '"CCc1ccc(C)cc1"', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CCc1ccc(C)cc1 " - extra white space character tacked on (works)
conda run -n ADVenv dock -s "CCc1ccc(C)cc1 " -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCc1ccc(C)cc1 ', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"C1=CC=C(C(=C1)Cl)Cl" - No special treatment (error)
/opt/app # conda run -n ADVenv dock -s "C1=CC=C(C(=C1)Cl)Cl" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'C1=CC=C(C(=C1)Cl)Cl', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']
ERROR conda.cli.main_run:execute(36): Subprocess for 'conda run ['dock', '-s', 'C1=CC=C(C(=C1)Cl)Cl', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']' command failed.  (See above for error)
/opt/conda/envs/ADVenv/.tmpxat7qbd8: line 3: syntax error: unexpected "("

"C1=CC=C\(C\(=C1\)Cl\)Cl"  - escape character for () adds an extra escape character (works and doesn't seem to cause a change in structure)
/opt/app # conda run -n ADVenv dock -s "C1=CC=C\(C\(=C1\)Cl\)Cl" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'C1=CC=C\\(C\\(=C1\\)Cl\\)Cl', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"C1=CC=C(C(=C1)Cl)Cl " - extra white space character tacked on (works)
conda run -n ADVenv dock -s "C1=CC=C(C(=C1)Cl)Cl " -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'C1=CC=C(C(=C1)Cl)Cl ', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CCC\(=O\)/C\(=C\(/F\)\Cl\)/F" - escape character for () adds an extra escape character (works and doesn't seem to cause a change in structure)
conda run -n ADVenv dock -s "CCC\(=O\)/C\(=C\(/F\)\Cl\)/F" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CCC\\(=O\\)/C\\(=C\\(/F\\)\\Cl\\)/F', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CC/C=C\\CCOC=O" - No special treatment (error, maybe because it treats \ as an escape character)
conda run -n ADVenv dock -s "CC/C=C\\CCOC=O" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CC/C=C\\CCOC=O', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']
ERROR conda.cli.main_run:execute(36): Subprocess for 'conda run ['dock', '-s', 'CC/C=C\\CCOC=O', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']' command failed.  (See above for error)
Warning: : Failed due to unspecified stereochemistry

"CC/C=C\\CCOC=O " - white space added to end (works)
conda run -n ADVenv dock -s "CC/C=C\\CCOC=O " -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CC/C=C\\CCOC=O ', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']

"CC/C=C\\\\CCOC=O" - escape characters added (works and seems like structure is correct)
/opt/app # conda run -n ADVenv dock -s "CC/C=C\\\\CCOC=O" -r inputs/4w51-cryo.pdb -c -32.355 7.263 2.207 -b 14 14 14
['dock', '-s', 'CC/C=C\\\\CCOC=O', '-r', 'inputs/4w51-cryo.pdb', '-c', '-32.355', '7.263', '2.207', '-b', '14', '14', '14']
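
A note on reading the token lists above, assuming they are Python list reprs (the bracketed format suggests so): `repr` displays every backslash doubled, so `'CCc1ccc\\(C\\)cc1'` holds one backslash before each parenthesis, not two. The short check below illustrates this with plain strings; no shell or conda is involved.

```python
# The token as a string: one backslash before each parenthesis.
token = r"CCc1ccc\(C\)cc1"

print(token)        # CCc1ccc\(C\)cc1
print(repr(token))  # 'CCc1ccc\\(C\\)cc1'
print([token])      # ['CCc1ccc\\(C\\)cc1']  (a list shows the repr of each element)

assert token.count("\\") == 2        # two real backslashes in the string
assert repr(token).count("\\") == 4  # repr doubles each one for display
```

If a shell then re-parses the token, as the temporary-script error suggests, `\(` would collapse to `(` and the program would see the plain SMILES, which would explain why the structure appears unchanged.
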
mikemhenry commented 3 years ago

Something else we can try is using the python API for conda https://docs.conda.io/projects/conda/en/latest/api/python_api.html

Then we can try using that to pass a command into a conda env, which should help to get around the way they are using subprocess.

I was also thinking that we could provide a template setup.py and cmd.py showing how to import an external package and then run a command in that conda env. That would give people a more standard way to build their containers. As a bonus, if we do this per challenge, the template could include the command arguments we expect them to handle.
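
As a starting point for such a template, here is a minimal sketch of what a `cmd.py` entry point could look like. The option name (`-s/--smiles`), the `-` convention, and the function names are hypothetical, not an agreed interface. It accepts the SMILES either as an argument or, when `-` is given, from standard input, which avoids shell quoting of the SMILES altogether.

```python
import argparse
import sys


def parse_args(argv=None):
    """Parse the (hypothetical) container arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical SAMPL container entry point")
    parser.add_argument(
        "-s", "--smiles", required=True,
        help="SMILES string, or '-' to read one SMILES from standard input",
    )
    return parser.parse_args(argv)


def resolve_smiles(value, stdin=sys.stdin):
    """Return the SMILES from the argument, or from the first line of stdin when value is '-'."""
    raw = stdin.readline() if value == "-" else value
    # Strip surrounding whitespace so the trailing-space workaround is harmless.
    return raw.strip()


def main(argv=None):
    args = parse_args(argv)
    smiles = resolve_smiles(args.smiles)
    print(smiles)  # placeholder: a real container would run its prediction here


# Example call with an explicit argument list; prints the SMILES without the trailing space.
main(["-s", "C1=CC=C(C(=C1)Cl)Cl "])
```

In a real script the last line would be replaced by a `main()` call under an `if __name__ == "__main__":` guard, so the arguments come from the command line. With the `-` form, the SMILES would be piped into the container's standard input instead of being placed on the command line.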