siminegroup / E2EDNA2

GNU General Public License v3.0
17 stars 7 forks source link

Feature Request: Argument parsing #7

Closed schackartk closed 2 years ago

schackartk commented 2 years ago

Hello,

Would you be interested in more fully utilizing command-line argument parsing (e.g. using argparse)? I always feel a bit uncomfortable having to edit source code to use a program. It would be great if you could set the parameters strictly from the CL at runtime, such as workdir, mmb dir, and mmb, instead of editing main.py which is tracked by git.

Additionally, using argparse would give the opportunity to provide a very helpful user interface. For instance, the user could run: python main.py --help to get a help message explaining what their options are.

schackartk commented 2 years ago

It would also be useful to prevent runtime exceptions by performing more argument validation. For instance, if the output directory for the current run already exists, right now an exception is produced:

Start automating tests one by one...
====================================
TESTING MODE #1: '2d structure'
Traceback (most recent call last):
  File "main.py", line 229, in <module>
    opendna = opendna(params)  # instantiate the class
  File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
    self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
  File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
    os.mkdir(self.workDir)
FileExistsError: [Errno 17] File exists: '/home/ken/personal/E2EDNA2/localruns/run1'

END OF TEST #1. Results are saved to folder "run1", where:
        2d structure: in record.txt

An exception could be avoided by validating that the output directory does not exist, and providing a useful message such as "The output directory for this run already exists at './localrun/run1'", and an optional -f/--force flag could be provided to overwrite the output directory.

Similarly, if the outdir doesn't exist, an exception occurs:

TESTING MODE #1: '2d structure'
Traceback (most recent call last):
  File "main.py", line 229, in <module>
    opendna = opendna(params)  # instantiate the class
  File "/home/ken/personal/E2EDNA2/opendna.py", line 53, in __init__
    self.setup()  # if we don't need a workdir & MMB files (eg, give a 3D structure), don't make one.
  File "/home/ken/personal/E2EDNA2/opendna.py", line 147, in setup
    os.mkdir(self.workDir)
FileNotFoundError: [Errno 2] No such file or directory: '/home/ken/personal/E2EDNA2/localruns/run1'

END OF TEST #1. Results are saved to folder "run1", where:
        2d structure: in record.txt

Similarly this could be solved during argument validation with something like

if not os.path.isdir(outdir):
    os.mkdir(outdir)
InfluenceFunctional commented 2 years ago

Hi @schackartk - thanks for the note and particularly the point about argument validation. We had planned to do argparse and yaml integration so one could track experiments with separate config files and/or on the command line, but haven't got around to it yet. I do think it's a good upgrade and we will hopefully get to it soon.

taoliu032 commented 2 years ago

Additionally, using argparse would give the opportunity to provide a very helpful user interface. For instance, the user could run: python main.py --help to get a help message explaining what their options are.

@schackartk - thank you for the suggestions! I did not know this function and have been thinking about how to show users the available options.

We will create a branch to realize these helpful features 😎. Meanwhile, you are more than welcome to fork the repo and work on it (e.g., try out some experiments or ideas). We can keep the discussion going here :)

Update: a branch for this issue has been created. See it at the "Development" on the right @InfluenceFunctional

schackartk commented 2 years ago

Awesome! I will make a fork and implement a few arguments just to get it started and to demonstrate how it can be done. Argparse is really powerful for this and does so much for you with little effort.

I will start with the current CL arguments and also the ones that you are instructed to change in main.py. For now I will leave the others hard-coded, and you can add them as you see fit.

p.s. I am thrilled to see actively engaged maintainers, it is rare to to find in smaller projects.

InfluenceFunctional commented 2 years ago

@schackartk I had basically planned to port our infrastructure more or less directly from https://github.com/influencefunctional/activelearningpipeline - the main workload is just transposing all the args!

schackartk commented 2 years ago

Should --ligandSeq be required if --ligand is not 'other'?

taoliu032 commented 2 years ago

Should --ligandSeq be required if --ligand is not 'other'?

No. The idea is to ask users to provide the ligand sequence, if the ligand has one, such as DNA, RNA, or peptides.

schackartk commented 2 years ago

So in those cases, is it required?

taoliu032 commented 2 years ago

So in those cases, is it required?

Yes but it is not an essential piece of information. Compared to sequence, ligand structure is more important and is required for docking experiment.

taoliu032 commented 2 years ago

Hi @schackartk, I updated our code with more argparse statements and a configuration .yaml file! Check it out here. Thank you for your advice!