Closed baagaard-usgs closed 2 years ago
Another option is to refactor one step further:
@jmfee-usgs What do you mean by gmrecords()
? The constructor or a separate function for executing the desired subcommand?
The subcommands take a GMrecordsApp argument and use its args
data member, so it looks like it would take a fair amount of work to disentangle the arguments from GMrecordsApp and the subcommands.
@baagaard-usgs Yes, that would be a fair amount of work and a lot depends on the intended usage (cli only vs api).
Looking at the compute_station_metrics subcommand, it's main method calls self._get_events() (from base) and then uses self.events. In base, it calls base_utils.get_events with kwargs which is closer to what I'd expect as an api user. This requires some effort to trace, and seems tightly coupled.
Not sure if users are writing their own scripts and if there are utils classes or similar for computing station metrics (or other subcommands) without using the cli? Disentangling may be worth the effort.
Thinking along the lines of separate methods/classes that would all be coordinated by main based on args, but could be used independently to:
My main use case is being able to easy run all processing steps with one command. I also like to have everything in a Python script under version control to document all the steps and parameters.
#!/usr/bin/env python3
import os
import pathlib
from gmprocess.apps.gmrecords import GMrecordsApp
NUM_PROCESSES = 0
DATA_DIR = "data-waveforms"
DATA_SETS = ("cesmd", "fdsn")
STEPS = (
"assemble",
"process_waveforms",
"compute_station_metrics",
"compute_waveform_metrics",
"export_metric_tables",
"generate_report",
)
cwd = os.getcwd()
for dset in DATA_SETS:
work_dir = pathlib.Path(DATA_DIR) / dset
os.chdir(work_dir)
app = GMrecordsApp()
app.load_subcommands()
args = {
"debug": False,
"quiet": True,
"eventid": None,
"textfile": None,
"overwrite": False,
"num_processes": NUM_PROCESSES,
}
for step in STEPS:
print(f"Running '{step}'...")
step_args = {
"subcommand": step,
"func": app.classes[step]["class"],
"log": f"{step}.log",
}
args.update(step_args)
app.main(**args)
os.chdir(cwd)
Agreed on running all steps in one command/script, and being able to customize that without leaving python. It's also useful to not require the filesystem for every step. Filesystem is always an option, but separating IO from processing logic makes the code easier to integrate with databases or run in a web service.
Not familiar with all the details, but something along these lines:
import logging
import os
import pathlib
from gmprocess.steps import assemble, compute_station_metrics, ...
DATA_DIR = "data-waveforms"
DATA_SETS = ("cesmd", "fdsn")
def custom_process(work_dir):
try:
for event in load_events(dir=work_dir):
waveforms = assemble(event=event, log="assemble.log")
processed = process_waveforms(waveforms)
metrics = compute_station_metrics(processed)
export_metric_tables(metrics, dir=work_dir)
report = generate_report(processed, metrics)
finally:
os.chdir(cwd)
if __name__ == '__main__':
logging.basicConfig(level=logging.INFO)
data_dir = pathlib.Path(DATA_DIR)
for data_set in DATA_SETS:
work_dir = data_dir / data_set
custom_process(work_dir)
Refactor
GMrecordsApp
so that it is scriptable.Why GMrecordsApp is not scriptable
GMrecordsApp
constructor includes parsing the command line arguments.Proposed solution
Refactor the constructor and
main()
so that keyword arguments can be passed tomain()
so that the arguments provide equivalent behavior to using command line arguments.In most cases this simply requires:
In this case we need to move calling
_parse_command_line()
(and everything after that) from the constructor tomain()
. I propose adding aninitialize()
method to do the stuff that comes after calling_parse_command_line()
in the constructor so as not to cluttermain()
.