veg / hivclustering

Infer molecular transmission networks from pairwise distance files (part of HIV-TRACE)
2 stars 5 forks source link

Feature Request/Improvement: Better usability through enabling easier imports and argument passing #42

Open FynnFreyer opened 1 year ago

FynnFreyer commented 1 year ago

While this is probably meant to be used as a set of command line tools first and foremost, it would be nice to easily use this software from within other python projects.

Description

I'm thinking about two main points of improvement here:

  1. The scripts could be included in the package proper and be made available as entry points
  2. The argument parsing should be done in separate functions, that can optionally take caller provided arguments. The functions that do the actual work would ideally take specified parameters, but it would already be much nicer if you could just pass an arguments object to be used.

Imports

Specifically, I'm thinking about this, because I'm trying to run hivnetworkcsv from another Python project. And while the file is already nicely written in a way that avoids side effects on import I can't easily import it, because setuptools just puts scripts in a bin folder on your PATH on install, but that directory is not added to the PYTHONPATH (for good reasons).

I could of course just run it in a subprocess, but then passing arguments and parameters becomes unwieldy quickly, and I lose debuggability, performance, informative stack traces etc. Another option is to jerryrig it using the runpy module, which is what I'm doing right now, but that's pretty hacky.

A nice and clean way this could be accommodated is by using setuptools' entry points. Those can expose functions as command line scripts.

Arguments

After working around the import issue with runpy, I noticed that for example in the build_a_network function in networkbuild.py, the argument parsing is not really separated from the program logic. This is of course hard in this case, because there's additional conditional logic changing the state of the settings afterward, which at a glance seems to not be easily separable from the "actual" work.

Being able to pass the arguments directly to this function would make it a lot easier to work with when importing it, and cleanly separating argument parsing and work improves maintainability in the long run, by reducing mental burden when trying to understand the program.

Resolution

I'd think I'd be capable of solving the first part of this issue without breaking things, and I'd also be willing to work on the second part (at least for networkbuild.py), but because I have no biological background, and no firm mental model of what's happening in the background, that work should definitely be reviewed by someone understanding the subject.