Closed lexotero closed 1 year ago
@lexotero,
Thank you so much for this feature request and also for providing sample code! I really appreciate your contribution. I have been meaning to add an option to manually specify the file path to pypgx-bundle
, and your solution using an environment variable is a clever one! I was originally thinking in the line of adding a command line option because I think this makes it very explicit as for which pypgx-bundle
was used for a given run. I'd like to take some time to think about the pros and cons for each approach. But please do feel free to give your take on these two approaches.
BTW, I'm curious to hear more about your motivation for changing the path for pypgx-bundle
. I understand that you'd like to organize your filesystems, but were there any specific reasons you could not or did not want to put pypgx-bundle
to your home directory?
I agree an optional CLI argument would be quite explicit. It could be a top-level CLI option that applies to all subcommands.
The main advantage I see of using the environment variable solution, is that you don't need to pass the value of the CLI argument around to all functions that need the location of pypgx-bundle
.
You could, perhaps, combine both solutions and set the environment variable with the value of the CLI argument, this way a user can either define the environment variable or be explicit and pass the path to pypgx-bundle
as a CLI argument. The code then internally uses the CLI argument once to set the environment variable, and when getting paths relative to the pypgx-bundle
installation, you simply inspect the env variable (like in my example implementation). I don't know, this might turn out to be ugly and confusing...
Regarding your question about the motivation. Mainly, I'm thinking about flexibility in different environments. Some situations that come to mind:
pypgx
with a service user without home directory.pypgx
in the same system simultaneously.pypgx-bundle
).I'd say there are workarounds, like setting the HOME
environment variable to point to a different place in the filesystem, but adding the flexibility to pypgx
sounds like a better solution long term.
@lexotero,
Thank you so much for providing such a detailed explanation! It's exactly what I was hoping to hear. I hadn't really considered situations where a user doesn't have access to the home directory, but it's an important observation for sure.
As for CLI option vs. environment variable, I am still leaning towards the former because 1) many PyPGx users are not necessarily proficient in programming and I think CLI option is more explicit and user-friendly, 2) I counted and there are only four PyPGx actions that will be affected if I introduced the CLI option (run-ngs-pipeline
, run-chip-pipeline
, predict-cnv
, and estimate-phase-beagle
), and 3) I think it's more consistent with the overall CLI design of the tool.
I will publish the official release of 0.21.0
this weekend and then will start working on adding the said CLI option in the 0.22.0-dev
branch.
Thank you for considering this feature!
Let me know if you want any help reviewing/testing.
@lexotero,
After careful consideration, I came to the conclusion that your approach is better because it's way simpler to implement and also because if a user is looking to customize the path to pypgx-bundle
it probably means he or she is already quite famialir with the concept of environment variables in the first place (i.e. they are sort of 'advanced' users)!
Therefore, as promised, I had released the official version of 0.21.0
yeseterday and today I added the feature you requested to the 0.22.0-dev
branch using the environment variable PYPGX_BUNDLE
. I'd greatly appreciate it if you could test this new feature and let me know if everything works as expected. You can access it by:
$ git clone https://github.com/sbslee/pypgx
$ cd pypgx
$ git checkout 0.22.0-dev
$ pip install .
You can also see the updated Read the Docs as well. Thank you so much!
Hello @sbslee ,
I'll check and get back to you today.
Thanks for your help.
Hello @sbslee,
A friendly coworker (@erichards52) with actual field knowledge helped me out with the testing:
pypgx-bundle
is not used.PYPGX_BUNDLE
directory.PYPGX_BUNDLE
is not a valid path, you get a friendly error pypgx.sdk.utils.BundleNotFoundError: /this/should/not/work
.PYPGX_BUNDLE
is not specified, it defaults to ~/pypgx-bundle
.I think that covers the expected behaviour.
Thank you so much, @lexotero and @erichards52! I really appreciate it. Please don't hesitate to let me know if you have any other feature ideas or questions!
Currently,
pypgx
defaults in multiple places to files present inpypgx-bundle
. To locate said filespypgx
assumes thatpypgx-bundle
has been cloned in the user's home directory. This is done with code similar to:from https://github.com/sbslee/pypgx/blob/decb65a9ff7b906ad7cf2a031d646a87e701fcd3/pypgx/api/utils.py#L822
It would be useful to support different locations for the
pypgx-bundle
clone so that users can organise their filesystems in any way they like.Although, in the future, the use of files from the
pypgx-bundle
could be handle differently, for now, an easy way of adding some flexibility would be to extract the fetching of files from the bundle to a utility function. This utility function then could have some logic to determine the path to thepypgx-bundle
clone using user-defined environment variables, and defaulting to the user's home directory. Something like:The
resolve_pypgx_bundle_path()
function could also add some validation to ensure, for example, that the returned path exists.Do you think this would be a useful addition to the library?