phyloacc / PhyloAcc

PhyloAcc: software to detect changes in the conservation of a genomic region
GNU General Public License v3.0
26 stars 12 forks

Repo reorganization for conda and multiple codebases #29

Closed gwct closed 2 years ago

gwct commented 2 years ago

This is a major reorganization of the repository to facilitate building from conda and to combine multiple codebases. It also merges the PhyloAcc-interface code with the main PhyloAcc repo. These changes include:

tsackton commented 2 years ago

This is great, Gregg. I have a few edits to suggest before merging.

  1. I don't think there is a reason the phyloAcc V2_GBGC codebase should be a subdirectory of phyloAcc_GT, instead of its own top-level subdirectory in the src folder. E.g., src would have four subdirs: the interface, the original GT version, the GBGC version, and the ST version.
  2. Relatedly, there are still some simulation results in the V2_GBGC subdirectory that should be moved to Hu-etal-2019 to be split out into a new repo after merging.
  3. Although moving the simulation data to the paper repository makes sense, we should still retain a small test dataset in the main PhyloAcc repository. Probably this can be in a test subdir and could eventually form the nucleus of some kind of actual testing infrastructure for this project.

gwct commented 2 years ago

The latest commit should address these issues.

  1. The V2_GBGC folder is now src/PhyloAcc-ST-GBGC/
  2. I've moved the data from the GBGC source folder into its own subfolder in Hu-etal-2019
  3. I added the original test data that is referred to in the README into the test subfolder in the project root

Regarding point 3, adding some way to directly test the PhyloAcc binary is on the TODO list. I was thinking that adding a simple --version flag would be enough to test the install. Currently, with the conda package, the binary is being tested through the interface with subprocess.run(), which I think is fine, but a built-in test would be nice.
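As a rough illustration of the kind of install check described here, the following is a minimal sketch and not the interface's actual code. It assumes the binary is on the PATH and that a `--version` flag exists and exits with status 0; that flag is only proposed in this thread. The function name `binary_responds` and the binary name `PhyloAcc` in the usage block are placeholders.

```python
import shutil
import subprocess


def binary_responds(binary, flag="--version", timeout=30):
    """Return True if `binary` can be found and exits with status 0 when run with `flag`.

    The default flag is an assumption: `--version` is only proposed in this
    thread, so pass whatever flag the installed binary actually supports.
    """
    # Resolve the executable the same way a shell would (PATH lookup).
    path = shutil.which(binary)
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, flag],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The file exists but could not be executed, or it hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "PhyloAcc" is a placeholder name for the installed binary.
    print("binary OK" if binary_responds("PhyloAcc") else "binary not found or failed")
```

A check like this only confirms that the binary starts and exits cleanly; it says nothing about whether a run on real data succeeds.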

If we do include a small test dataset, I'm not sure how that would work with the conda install at this point. We would either have to copy that data somewhere that it can be found in the environment, or provide the full path to the data within the anaconda file system (seems complicated). Alternatively, we could just point users to download test data from the Hu-etal-2019 repo or even an independent test-data repo. That way the user knows exactly where the data is and can decide if they want it, while we rely on a --version flag or the interface check to test the conda install.
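To make the first option concrete, here is a hedged sketch of how code might look up test data that was copied into the environment. Everything specific in it is an assumption for illustration: the `share/phyloacc/test-data` location under the environment prefix, the `PHYLOACC_TEST_DATA` override variable, and the function name `find_test_data` are hypothetical and not part of the current package.

```python
import os
import sys
from pathlib import Path


def find_test_data(env_var="PHYLOACC_TEST_DATA", prefix=None,
                   subdir=("share", "phyloacc", "test-data")):
    """Return the test-data directory as a Path, or None if it cannot be found.

    Lookup order (all names here are hypothetical):
      1. a directory named by the `env_var` environment variable, if set;
      2. `<prefix>/share/phyloacc/test-data`, where `prefix` defaults to
         sys.prefix, the root of the active Python environment.
    """
    override = os.environ.get(env_var)
    if override:
        candidate = Path(override)
        return candidate if candidate.is_dir() else None

    root = Path(prefix) if prefix is not None else Path(sys.prefix)
    candidate = root.joinpath(*subdir)
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    location = find_test_data()
    print(location if location is not None else "test data not installed")
```

The override variable is there so that users who download the data from a separate repo can point at their own copy, which is closer to the second option.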

tsackton commented 2 years ago

Re: point 3, a test-data repo might be better as then we can add more stuff to it without it becoming a bit weird to be updating the Hu-etal-2019 repo. But for now let's merge this commit and then we can work on splitting out the pieces that need to move (Hu-etal-2019 and test-data).