Repo reorganization for conda and multiple codebases

gwct commented 2 years ago

This is a major reorganization of the repository to facilitate building from conda and to combine multiple codebases. It also merges the PhyloAcc-interface code with the main PhyloAcc repo. These changes include:

Combined the PhyloAcc and Python interface repos to facilitate future codebase merging and development of the conda package
Moved all data files to Hu-etal-2019, to be split into a separate repo later
Updated the Makefile for use with conda build; the original Makefile can still be found in the src/PhyloAcc-ST/ folder if we ever need to build from source that way
Added meta.yaml, build.sh, and build.bat for conda building; these will be split out later to our fork of bioconda
Renamed the interface from 'phyloacc_interface.py' to 'phyloacc.py' and renamed the PhyloAcc binary to PhyloAcc-ST
Changed default path for the PhyloAcc binary within the interface to point to PhyloAcc-ST to align with the name change above
Changed license from MIT to GPL3
Moved V2_GBGC to the src/PhyloAcc-ST/ folder

tsackton commented 2 years ago

This is great, Gregg. I have a few edits to suggest before merging.

1. I don't think there is a reason the phyloAcc V2_GBGC codebase should be a subdirectory of phyloAcc_GT instead of its own top-level subdirectory in the src folder. E.g., src would have four subdirs: the interface, the original GT version, the GBGC version, and the ST version.
2. Relatedly, there are still some simulation results in the V2_GBGC subdirectory that should be moved to Hu-etal-2019, to be split out into a new repo after merging.
3. Although moving the simulation data to the paper repository makes sense, we should still retain a small test dataset in the main PhyloAcc repository. This could probably live in a test subdir and could eventually form the nucleus of some kind of actual testing infrastructure for this project.

gwct commented 2 years ago

The latest commit should address these issues.

1. The V2_GBGC folder is now src/PhyloAcc-ST-GBGC/
2. I've moved the data from the GBGC source folder into its own subfolder in Hu-etal-2019
3. I added the original test data that is referred to in the README into the test subfolder in the project root

Regarding point 3, adding some way to directly test the PhyloAcc binary is on the TODO list. I was thinking that adding a simple --version flag would be enough to test the install. Currently, with the conda package, the binary is being tested through the interface with subprocess.run(), which I think is fine, but a built-in test would be nice.
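
For illustration, here is a minimal sketch of what an install check through subprocess.run() could look like. It assumes the binary is on PATH under the name PhyloAcc-ST and that the proposed --version flag exists, which it does not yet. The function name binary_responds is hypothetical and is not taken from the interface code.

```python
import shutil
import subprocess


def binary_responds(binary="PhyloAcc-ST", args=("--version",), timeout=30):
    """Return True if `binary` can be found and exits with status 0.

    Assumption: `--version` is the flag proposed in this thread; the
    PhyloAcc binary may not accept it yet.
    """
    # shutil.which accepts either a bare name (searched on PATH) or a path.
    path = shutil.which(binary)
    if path is None:
        # Not found or not executable, so the install check fails.
        return False
    try:
        result = subprocess.run(
            [path, *args],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A check like this only confirms that the binary starts and exits cleanly; it says nothing about whether a model run produces correct output.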

If we do include a small test dataset, I'm not sure how that would work with the conda install at this point. We would either have to copy that data somewhere that it can be found in the environment, or provide the full path to the data within the anaconda file system (seems complicated). Alternatively, we could just point users to download test data from the Hu-etal-2019 repo or even an independent test-data repo. That way the user knows exactly where the data is and can decide if they want it, while we rely on a --version flag or the interface check to test the conda install.
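
As a rough sketch of the "copy the data somewhere it can be found in the environment" option: the lookup below checks an override environment variable first and then a directory under the active environment's prefix. The variable name PHYLOACC_TEST_DATA, the share/phyloacc/test-data location, and the function name find_test_data are all hypothetical; nothing in the current package installs data there.

```python
import os
import sys
from pathlib import Path


def find_test_data(env_var="PHYLOACC_TEST_DATA",
                   subdir=("share", "phyloacc", "test-data")):
    """Return the first existing candidate directory for test data, or None.

    Both the environment variable and the sub-directory are assumptions
    made for this sketch, not existing PhyloAcc conventions.
    """
    candidates = []
    override = os.environ.get(env_var)
    if override:
        # A user-supplied location takes priority.
        candidates.append(Path(override))
    # When Python runs from a conda environment, sys.prefix is that
    # environment's root directory.
    candidates.append(Path(sys.prefix).joinpath(*subdir))
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

If the function returns None, the interface could fall back to telling the user where to download the data, which is the alternative described above.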

tsackton commented 2 years ago

Re: point 3, a test-data repo might be better as then we can add more stuff to it without it becoming a bit weird to be updating the Hu-etal-2019 repo. But for now let's merge this commit and then we can work on splitting out the pieces that need to move (Hu-etal-2019 and test-data).

phyloacc / PhyloAcc #29