zeeev / vcflib

a simple C++ library for parsing and manipulating VCF files, + many command-line utilities
https://github.com/ekg/vcflib#vcflib
MIT License

Made 'stand-beside' to vcflib #7

Open. travc opened this issue 9 years ago

travc commented 9 years ago

I've separated your code from vcflib to make a 'stand-beside' version. https://github.com/travc/GPAT

It works with the current ekg/vcflib; the path to vcflib is set at the top of the Makefile. Either clone GPAT inside the vcflib directory (ekg's vcflib) and run 'cd GPAT; make', or clone it wherever you want and change the VCFLIB_PATH variable in the Makefile to point to a clone of vcflib. The executables get put into the vcflib/bin directory, but that could easily be changed in the Makefile.

It is also suitable for use as a submodule in vcflib, which is how I have it set up. If you and ekg want to, your tools could be included in the official vcflib. Alternatively, you could include vcflib as a submodule of GPAT. The Makefile changes needed for that are easy.

I think this setup will simplify maintenance for you... At the very least, it is less confusing to have your programs in a repository called something other than vcflib.

And just in case you're curious, here it is added as a submodule to vcflib: https://github.com/travc/vcflib/tree/GPAT This includes vcflib Makefile changes so GPAT is compiled with the rest of vcflib (it can be merged into ekg/vcflib if official inclusion is what you and he decide on).

PS: I'm planning on adding some options to some of the tools (pFst, for example) so target and background lists can be given as sample names instead of column numbers. I did this reorganization / minor refactoring because I also needed to make some minor fixes to vcflib as well as to your code, and under your existing organization those fixes wouldn't propagate to vcflib (nor would your code benefit from updates/fixes in vcflib).
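To illustrate the name-versus-index option, here is a minimal C++ sketch. It assumes the sample names are already available as a vector of strings in header order (vcflib's VariantCallFile keeps them in its sampleNames member), that numeric entries are zero-based positions in that list, and that the argument stays a comma-separated list. resolveSampleList and its useNames flag are hypothetical names for illustration, not GPAT's actual code.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not GPAT's real API): turn a comma-separated
// --target/--background value into zero-based sample column indexes.
// useNames == false: every token must be an in-range integer index (assumed
//                    to match the existing index behaviour).
// useNames == true:  every token must be a sample name from the VCF header.
std::vector<int> resolveSampleList(const std::string& arg,
                                   const std::vector<std::string>& sampleNames,
                                   bool useNames) {
    // Map each header sample name to its column position.
    std::unordered_map<std::string, int> byName;
    for (std::size_t i = 0; i < sampleNames.size(); ++i) {
        byName.emplace(sampleNames[i], static_cast<int>(i));
    }

    std::vector<int> indices;
    std::stringstream ss(arg);
    std::string token;
    while (std::getline(ss, token, ',')) {
        if (token.empty()) continue;  // tolerate "a,,b" and a trailing comma

        if (useNames) {
            auto it = byName.find(token);
            if (it == byName.end()) {
                throw std::invalid_argument("unknown sample name: " + token);
            }
            indices.push_back(it->second);
            continue;
        }

        // Index mode: the whole token must parse as an integer in range.
        std::size_t used = 0;
        int idx = -1;
        try {
            idx = std::stoi(token, &used);
        } catch (const std::exception&) {
            used = 0;  // not a number at all; rejected below
        }
        if (used != token.size() || idx < 0 ||
            idx >= static_cast<int>(sampleNames.size())) {
            throw std::invalid_argument("bad sample index: " + token);
        }
        indices.push_back(idx);
    }
    return indices;
}
```

Keeping name lookup behind an explicit flag means a sample literally named "3" can never be confused with column 3, and the default index behaviour is left untouched.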

zeeev commented 9 years ago

I really like this idea. I'm going to look through the restructured code when I'm not traveling.

I also like the idea of using the sample names rather than a numerical index.

travc commented 9 years ago

I'm in no rush... I've got it working nicely for myself.

Sample names instead of indexes (as an option) is done for pFst. I also added the ability for --target and --background to be files listing the samples (as the vcftools Fst command does). I went ahead and made quite a few minor changes to argument parsing and to how the sample lists are built, plus general cleanup. However, I've left the default behaviour the same, and it outputs exactly the same results in my little tests.
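Reading --target and --background from a file could look roughly like the C++ sketch below. It assumes the file lists one sample name per line (the layout vcftools uses for its --weir-fst-pop files) and that any value which does not open as a readable file is treated as a comma-separated list. expandSampleArg and trim are hypothetical helpers, not the code in the devel branch.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Strip leading/trailing whitespace, including a '\r' left by CRLF files.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper (not GPAT's real API): expand a --target/--background
// value into a list of sample tokens.
//  - If the value opens as a file, read one sample per line, skipping blanks.
//  - Otherwise, split the value itself on commas.
std::vector<std::string> expandSampleArg(const std::string& arg) {
    std::vector<std::string> out;

    std::ifstream in(arg);
    if (in) {
        std::string line;
        while (std::getline(in, line)) {
            std::string name = trim(line);
            if (!name.empty()) out.push_back(name);
        }
        return out;
    }

    std::stringstream ss(arg);
    std::string tok;
    while (std::getline(ss, tok, ',')) {
        tok = trim(tok);
        if (!tok.empty()) out.push_back(tok);
    }
    return out;
}
```

Trying the value as a file first keeps existing command lines working, at the cost that a file whose name happens to equal the argument string would be read instead. The resulting tokens would still need to be mapped to column indexes against the VCF header, which is the kind of step that could be shared between tools rather than living in pFst.cpp.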

I'll probably go through other tools as I need them (or if there is any demand). Some stuff, like the parsing of sample list arguments, should probably get moved out of pFst.cpp and shared between tools.

Anyway, I'm working in my devel branch if you're interested (when you get the time): https://github.com/travc/GPAT/tree/devel The changes I've made so far (all in pFst.cpp) should be usable in your vcflib fork even if you don't split your code out of vcflib... though if you do make your own GPAT repository, then we can use nice pull requests.