ubccr / software-layer

CCR Software Layer

fastStructure #332

Closed (baizm closed 4 months ago)

baizm commented 5 months ago

Software name: fastStructure
Preferred version: latest
Website: https://rajanil.github.io/fastStructure/

tonykew commented 4 months ago

Note that fastStructure was written for Python 2, which reached end of life several years ago.

From: https://en.wikipedia.org/wiki/History_of_Python#Version_2

Python 2.7 support ended on January 1, 2020, along with code freeze of 2.7 development branch. A final release, 2.7.18, occurred on April 20, 2020, and included fixes for critical bugs and release blockers. This marked the end-of-life of Python 2.

There is a pull request with fixes for Python 3, but it is nine years old at this point and has still not been merged into the main branch: https://github.com/rajanil/fastStructure/pull/26

I will try to build with the patches in this git pull request, but I don't know if it will work, or if the resulting module will be functional.

tonykew commented 4 months ago

fastStructure has been built, and is going through internal testing. I will update this request once fastStructure has been published.

tonykew commented 4 months ago

fastStructure version 1.0 (with Python 3 fixes) has been published to the ccrsoft/2023.01 software release:

login1$ module spider fastStructure
----------------------------------------------------------------------------
  faststructure: faststructure/1.0-Python-3.9.6
----------------------------------------------------------------------------
    Description:
      fastStructure is a fast algorithm for inferring population structure
      from large SNP genotype data. It is based on a variational Bayesian
      framework for posterior inference and is written in Python2.x. 

    You will need to load all module(s) on any one of the lines below before the "faststructure/1.0-Python-3.9.6" module is available to load.

      gcc/11.2.0  openmpi/4.1.1
[...]
login1$ module load gcc/11.2.0 openmpi/4.1.1 faststructure/1.0-Python-3.9.6
login1$ 

There are three programs: structure.py, chooseK.py, and distruct.py. Running each one without arguments prints its usage:

login1$ structure.py

Here is how you can use this script

Usage: python /cvmfs/soft.ccr.buffalo.edu/versions/2023.01/easybuild/software/avx512/MPI/gcc/11.2.0/openmpi/4.1.1/faststructure/1.0-Python-3.9.6/structure.py
     -K <int> (number of populations)
     --input=<file> (/path/to/input/file)
     --output=<file> (/path/to/output/file)
     --tol=<float> (convergence criterion; default: 10e-6)
     --prior={simple,logistic} (choice of prior; default: simple)
     --cv=<int> (number of test sets for cross-validation, 0 implies no CV step; default: 0)
     --format={bed,str} (format of input file; default: bed)
     --full (to output all variational parameters; optional)
     --seed=<int> (manually specify seed for random number generator; optional)
login1$ 
login1$ chooseK.py

Here is how you can use this script

Usage: python /cvmfs/soft.ccr.buffalo.edu/versions/2023.01/easybuild/software/avx512/MPI/gcc/11.2.0/openmpi/4.1.1/faststructure/1.0-Python-3.9.6/chooseK.py
     --input=<file>
login1$ 
login1$ distruct.py

Here is how you can use this script

Usage: python /cvmfs/soft.ccr.buffalo.edu/versions/2023.01/easybuild/software/avx512/MPI/gcc/11.2.0/openmpi/4.1.1/faststructure/1.0-Python-3.9.6/distruct.py
     -K <int>  (number of populations)
     --input=<file>  (/path/to/input/file; same as output flag passed to structure.py)
     --output=<file> (/path/to/output/file)
     --popfile=<file> (file with known categorical labels; optional)
     --title=<figure title> (a title for the figure; optional)
login1$
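
The usage messages above suggest a workflow: run structure.py over a range of K, pass the results to chooseK.py, then plot one K with distruct.py. Below is a minimal sketch that only builds and prints those command lines; it does not run anything. The function names and the file names (`genotypes`, `results/run1`, `results/run1_K3.svg`) are hypothetical placeholders. It also assumes that chooseK.py takes the same prefix that was given to structure.py as `--output`, which the usage text above does not state explicitly. Only flags shown in the usage messages are used.

```python
import shlex


def structure_command(k, input_prefix, output_prefix, prior="simple", fmt="bed"):
    """Build the argv list for one structure.py run at a given K."""
    return [
        "structure.py",
        "-K", str(k),
        f"--input={input_prefix}",
        f"--output={output_prefix}",
        f"--prior={prior}",
        f"--format={fmt}",
    ]


def sweep_commands(input_prefix, output_prefix, k_values):
    """One structure.py command per K, followed by a single chooseK.py call.

    Assumption: chooseK.py reads the prefix that structure.py wrote to.
    """
    commands = [structure_command(k, input_prefix, output_prefix) for k in k_values]
    commands.append(["chooseK.py", f"--input={output_prefix}"])
    return commands


def distruct_command(k, output_prefix, figure_path):
    """Build the argv list for plotting one K with distruct.py.

    Per the usage text, --input here is the --output prefix given to structure.py.
    """
    return [
        "distruct.py",
        "-K", str(k),
        f"--input={output_prefix}",
        f"--output={figure_path}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    commands = sweep_commands("genotypes", "results/run1", range(1, 6))
    commands.append(distruct_command(3, "results/run1", "results/run1_K3.svg"))
    for argv in commands:
        # Print each command as a shell-quoted line instead of executing it.
        print(shlex.join(argv))
```

To actually run the commands, each argv list could be passed to `subprocess.run(argv, check=True)` after the module has been loaded, ideally inside a batch job rather than on a login node.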