srjun/panfp

PanFP is a Python pipeline to predict pangenome-based functional profiles for microbial communities.

Requirements

PanFP depends on several Python libraries. A requirements file is provided so that they can all be installed at once. Make sure pip is installed first, then run:

pip3 --version                      # check whether pip is installed
sudo apt-get install python3-pip    # install pip if it is missing (Debian/Ubuntu)
pip3 install -r requirements.txt    # install the libraries PanFP needs

Installation & Help

Download this repository and run:

python3 setup.py install

You may need to run it with sudo. Once installed, panfp should be available from anywhere in your terminal.

If you need to install the package in a specific directory on your system, pass the --install-lib argument followed by a directory path:

python3 setup.py install --install-lib /custom/path/

Example

Running an experiment requires the following arguments:

-d [database of reference genomes with functional annotation] [here]
-a [directory which contains functional profiles of genomes in database] [here]
-i [otu-sample table]
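
As a minimal sketch of how the three required arguments fit together, the snippet below assembles a command line from Python. The helper name `build_panfp_command` and the placeholder paths are hypothetical and not part of PanFP; only the `-d`, `-a` and `-i` flags come from the list above, and the executable location is assumed to be `bin/panfp`.

```python
import shlex


def build_panfp_command(database, profiles_dir, otu_table, executable="bin/panfp"):
    """Return the argument list for one PanFP run (hypothetical helper)."""
    return [
        executable,
        "-d", database,      # database of reference genomes with functional annotation
        "-a", profiles_dir,  # directory with functional profiles of the database genomes
        "-i", otu_table,     # otu-sample table
    ]


# Placeholder paths, chosen only for illustration.
cmd = build_panfp_command("path/to/database", "path/to/profiles", "otu_table.txt")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones.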

To see additional arguments:

bin/panfp --help

As an example, we include a script [here] that shows a full panfp workflow, together with an example otu-sample table [here].

Note that the input otu-sample table must be in tab-delimited format, as follows:

#OTU ID	S1	S2	...	S10	Lineage
OTU_1	0.0	10.0	...	2.0	k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__MND1; f__
OTU_2	4.0	430.0	...	24.0	k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__; f__; g__; s__
...	...	...	...	...	k__Bacteria;p__Cyanobacteria;c__Oxyphotobacteria
OTU_99	1.0	5.0	...	0.0	k__Bacteria;p__Chloroflexi;c__
OTU_100	0.0	35.0	...	2.0	k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Enterobacteriales; f__Enterobacteriaceae; g__Gluconacetobacter; s__liquefaciens

where the first column holds the OTU IDs, the numeric columns hold the raw 16S rRNA frequencies for each sample, and the last column holds the lineage of each OTU.
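
The layout described above can be read with Python's standard `csv` module. This is a minimal sketch, not PanFP's own parser: the function name `read_otu_table` is hypothetical, and the two-sample table is made-up data that assumes rank prefixes such as `k__` and `p__` in the lineage column.

```python
import csv
import io


def read_otu_table(handle):
    """Parse a tab-delimited otu-sample table (hypothetical helper).

    Returns (sample_names, rows); each row is (otu_id, counts, lineage).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:-1]  # columns between '#OTU ID' and 'Lineage'
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        otu_id = fields[0]
        counts = [float(value) for value in fields[1:-1]]
        lineage = fields[-1].strip()
        rows.append((otu_id, counts, lineage))
    return samples, rows


# Made-up table with two samples, for illustration only.
example = (
    "#OTU ID\tS1\tS2\tLineage\n"
    "OTU_1\t0.0\t10.0\tk__Bacteria; p__Proteobacteria; c__Betaproteobacteria\n"
    "OTU_2\t4.0\t430.0\tk__Bacteria; p__Cyanobacteria\n"
)
samples, rows = read_otu_table(io.StringIO(example))
print(samples)
print(rows[0])
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object.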

Output Information:

The following files are generated, in this order:

updated_otu_table.txt - otu-sample table with updated taxonomic information according to database lineages [example]
lineage_copynum.txt - copy numbers for lineages in an updated otu-sample table [example]
k__Bacteria.p__Ignavibacteriae.c__Ignavibacteria.KO.txt (for example) - functional profiles for lineages [example]
updated_otu_table_norm_by_copynum.txt - otu-sample table normalized by median copy numbers of lineages [example]
updated_otu_table_norm_by_copynum_depth.txt - otu-sample table normalized by sequencing depth [example]
lineage_sample_table.txt - lineage-sample table derived from otu-sample table grouping by lineages [example]
function_sample_table.txt - function-sample table obtained by multiplying the lineage-sample table by the lineage-function table [example]
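
The arithmetic behind the last four files can be sketched on toy data. This is an illustration under stated assumptions, not PanFP's implementation: all function names, lineage labels, copy numbers and function weights below are hypothetical, and "normalized by sequencing depth" is assumed here to mean dividing each sample column by its total.

```python
def normalize_by_copy_number(counts, copy_number):
    """Divide an OTU's per-sample counts by its lineage's 16S copy number."""
    return [c / copy_number for c in counts]


def normalize_by_depth(table):
    """Scale every sample column to sum to 1 (assumed meaning of depth normalization)."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {
        key: [v / t if t else 0.0 for v, t in zip(row, totals)]
        for key, row in table.items()
    }


def group_by_lineage(otu_table, otu_lineage):
    """Sum the rows of OTUs that share a lineage into a lineage-sample table."""
    grouped = {}
    for otu, row in otu_table.items():
        lineage = otu_lineage[otu]
        if lineage in grouped:
            grouped[lineage] = [a + b for a, b in zip(grouped[lineage], row)]
        else:
            grouped[lineage] = list(row)
    return grouped


def function_sample_table(lineage_sample, lineage_function):
    """Matrix product: F[function][sample] = sum over lineages of LF * LS."""
    result = {}
    for lineage, abundances in lineage_sample.items():
        for function, weight in lineage_function.get(lineage, {}).items():
            row = result.setdefault(function, [0.0] * len(abundances))
            for j, abundance in enumerate(abundances):
                row[j] += weight * abundance
    return result


# Toy inputs (all values are made up for illustration).
raw = {"OTU_1": [4.0, 8.0], "OTU_2": [2.0, 0.0], "OTU_3": [1.0, 4.0]}
otu_lineage = {"OTU_1": "lineage_A", "OTU_2": "lineage_A", "OTU_3": "lineage_B"}
copy_numbers = {"lineage_A": 2.0, "lineage_B": 1.0}
lineage_function = {"lineage_A": {"K1": 1.0, "K2": 2.0}, "lineage_B": {"K1": 1.0}}

by_copynum = {
    otu: normalize_by_copy_number(row, copy_numbers[otu_lineage[otu]])
    for otu, row in raw.items()
}
by_depth = normalize_by_depth(by_copynum)
lineage_sample = group_by_lineage(by_depth, otu_lineage)
functions = function_sample_table(lineage_sample, lineage_function)
print(lineage_sample)
print(functions)
```

Each intermediate dictionary corresponds loosely to one of the output files listed above, which makes it easier to check a real run step by step against a small hand-computed case.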

Contact

This project was developed entirely in the Translational Bioinformatics group (Jun Lab).

If you run into a problem at any step while using the program, you can use the 'Issues' page of this repository or contact: Se-Ran Jun

License

PanFP is released under the GNU General Public License. Please check LICENSE for further information.
