Significant changes.
- UNITE 10.0 added; it is the default DB version until further notice.
- The default classifier is now SINTAX. It is much quicker and gives results comparable to RDP Classifier. This change was made because the UNITE database has grown massively.
- A slight change in the installation instructions, namely from python=3.6 to python=3.8, to avoid "SyntaxError: invalid syntax".
Some significant changes!
- PIPITS now classifies sequences against UNITE 9.0 (205,888 fungi & 326,300 Eukaryotes - see below).
- The database now includes non-fungi (i.e. Eukaryotes) so that the infamous OTUs with a mere "k__Fungi" can be better classified. With the inclusion, you will now see OTUs classified as "k__Fungi", "k__Viridiplantae" or "k__unidentified". Do note that depending on your choice of primers, you may sometimes pick up quite a lot of plant ITS sequences (no primers are perfectly specific for fungi).
- However, because of the significant increase in the size of the database, PIPITS now requires at least 16 GB of RAM (preferably more, e.g. 32 GB). This may not suit those who used to enjoy running PIPITS on their laptops. Sorry... time has moved on!
- The increase in the size of the database also means that RDP Classifier can take a very long time to process the data. For this reason, you now have the option to run SINTAX (VSEARCH) to assign sequences. This is remarkably quick!
- If you find that RDP Classifier is taking too long, please use "--taxassignmentmethod sin" to run just SINTAX (VSEARCH). That said, a confidence threshold of 0.85 in SINTAX is not equivalent to 0.85 in RDP Classifier, though in my experience the differences are small. Do note that SINTAX is a non-Bayesian taxonomic classifier.
- I will look to incorporate other classifiers such as CONSTAX in the future!
- UNITE 8.3 added. PIPITS now classifies sequences against UNITE 8.3 (98,090 sequences).
- WARCUP phylotype table bug fixed. It now produces a correctly aggregated table (it used to aggregate at the Family level, but now it aggregates at the Species level).
- BIOM to phylotype table bug fixed. After BIOM (one of the dependencies) was upgraded, the phylotype table inadvertently got filled with normalised values. This has now been remedied, and the previous behaviour is restored. For those who just want to convert OTU tables to phylotype tables without re-running PIPITS, please update PIPITS, then run the following within pipits_env:
pipits_phylotype_biom -i otu_table.biom -o phylotype_table.txt -l 6
- New UNITE DB (released on 2020-02-04). PIPITS will now download the new UNITE DB. A few minor bugs have also been fixed.
- BIOM files are now in the HDF5 format rather than JSON. OTU tables in HDF5 BIOM are supported by PHYLOSEQ and QIIME2.
- PIPITS_PROCESS automatically downloads the UNITE database (the most recent version), so there is no need to meddle with environment variables anymore. Just run the commands and it will take care of the database. You can still use an older database with the --unite option (see help with -h).
- PIPITS_FUNITS exploits multiple CPUs. It's an experimental feature, so do use it with care. To use multiple CPUs, add the usual option:
-t NUMBER_OF_CPUS
- Update PIPITS with
conda update --channel bioconda --channel conda-forge --channel defaults pipits
then check that you have version 2.3 installed with:
conda list pipits
PIPITS is an automated pipeline for analyses of fungal internal transcribed spacer (ITS) sequences from the Illumina sequencing platform.
PIPITS only works on POSIX systems (this essentially means it doesn't work on Windows - sorry...).
PIPITS will need at least 16 GB of RAM on a machine running 64-bit Linux or macOS.
Automatically downloads the most recent version of UNITE fungal db (and also comes with an option to run it against WARCUP fungal db).
Just 4 commands, and you are good to go!
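Since the 16 GB requirement is easy to overlook, here is a minimal sketch of a memory check for Linux. It is not part of PIPITS; the function names total_ram_gb and has_enough_ram are made up for illustration, and it assumes a /proc/meminfo file with a "MemTotal:" line in kB. The kernel reserves some memory, so a machine sold as 16 GB may report slightly less.

```shell
#!/bin/sh
# Hypothetical helpers (not part of PIPITS). Linux only.

# Print total RAM rounded to the nearest whole GB, read from a
# /proc/meminfo-style file. The path is a parameter so the function
# can be tried on a sample file.
total_ram_gb() {
    meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; 1 GB = 1048576 kB.
    awk '/^MemTotal:/ { printf "%d\n", ($2 / 1048576) + 0.5 }' "$meminfo"
}

# Return 0 if the reported RAM is at least the required amount (default 16 GB).
has_enough_ram() {
    required_gb="${1:-16}"
    meminfo="${2:-/proc/meminfo}"
    [ "$(total_ram_gb "$meminfo")" -ge "$required_gb" ]
}
```

For example, "has_enough_ram 16 || echo 'Less than 16 GB of RAM detected'" prints a warning on a small machine. On macOS there is no /proc/meminfo; "sysctl -n hw.memsize" reports the memory size in bytes instead.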
It is recommended that you use a conda environment for running PIPITS so that its dependencies are contained in this "sandbox". This means that you don't mess with your existing system and you don't need to be the admin. Don't worry, it's easy - just type the following command.
EXPLANATION: install PIPITS and its dependencies and create a Conda environment (here the environment is named "pipits_env" but you can choose any name you wish). PIPITS is compatible only with Python 3, so add "python=3.10" as below:
conda create -n pipits_env --channel bioconda --channel conda-forge --channel defaults python=3.10 pipits
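Once the environment exists, it can be worth confirming that the commands used later are actually on your PATH. The following is a small sketch, not something shipped with PIPITS; the function name missing_commands is hypothetical, and it relies only on the standard "command -v" shell built-in.

```shell
#!/bin/sh
# Hypothetical helper (not part of PIPITS).
# Print the name of every given command that is not found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

After "conda activate pipits_env", running "missing_commands pispino_createreadpairslist pispino_seqprep pipits_funits pipits_process" should print nothing if the installation went well.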
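PIPITS is divided into three consecutive parts: sequence preparation (pispino_seqprep), ITS extraction (pipits_funits), and clustering with taxonomic assignment (pipits_process).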
Let's test it with a very small test dataset to ensure everything is set up correctly.
EXPLANATION: Download & extract a test dataset
wget https://sourceforge.net/projects/pipits/files/PIPITS_TESTDATA/pipits_test.tar.gz -O pipits_test.tar.gz
tar xvfz pipits_test.tar.gz
EXPLANATION: Get into the Conda environment you've just created, and run PIPITS.
cd pipits_test
conda activate pipits_env
pispino_createreadpairslist -i rawdata -o readpairslist.txt
pispino_seqprep -i rawdata -o out_seqprep -l readpairslist.txt
pipits_funits -i out_seqprep/prepped.fasta -o out_funits -x ITS2 -v -r
pipits_process -i out_funits/ITS.fasta -o out_process -v -r
Some rare setups (e.g., installation in user-level folders of dated server distributions) cause pipits_process to fail while converting to BIOM format. The issue can be solved by updating the fresh installation from within the environment: conda update pipits
Illumina reads are generally provided as demultiplexed FASTQ files where the Illumina software (BASESPACE) splits the reads into separate files, one for each barcode.
EXPLANATION: PISPINO (originally part of PIPITS) provides a script called pispino_createreadpairslist which generates a tab-delimited text file for all read-pairs from the directory containing your raw sequences:
pispino_createreadpairslist -i rawdata -o readpairslist.txt
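Because every later step depends on this list, a quick sanity check can save a failed run. The sketch below is an illustration only: check_list_file is a hypothetical name, and it assumes that every line of the file is a data row with the same number of tab-separated fields. It does not assume what those fields are, so adjust it if your file contains header or comment lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of PIPITS or PISPINO).
# Succeeds if the file is non-empty and every line has the same number
# of tab-separated fields as the first line; otherwise reports the
# offending lines and returns a non-zero status.
check_list_file() {
    awk -F '\t' '
        NR == 1 { n = NF }
        NF != n { printf "line %d: expected %d fields, found %d\n", NR, n, NF; bad = 1 }
        END {
            if (NR == 0) { print "file is empty"; exit 1 }
            exit bad
        }
    ' "$1"
}
```

For example, "check_list_file readpairslist.txt && echo OK" prints OK only when the list looks consistent.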
EXPLANATION: Once we have the list file ("readpairslist.txt"), we can then begin to "prepare" the sequences:
pispino_seqprep -i rawdata -o out_seqprep -l readpairslist.txt
The output from pispino_seqprep is taken as the input for this step. You must also tell the script which ITS subregion (i.e. ITS1 or ITS2) to extract.
EXPLANATION: the input file (indicated with "-i") is the resulting file from the previous step, and the subregion is given with "-x":
pipits_funits -i out_seqprep/prepped.fasta -o out_funits -x ITS2
EXPLANATION: This is the final step involving clustering and assigning of taxonomy.
pipits_process -i out_funits/ITS.fasta -o out_process
conda deactivate
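The four steps above can be chained in a small wrapper so that a failure in one step stops the rest. This is only a sketch: run_step, run_pipits and the DRY_RUN variable are names invented here, not PIPITS features, while the commands, options and output names are the ones used in the walkthrough. It assumes the conda environment is already active.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of PIPITS).

# Print a command, then execute it unless DRY_RUN=1.
run_step() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Run the four steps in order; "&&" stops the chain at the first failure.
# $1: directory with raw FASTQ files (default: rawdata)
# $2: ITS subregion to extract, ITS1 or ITS2 (default: ITS2)
run_pipits() {
    raw_dir="${1:-rawdata}"
    subregion="${2:-ITS2}"
    run_step pispino_createreadpairslist -i "$raw_dir" -o readpairslist.txt &&
    run_step pispino_seqprep -i "$raw_dir" -o out_seqprep -l readpairslist.txt &&
    run_step pipits_funits -i out_seqprep/prepped.fasta -o out_funits -x "$subregion" &&
    run_step pipits_process -i out_funits/ITS.fasta -o out_process
}
```

Setting DRY_RUN=1 before calling "run_pipits rawdata ITS2" only prints the commands, which is a cheap way to review them before committing to a long run.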
Each of the above steps has several options and parameters you can tweak. To view them, add "-h" after the command, for example:
pispino_seqprep -h
Run pipits_funguild.py on the resulting OTU table to produce a version reformatted for FUNGuild analysis. See the FUNGuild page for more detail.
pipits_funguild.py -i out_process/otu_table.txt -o out_process/otu_table_funguild.txt
Please cite:
Hyun S. Gweon, Anna Oliver, Joanne Taylor, Tim Booth, Melanie Gibbs, Daniel S. Read, Robert I. Griffiths and Karsten Schonrogge, PIPITS: an automated pipeline for analyses of fungal internal transcribed spacer sequences from the Illumina sequencing platform, Methods in Ecology and Evolution, DOI: 10.1111/2041-210X.12399