sestaton / transposome-scripts

Utility scripts using for working with Transposome
MIT License
2 stars 0 forks source link

transposome-scripts

Utility scripts using for working with Transposome

DESCRIPTION

This repository contains user-supplied scripts to help with many aspects of genome analysis using Transposome. A lisitng of the included scripts is provided below.

ToolDescription
sample_reads.plRandomly sample a FASTA or FASTQ file for a desired number of sequences.
get_seqs_from_list.plPull sequences from a file given a file of sequence IDs.
sample_and_interleave_pairs.plSample reads from both pairs of a paired-end read and interleave them for input to Transposome.
permute_graph_edges.plTest a range of parameters for filtering pairwise matches to graph edges.
permute_cluster_joining.plVary the threshold for joining clusters.
format_database.plFormat a FASTA file of repeat sequences for use with Transposome.
select_database_species.plSelect all sequences from a given species from a repeat database.
full_analysis_with_varying_coverage.plCluster and annotate sequences with varying genome coverage.

USAGE

All scripts will generate a usage statement if executed with no arguments, so to find out the proper usage of script simply type perl followed by the name of the script. For example,

$ perl sample_reads.pl 
USAGE: sample_reads.pl -i seqs.fasta -n 100000 

Required:
 -i|infile            :       The sequence file to sample.
 -n|sample_size       :       The number of sequences to sample (integer).
                              (NB: the value may be as '1000000' or with underscores
                               as in '1_000_000').
Options:
 -f|format            :       The input sequence format (Default: FASTA).
 -h|help              :       Print a usage statement.

CONTRIBUTING

If you write a script that uses Transposome please fork this repo, add your script and a description of it to the table above, and then send a pull request. I'll be happy to incorporate any script that looks useful. Please try to follow the coding style of the provided scripts as much as possible. At minimum, make sure the scripts handle options correctly and produce some kind of usage statement.

CITATION

If you use any of these tools in your work, please use the following citation:

Staton SE, and Burke JM. 2015. Transposome: A toolkit for annotation of transposable element families from unassembled sequence reads
    Bioinformatics, doi: 10.1093/bioinformatics/btv059

LICENSE AND COPYRIGHT

Copyright (C) 2014-2017 S. Evan Staton

This program is distributed under the MIT (X11) License, which should be distributed with the package. If not, it can be found here: http://www.opensource.org/licenses/mit-license.php