rachelss / SISRS

Site Identification from Short Read Sequences.

RAL_SE_Mapping #31

Closed by BobLiterman 7 years ago

BobLiterman commented 7 years ago

Hello SISRS Team,

I made some adjustments to SISRS and things are running fine on my end. Changes include:

1) Added an option to specify an output directory (-z). The raw data is symbolically linked into this directory and SISRS runs as normal, leaving the raw data folder unchanged. Default behavior is preserved: if -z is not set, output goes to MAINFOLDER.

2) Added an assembler option for premade composite genomes (-a premade). This lets users process their reads and genomes however they wish and then run SISRS starting from the read-mapping step, which is useful for very large datasets. The composite genome file should be named 'contigs.fa' and placed in a subfolder of MAINFOLDER called 'premadeoutput'.

3) Composite genomes now get a 'SISRS_' prefix on each scaffold name, which allows scaffolds to be fetched downstream regardless of which assembler produced them.

4) Changed all bowtie mapping steps to single-end mode, and only uniquely mapping reads are now retained. There are various biological reasons why this is appropriate.

5) Updated the README to describe these changes and to note which versions of programs (assembler, samtools) have been validated as working.
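The -z behavior in item 1 can be pictured with a minimal Python sketch, assuming the raw reads sit as plain files in a single folder. The helper name `link_raw_data` and the flat folder layout are hypothetical illustrations, not SISRS's actual code. Each file is symlinked rather than copied, so SISRS can write freely into the output directory while the raw data folder is never modified.

```python
import os
import tempfile


def link_raw_data(raw_dir, out_dir):
    """Hypothetical helper: mirror every file in raw_dir into out_dir as a symlink.

    Returns the list of link paths. The original files are never touched.
    """
    os.makedirs(out_dir, exist_ok=True)
    links = []
    for name in sorted(os.listdir(raw_dir)):
        src = os.path.abspath(os.path.join(raw_dir, name))
        if not os.path.isfile(src):
            continue  # this simplified sketch skips subdirectories
        dst = os.path.join(out_dir, name)
        if not os.path.lexists(dst):
            os.symlink(src, dst)  # link, do not copy
        links.append(dst)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = os.path.join(tmp, "raw")
        os.makedirs(raw)
        with open(os.path.join(raw, "sample1.fastq"), "w") as fh:
            fh.write("@r1\nACGT\n+\nIIII\n")
        for link in link_raw_data(raw, os.path.join(tmp, "sisrs_out")):
            print(os.path.basename(link), "->", os.readlink(link))
```

Symlinks keep the output directory cheap to create even for large read sets, since no sequence data is duplicated on disk.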
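Item 3 amounts to a header rewrite on the composite genome. A minimal sketch, assuming a plain FASTA file whose header lines start with '>'. The function names and the rule of skipping headers that already carry the prefix are illustrative assumptions, not taken from the PR. Once every scaffold name starts with 'SISRS_', downstream steps can select scaffolds by that prefix without knowing which assembler (or premade file) produced them.

```python
def prefix_fasta_headers(lines, prefix="SISRS_"):
    """Yield FASTA lines, adding `prefix` to each header line ('>name ...').

    Headers that already carry the prefix are left as-is, so running the
    step twice does not produce 'SISRS_SISRS_...' names.
    """
    for line in lines:
        if line.startswith(">") and not line[1:].startswith(prefix):
            yield ">" + prefix + line[1:]
        else:
            yield line


def sisrs_scaffold_ids(lines, prefix="SISRS_"):
    """Return the IDs (first word after '>') of all prefixed scaffolds."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">" + prefix)
    ]


if __name__ == "__main__":
    contigs = [">NODE_1_length_8\n", "ACGTACGT\n", ">scaffold_2\n", "TTGACA\n"]
    renamed = list(prefix_fasta_headers(contigs))
    print("".join(renamed), end="")
    print(sisrs_scaffold_ids(renamed))
```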
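Item 4 does not spell out how "uniquely mapping" is decided. One common heuristic for bowtie2 SAM output, offered here only as an assumption and not as the filter this PR implements, is to keep mapped, primary alignments that lack an `XS:i` optional field; bowtie2 adds `XS:i` when it found more than one alignment for the read. A minimal sketch that keeps SAM header lines so the result is still valid SAM:

```python
def is_unique_alignment(sam_line):
    """Assumed heuristic: mapped, primary, and no XS:i tag (no alternative hit)."""
    if sam_line.startswith("@"):
        return False  # header line, not an alignment record
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4 or flag & 0x100:
        return False  # unmapped (0x4) or secondary (0x100) record
    optional = fields[11:]  # SAM optional fields start at column 12
    return not any(tag.startswith("XS:i:") for tag in optional)


def filter_unique(sam_lines):
    """Yield header lines plus alignments that pass is_unique_alignment."""
    for line in sam_lines:
        if line.startswith("@") or is_unique_alignment(line):
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.0\n",
        "r1\t0\tSISRS_1\t10\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\n",
        "r2\t0\tSISRS_1\t20\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:0\n",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    ]
    for line in filter_unique(sam):
        print(line, end="")
```

A MAPQ threshold is another frequently used criterion; which one is appropriate depends on the aligner version and settings actually used.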