knights-lab / SHOGUN

SHallow shOtGUN profiler
GNU Affero General Public License v3.0

combined seqs #22

Open waleadebayo opened 5 years ago

waleadebayo commented 5 years ago

Hi,

just a simple question to clarify something while reading through your tool,

The --input data in the pipeline, and even in the align subsection, is described as "combined seqs". Does this essentially mean just concatenating paired-end reads, for instance? That is, it does not mean actually merging paired-end reads, as the term is generally understood in paired-end sequencing?

Many thanks

GabeAl commented 5 years ago

Hi,

This refers to the FASTA format used by QIIME 1, which multiplexes reads from multiple samples into a single file, with the header differentiating between samples like so:

>Sample_0 [extra fasta header stuff]

Where "Sample" is the name of a sample, then there is an underscore followed by a number indicating the index of the read in the file, then optionally a space followed by arbitrary additional data.
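To make the header convention concrete, here is a minimal Python sketch that splits such a header into its parts. The function name `parse_combined_header` is hypothetical and is not part of SHOGUN or shi7. Splitting on the last underscore (so that sample names may themselves contain underscores) is an assumption, not something confirmed in this thread.

```python
def parse_combined_header(header_line):
    """Split a combined-seqs FASTA header into (sample, read_index, extra).

    Hypothetical helper for illustration only; not SHOGUN's own parser.
    """
    if not header_line.startswith(">"):
        raise ValueError("FASTA header lines start with '>'")
    # Everything up to the first space is the read ID; the rest is free-form.
    read_id, _, extra = header_line[1:].rstrip("\n").partition(" ")
    # Assumption: the read index follows the LAST underscore, so a sample
    # name such as "gut_A" stays intact.
    sample, sep, index = read_id.rpartition("_")
    if not sep or not sample or not index.isdigit():
        raise ValueError(f"not in Sample_N form: {read_id!r}")
    return sample, int(index), extra


if __name__ == "__main__":
    print(parse_combined_header(">Sample_0 [extra fasta header stuff]"))
```

A header that lacks the underscore or the numeric index raises `ValueError`, which can serve as a quick check that a file follows the convention before it is passed on.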

Using the shi7 tool on your raw fastq paired end reads will automatically produce this format after it performs QC.

Cheerio, Gabe

waleadebayo commented 5 years ago

Thanks

LouiseBThingholm commented 5 years ago

Hi, if I start the align command, does it then remove host contaminants? I don't see a command for step 'a - filter' (decontamination) in the overview figure. I used shi7 to do the QC in order to get the right format (a single fastq file), but as I read it this does not remove host reads. And if I want to use my own QC pipeline, is there a tool to join single-sample fastq files into the 'combined seqs' format? Thanks!

nduan1 commented 2 years ago

Hi,

Can I just cat all the fasta files into one file and treat them as combined seqs, provided the headers follow the format mentioned above? Will the shogun pipeline work for unmapped reads that contain a lot of singletons and some PE reads?

Thanks, Ning
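
On the concatenation idea: a plain `cat` would only yield the header format described above if each per-sample file already used `Sample_N` read IDs. The sketch below shows one way the headers could be rewritten while the files are joined. The function name `combine_fasta` is hypothetical and is not part of SHOGUN or shi7. The sketch assumes FASTA input (not FASTQ), performs no QC or host filtering, and has not been checked against the SHOGUN pipeline itself.

```python
def combine_fasta(samples):
    """Yield combined-seqs FASTA lines from per-sample FASTA lines.

    `samples` maps a sample name to an iterable of FASTA lines, for example
    an open file handle. Hypothetical helper for illustration only.
    """
    for sample, lines in samples.items():
        index = 0  # read index restarts at 0 for each sample
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Keep the original header as the optional extra data
                # after the space.
                yield f">{sample}_{index} {line[1:]}".rstrip()
                index += 1
            else:
                yield line


if __name__ == "__main__":
    demo = {
        "A": [">r1\n", "ACGT\n", ">r2\n", "GG\n"],
        "B": [">x desc\n", "TT\n"],
    }
    print("\n".join(combine_fasta(demo)))
```

With real files, `samples` could be built by opening one FASTA file per sample and using the sample name as the key.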