Closed bresyd closed 3 years ago
Hello, nice to hear that you find our latest upgrade useful. Your feedback is very important for deciding future directions for improvement. Regarding your particular questions:
2: Yes, we do it in the way you mention. The script 01.remap.pl maps the reads back to the assembly and adds the unmapped ones to the contig file.
3: Unmapped reads are then treated as "normal" contigs, but are explicitly excluded from the binning steps.
4: Yes. The remap step is independent of the assembly, so it will run in the same way when an external assembly is provided.
5: Custom mapping should be possible, but it is not implemented yet. That is an interesting feature for upcoming versions.
6: Yes, in the samples file add a fourth column and put "nobinning" in the samples you don't want to bin. This is explained in the "samples file" section of the manual.
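For point 6, here is a minimal sketch of how such a samples file could be generated. It assumes the usual tab-separated layout (sample name, read file, pair1/pair2, optional fourth column); please check the "samples file" section of the manual for the exact format. The sample names and file names below are hypothetical placeholders.

```python
import csv

def build_rows(metagenomes, metatranscriptomes):
    """Return samples-file rows; metatranscriptomes get 'nobinning' in column 4.

    Both arguments are lists of (sample_name, read_file) tuples.
    Single-end data is assumed, so every file is labelled 'pair1'.
    """
    rows = [(name, reads, "pair1") for name, reads in metagenomes]
    rows += [(name, reads, "pair1", "nobinning") for name, reads in metatranscriptomes]
    return rows

def write_samples_file(path, rows):
    """Write the rows as a tab-separated text file, one sample file per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)

if __name__ == "__main__":
    # Hypothetical sample and file names, for illustration only.
    rows = build_rows(
        metagenomes=[("MG1", "mg1_pacbio.fastq.gz")],
        metatranscriptomes=[("MT1", "mt1_illumina.fastq.gz")],
    )
    write_samples_file("samples.txt", rows)
```

Only the metatranscriptome lines carry the fourth column, so the metagenome samples are still used for binning.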
Hope it helps! Best, Javier
Hi Javier,
thanks for your prompt reply. All of what you said makes sense. Regarding point 5: do you have any suggestion or recommendation for a workaround that would let me use two different mapping tools for the two different types of data within one SQM run? Or do you think I should make separate runs with the two data types and then combine them downstream using SQMtools? Any thoughts are much appreciated.
All the best, Benni
I was too quick with my reply. I think I could actually just try to use bwa mem for both data types, since it was designed to work for both short and long sequences (provided the long reads are of high quality, which mine are).
Thanks again Benni
Ok, let us know how it goes
Best,
sorry to bother you again but there is one more thing I wanted to check: is it possible to tell SQM to only create and include the singletons from a subset of the samples? In my case I would like to include the singletons for the longread metagenome samples but not for the shortread metatranscriptome samples (so basically use the singletons from the same samples that are also used for the assembly but not from the remaining ones).
Thanks again
Hello, it will include all sequences. Would it be possible for you to write a simple script that removes short sequences from the resulting fasta? If so, run the project adding -step 1. That will make it stop after the assembly and remap. Then process the fasta file and restart the project.
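A minimal sketch of such a filtering script, using only the Python standard library. The length cutoff is an assumption you would need to choose so that it separates the short Illumina singletons from the long-read ones, and the input and output paths are passed as arguments since they depend on your project.

```python
import sys

def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)

def filter_by_length(records, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    for header, seq in records:
        if len(seq) >= min_length:
            yield header, seq

def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")

# Usage: python filter_fasta.py <input.fasta> <output.fasta> <min_length>
if __name__ == "__main__" and len(sys.argv) == 4:
    in_path, out_path, min_length = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(in_path) as fin, open(out_path, "w") as fout:
        write_fasta(filter_by_length(read_fasta(fin), min_length), fout)
```

Note that a pure length filter also drops any assembled contig shorter than the cutoff, so it is worth checking the length distribution of the contigs before picking a value.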
Hi Javier,
great idea. I will give everything a try and let you know how it goes.
Cheers
You could even use prinseq for removing these short reads.
Hi,
I would like to thank you for your continuous great work with the SQM pipeline, the latest release has some great additional features which are very relevant for my current work. I do have a few questions and would like to get your advice on a couple of things.
I have a new dataset comprising some longread (pacbio) metagenome data plus illumina shortread (single end) metatranscriptome data (dna and rna co-extracted from the same samples). What I would like to do is a coassembly of the longread metagenomes (for which I have already done a few trials using canu and flye), then use the coassembly plus the non-assembled reads in your pipeline, and also include the metatranscriptome samples (only for mapping, not binning).
Here are some of my questions/comments:
--singletons: it is great that you have this option for including unassembled reads. From the longread assemblies that I have done so far I know that many of my genes of interest are present on the read level but do not get assembled into contigs. Hence without the option of including unassembled reads I would have to use the longread analysis script only. I know that canu outputs a file containing the unassembled reads, but as far as I know flye does not have this option. I am just curious to know how you include the unassembled reads when flye is used as the assembler (are you mapping the reads back and then using the unmapped ones)?
--singletons: are they also used for the binning, or are they not considered since they would not have proper differential coverage?
--singletons question: if I do the longread assembly externally, is there a way to provide the assembly plus the unassembled reads to the SQM pipeline?
Thanks in advance for your help.
Cheers