Closed KaparaNewbie closed 1 year ago
Dear @KaparaNewbie,
The reads deposited in the SRA are FLNC (full-length non-chimeric) reads, in isoseq terminology. This means they are ready to be mapped to the genome, and you don't need to basecall the sequences yourself.
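As a rough illustration of what "ready to be mapped" could look like in practice, here is a minimal sketch that only builds (and prints) one alignment command per run. It assumes minimap2 as the aligner, which is a common choice for spliced long reads but is not confirmed in this thread as the tool used in the paper. The genome file name `osinensis_genome.fa` and the `<accession>.fastq` naming are hypothetical placeholders.

```python
import shlex


def accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive SRA run range, e.g. SRR17321895..SRR17321901."""
    prefix = first[:3]
    start, stop = int(first[3:]), int(last[3:])
    return [f"{prefix}{n}" for n in range(start, stop + 1)]


def minimap2_command(accession: str, genome_fasta: str) -> list[str]:
    """Build (but do not run) a spliced long-read alignment command."""
    return [
        "minimap2",
        "-ax", "splice:hq",       # preset for high-quality spliced long reads
        "-uf",                    # reads assumed to be in transcript orientation
        "-o", f"{accession}.sam",  # hypothetical output name
        genome_fasta,             # hypothetical reference FASTA
        f"{accession}.fastq",     # hypothetical input name
    ]


if __name__ == "__main__":
    for acc in accession_range("SRR17321895", "SRR17321901"):
        print(shlex.join(minimap2_command(acc, "osinensis_genome.fa")))
```

Printing the commands rather than executing them makes it easy to review the parameters first. Whether `-uf` is appropriate depends on the reads really being oriented 5' to 3', so treat it as an assumption to verify.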
In this repository, you can also find isoforms reconstructed with the TAMA collapse tool, using the Octopus sinensis genome as a reference.
We decided to use the O. sinensis genome rather than the O. vulgaris one because the former is much more complete and its annotation is already fairly extensive. The two are closely related members of the same species complex and were only recently recognized as separate species.
Caveat: this means that the reconstructed isoforms contain O. sinensis genomic sequence! While in my opinion this is not a problem at all, we cannot guarantee that both species use exactly the same isoforms. There may be individual cases where a gap in the O. sinensis annotation makes it impossible to reconstruct the isoform at that locus.
Let me know if this answer was useful and whether I can help you with downstream analyses or interpretation.
Grygoriy
Dear @zolotarovgl, Thank you so much for taking the time to respond in great detail, both here and via email. BTW, sorry for the duplicated messages... I think you answered everything, thanks!
Hey there, Rajewsky lab!
I downloaded samples SRR17321895 to SRR17321901 (7 total) from the SRA. I wanted to preprocess them as recommended in the isoseq3 guide, as instructed in the methods ("Processing of PacBio SMRT data"). However, the files are in fastq format rather than BAM, and when I try to run:
I get the following warning:
Does it mean that the fastq files uploaded to the SRA are already processed and ready to be aligned? I started checking this formatting because (strangely?) FastQC found small amounts of illumina_small_rna_3'_adapter in the SRR17321896 sample.
Furthermore, I have an additional question, if I may. At first ("Full-length mRNA library preparation and sequencing"), you write:
But later on ("Isoform reconstruction from FLNC reads"), you write:
Could you kindly explain this difference? Throughout most of the paper, you refer to O. vulgaris, but in the methods (and in the gene_expression workflow here), you refer to O. sinensis. If this question results from careless reading, I apologize in advance.
I appreciate any help you can provide.
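The FastQC adapter hit mentioned above can be cross-checked independently by counting reads that contain the adapter motif. The following is a minimal sketch, not a replacement for FastQC. The 12-mer `TGGAATTCTCGG` is an assumption about the sequence FastQC uses for this adapter and should be checked against the adapter list shipped with your FastQC installation. The in-memory demo records are made up.

```python
import io
from typing import Iterator, TextIO, Tuple


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def count_reads_with_motif(handle: TextIO, motif: str) -> Tuple[int, int]:
    """Return (reads containing motif, total reads); forward strand only."""
    motif = motif.upper()
    hits = total = 0
    for seq in fastq_sequences(handle):
        total += 1
        if motif in seq.upper():
            hits += 1
    return hits, total


if __name__ == "__main__":
    # Two made-up records; only the first contains the assumed motif.
    demo = io.StringIO(
        "@r1\nACGTTGGAATTCTCGGAC\n+\nIIIIIIIIIIIIIIIIII\n"
        "@r2\nACGTACGT\n+\nIIIIIIII\n"
    )
    print(count_reads_with_motif(demo, "TGGAATTCTCGG"))
```

For a real file, pass `open("SRR17321896.fastq")` (a hypothetical file name) instead of the demo handle. The sketch does not search the reverse complement and assumes strictly four lines per record.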