wheatwill opened this issue 5 years ago
It depends primarily on the characteristics of your data, but also on your goals. In general, the lower the cutoff, the more sensitive the algorithm becomes: it detects more lowly expressed sequences, but it also predicts more erroneous ones.
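To illustrate the trade-off, here is a simplified Python sketch that treats the cutoff as a plain minimum number of supporting reads per candidate. This is not IsoCon's implementation, whose candidate selection is more involved; the function name `filter_candidates` and the support counts are hypothetical, made-up values.

```python
def filter_candidates(support, min_candidate_support):
    """Keep candidates backed by at least `min_candidate_support` reads.

    Simplified illustration only: it models the cutoff as a plain filter
    and ignores everything else IsoCon does when selecting candidates.
    """
    return {name: n for name, n in support.items() if n >= min_candidate_support}


# Hypothetical read-support counts per candidate (toy values, not real data).
support = {"cand_1": 120, "cand_2": 40, "cand_3": 12, "cand_4": 10,
           "cand_5": 7, "cand_6": 5, "cand_7": 2}

for cutoff in (10, 5):
    kept = filter_candidates(support, cutoff)
    print(f"cutoff {cutoff}: {len(kept)} candidates kept")
# cutoff 10: 4 candidates kept
# cutoff 5: 6 candidates kept
```

Every candidate kept at the higher cutoff is also kept at the lower one. The extra candidates admitted by a lower cutoff are exactly the low-support ones, which may be real lowly expressed transcripts or errors.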
Thank you very much for your quick reply!
Actually, I am working with a set of non-targeted Iso-Seq data. The gene family I am interested in is expected to contain 10-20 members (tandem repeat genes), but only 3 of them were successfully assembled in the reference genome. So I am trying to recover the other transcripts from a full-length transcriptome generated on a PacBio RS II. I used BLASTN to extract 1500 sequences from all the FLNC reads and then ran the IsoCon pipeline on them directly: `IsoCon pipeline -fl_reads blast.out.flnc.fasta -outfolder test.IsoCon.out --ccs polished.total.flnc.bam --nr_cores 24 --min_candidate_support 10`
- With `--min_candidate_support 10`, I get 4 final candidates.
- With `--min_candidate_support 5`, I get 15 final candidates.
Should I trim these BLAST-extracted FLNC reads to the same start and end positions?
Trimming the reads so that they all start and end at the same positions will greatly help IsoCon find the variants, since that is the input it was designed for. This is very much the preferred option! Let's see whether you get the same variability after trimming.
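A minimal Python sketch of trimming to common ends, assuming you have chosen two anchor sequences marking the shared start and end of the region and that all reads are in the same orientation. The anchor strings, toy reads, and function names are hypothetical. Exact string matching is a simplification: with error-prone long reads, locating the boundaries by alignment would be more robust.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def trim_to_anchors(seq, start_anchor, end_anchor):
    """Return seq from the first start anchor through the last end anchor.

    Returns None if either anchor is missing or they are out of order.
    Uses exact matching only (hypothetical simplification).
    """
    start = seq.find(start_anchor)
    if start == -1:
        return None
    end = seq.rfind(end_anchor)
    if end == -1 or end < start + len(start_anchor):
        return None
    return seq[start:end + len(end_anchor)]


def trim_fasta(lines, start_anchor, end_anchor):
    """Trim every read; drop reads in which the anchors are not found."""
    trimmed = []
    for name, seq in read_fasta(lines):
        core = trim_to_anchors(seq, start_anchor, end_anchor)
        if core is not None:
            trimmed.append((name, core))
    return trimmed


# Toy reads and hypothetical anchors, for illustration only.
toy_fasta = [">r1", "GGACGTACAAAACCCTTGCATT",
             ">r2", "ACGTACAAAACCTTGCA",
             ">r3", "GGGGGGGG"]
for name, seq in trim_fasta(toy_fasta, "ACGTAC", "TTGCA"):
    print(f">{name}\n{seq}")
```

Here `r1` loses its flanking bases, `r2` is already trimmed, and `r3` is dropped because it lacks both anchors. Dropping reads that do not span the full region keeps every remaining read directly comparable.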
You can do some post-analysis of IsoCon's results by looking at the read support of each final candidate (this works as a sanity check for results both with and without trimmed ends). The support can be observed by counting the number of reads assigned to each consensus in the `cluster_info.tsv` file. (Alternatively, the accessions of the candidates in `final_candidates.fa` also contain information on how many reads support them, but counting rows in the TSV is more exact.)
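A small Python sketch of that sanity check, counting rows per consensus with `csv` and `collections.Counter`. It assumes, as an unverified guess about the file layout, that `cluster_info.tsv` has one row per read with the consensus ID in the second column; check your own file and adjust `column` and `skip_header` accordingly. The function name and the toy rows are hypothetical.

```python
import csv
from collections import Counter


def count_support(tsv_lines, column=1, skip_header=False):
    """Count how many rows (reads) are assigned to each consensus ID.

    ASSUMPTION: one row per read, consensus ID in `column` (0-based).
    Verify against your own cluster_info.tsv before relying on this.
    """
    reader = csv.reader(tsv_lines, delimiter="\t")
    if skip_header:
        next(reader, None)
    counts = Counter()
    for row in reader:
        if len(row) > column:  # skip blank or truncated lines
            counts[row[column]] += 1
    return counts


# Toy rows for illustration. On real data, pass an open file handle instead,
# e.g. open(<path to your cluster_info.tsv>, newline="").
toy_rows = ["read_1\tcand_A", "read_2\tcand_A", "read_3\tcand_B"]
counts = count_support(toy_rows)
for consensus, n_reads in counts.most_common():
    print(consensus, n_reads)
# cand_A 2
# cand_B 1
```

Listing candidates from most to least supported makes it easy to see which ones sit just above the `--min_candidate_support` cutoff; those are the ones most worth double-checking.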
Hi, when I run IsoCon, I find that the results vary greatly depending on the `--min_candidate_support` setting. How should I choose a value for this parameter?