Closed iwangtoknow closed 6 years ago
I have not tested this specifically, as I have not seen any Iso-seq data, but it should work at least in a rudimentary way at this point. We can update it so that it is supported more broadly as well. Basically, there isn't a reason to "assemble" the Iso-seq data (at least I don't think there should be, since it is technically 1 read = 1 transcript, correct?). The reads just need to be mapped to the genome, and the best way to do this is with minimap2. This isn't built into funannotate yet (but it can be if it works well), but you should just need to run something like:
minimap2 -ax splice genome.fa iso-seq_reads.fq | samtools view -bS - | samtools sort -o iso_sort.bam -
This will generate a coordinate-sorted BAM of the alignments, which you could then combine with your short-read RNA-seq alignments (i.e. the hisat2 alignments from funannotate train) and pass to the funannotate predict --rna_bam option. You can also try passing the Iso-seq data (in FASTA format) to the funannotate predict --transcript_evidence option, which will then map it to the genome and use those data for EVM consensus gene model prediction. Let me know if this works and I can add these steps directly to funannotate to make it easier to use these data.
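Since --transcript_evidence takes FASTA, reads that only exist as FASTQ would need converting first. Here is a minimal sketch of that step, assuming a plain four-lines-per-record FASTQ (sequence and quality not wrapped across lines). The function name fastq_to_fasta is made up for illustration and is not part of funannotate:

```shell
# Hypothetical helper: convert 4-line FASTQ records to FASTA.
# Assumes the sequence and the quality string each occupy exactly one line.
fastq_to_fasta() {
  awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # header line: swap leading @ for >
       NR % 4 == 2 { print }                      # sequence line: copy unchanged
      ' "$1"
}
```

It could be called as fastq_to_fasta iso-seq_reads.fq > iso-seq_reads.fa, where the output file name is only an example.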
OK, I'll try both. Since I have a full-length non-chimeric FASTA file from PacBio Iso-seq, I would prefer the second approach. I also have a reads-of-insert FASTQ file, so I'll try the first approach as well.
Dear Jon,
I passed the full-length non-chimeric FASTA file from Iso-seq to the funannotate predict --transcript_evidence option, and it works.
I'm now running annotate, and an old issue has appeared when dealing with local antiSMASH results.
Thanks, that's good to know. Would you be able to construct a test set with the data? Something like 5-6 scaffolds and then a subset of the Iso-seq reads that map to those scaffolds? Then I could use that data to run some more tests toward explicitly supporting long-read RNA-seq data. One way to subset the data would be to map the reads to your test scaffolds with minimap2 (as above) and then extract the reads that map from the BAM file.
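The subsetting step could look something like the following. This is a rough sketch, assuming the alignments are available as SAM text (for example from samtools view -h iso_sort.bam) and that a plain text file lists one scaffold name per line. The function name and file layout are hypothetical:

```shell
# Hypothetical helper: list unique names of reads aligned to chosen scaffolds.
# $1 = text file with one scaffold name per line
# $2 = SAM file (column 1 = read name, column 3 = reference sequence name)
reads_on_scaffolds() {
  awk 'NR == FNR    { keep[$1] = 1; next }   # first file: remember wanted scaffolds
       /^@/         { next }                  # second file: skip SAM header lines
       ($3 in keep) { print $1 }              # read is aligned to a wanted scaffold
      ' "$1" "$2" | sort -u
}
```

The resulting list of read names could then be used to pull the matching records out of the original Iso-seq file, for example with a tool such as seqtk subseq.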
Dear Jon, we performed Iso-seq but only got part of the data so far (the full-length non-chimeric FASTA totals 20.7M); we are performing a second run. Once I have the second part of the data, I can send the full data set to you. Just tell me what you want. This is A. nidulans. Many thanks.
Okay great, that sounds perfect. A. nidulans is one of my favorite fungi...
Dear Jon, using the data I have now, I finished a first round of structural and functional annotation. Please take a look at the figure above: the Iso-seq raw reads are long enough to span full-length transcripts, but they contain a lot of gaps (in most cases 1-2 nucleotides); maybe that will be fine after self-correction. The GMAP BAM and the structural annotation results show many introns in a gene.
There isn't a gene here, right?
You can also load some of the preliminary GFF3 annotations to see what is happening. It looks like there might be a transcript there, but it is hard to tell for sure whether it makes a complete gene model. In the predict_misc folder there is the gene_predictions.gff3 file, which contains all of the predictions that went into EVidenceModeler. You could also look at evm.round1.gff3, which is the output of EVidenceModeler prior to any filtering. Sometimes models get removed because they are repeat/transposon-like or in repeat-dense regions. EVM typically doesn’t call genes if there is only one type of evidence and no gene model prediction. So maybe we need to incorporate the Iso-seq data into training Augustus; I was working on this last night, actually. I just haven’t tested it, as I don’t have any reliable test data.
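To check from the command line whether any prediction went into EVM for a given region, something like this might help. It is a minimal sketch, assuming standard tab-separated GFF3 (column 1 = sequence ID, column 4 = start, column 5 = end). The function name is made up and is not a funannotate command:

```shell
# Hypothetical helper: print GFF3 features that overlap a region.
# $1 = GFF3 file, $2 = scaffold name, $3 = region start, $4 = region end
gff3_in_region() {
  awk -F'\t' -v s="$2" -v b="$3" -v e="$4" '
    /^#/ { next }                                  # skip comments and directives
    $1 == s && $4 + 0 <= e + 0 && $5 + 0 >= b + 0  # feature overlaps the region
  ' "$1"
}
```

For example, gff3_in_region predict_misc/gene_predictions.gff3 scaffold_1 10000 20000 (scaffold name and coordinates are placeholders) would list every input prediction touching that window. An empty result would be consistent with EVM having had no gene prediction to work with there.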
Dear Jon, I'll follow your advice fully. I don't have experience manually editing genes on the basis of read-mapping information, and I don't fully understand it yet. I'll contact you as soon as I am ready to send the data to you. I checked the long-read mapping BAM in IGV, and my Iso-seq reads are indeed sparse, sadly.
So in this example, there aren't any gene predictions in this region, and thus EVM didn't predict any gene models. Running the data back through PASA (via funannotate update) might help catch some of these genes. I still need to integrate the long reads into the update command.
Dear Jon, I have both short-read and long-read transcriptome data, but funannotate doesn't have a long-read option. I noticed that you described a method in the paper "Extreme sensitivity to ultraviolet light in the fungal pathogen causing white-nose syndrome of bats".
So can funannotate support long-read transcriptome sequencing now? How can I include Iso-seq data in the funannotate pipeline?
Thanks for your help.