kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annotate and Visualize Isoform Switches with Functional Consequences (from RNA-seq data)

Question about reference genome and annotations #222

Closed: sparthib closed 5 months ago

sparthib commented 9 months ago

Hi there,

I have human ONT direct cDNA data that I would like to analyze with IsoformSwitchAnalyzeR. I aligned the reads to the ENSEMBL reference cDNA FASTA and then ran salmon in alignment mode using the same FASTA file. I used the chr_patch_hapl_scaff.gtf.gz annotation with the resulting counts matrix.

Initially I got an error saying that the quantification didn't match the annotation, so my original question was about the annotation. I then reran the import, and after the 10 steps I now get these warnings:

# Warning messages:
#   1: In importRdata(isoformCountMatrix = salmonQuant$counts, isoformRepExpression = salmonQuant$abundance,  :
#                       
#  There were estimated unwanted effects in your dataset but the automatic sva run failed.
#  We highly reccomend you run sva yourself, add the nessesary surrogate variables
# as extra columns in the "designMatrix" and re-run this function
#                     
# 2: In createSwitchAnalyzeRlist(isoformFeatures = isoAnnot, exons = isoformExonStructure,  :
# The gene_ids or isoform_ids were not unique - we identified multiple instances 
#of the same gene_id/isoform_id on different chromosomes.
#To solve this we removed 23 gene_id. 
#Please note there might still be duplicated gene_id located on the same chromosome. 
#Some of these could be due to fusion transcripts which IsoformSwitchAnalyzeR cannot handle.

#The switchAnalyzeRlist now contains all the information imported in separate entries o
#Of the switchAnalyzeRlist object:
  1. Could you explain further why sva would fail and how I could run it manually?
  2. Also, is there a way to see which genes were removed because their gene or isoform IDs were not unique, or a way to retain them?

Thanks,

Sowmya

chunxubioinfor commented 6 months ago

Hi Sowmya,

  1. I've just checked the source code: our package simply wraps the sva function from the sva package and calls it with default settings. If an error occurs while running sva, our package reports the warning you got. I suspect the error is specific to your data, so I recommend running sva manually. Here is a tutorial on the sva package.
  2. I think you can retrieve these genes directly from the GTF file. The GTF file (you can open it and take a look) contains various kinds of annotation information, including each feature's location on the chromosome.
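
For point 2, here is a minimal sketch of what such a GTF check could look like in Python. It assumes the standard 9-column, tab-separated GTF layout with a `gene_id "..."` attribute in the last column, and it lists every gene_id that appears on more than one seqname (chromosome, patch, or scaffold). This is only an approximation of the check behind the warning, not the package's own code, so the result may not match the 23 removed gene_ids exactly. The helper names and the example records are made up for illustration.

```python
import gzip
import re
from collections import defaultdict

# Matches the gene_id attribute in column 9 of a GTF record.
GENE_ID_RE = re.compile(r'(?<![A-Za-z_])gene_id "([^"]+)"')


def open_gtf(path):
    """Open a plain or gzip-compressed GTF file as text (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def genes_on_multiple_seqnames(lines):
    """Return {gene_id: sorted seqnames} for gene_ids seen on more than one seqname."""
    seqnames = defaultdict(set)
    for line in lines:
        # Skip header comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            seqnames[match.group(1)].add(fields[0])
    return {gene: sorted(names) for gene, names in seqnames.items() if len(names) > 1}


# Made-up records: GENE_B is annotated on both X and Y, GENE_A only on chromosome 1.
example_gtf = [
    "#!genome-build example",
    '1\tsrc\tgene\t100\t200\t.\t+\t.\tgene_id "GENE_A"; gene_name "A";',
    '1\tsrc\texon\t100\t150\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
    'X\tsrc\tgene\t500\t900\t.\t+\t.\tgene_id "GENE_B"; gene_name "B";',
    'Y\tsrc\tgene\t500\t900\t.\t+\t.\tgene_id "GENE_B"; gene_name "B";',
]
duplicates = genes_on_multiple_seqnames(example_gtf)
for gene, names in duplicates.items():
    print(gene, ",".join(names))
```

To run it on a real annotation, pass an open file handle instead of the example list, for instance `with open_gtf(gtf_path) as handle: duplicates = genes_on_multiple_seqnames(handle)`, where `gtf_path` points to your own GTF file.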

Hope this helps! 😊