kvittingseerup / IsoformSwitchAnalyzeR

An R package to Identify, Annoatate and Visialize Isoform Switches with Functional Consequences (from RNA-seq data)
96 stars 18 forks source link

isoformSwitchAnalysisCombined() error? #212

Closed marwa38 closed 10 months ago

marwa38 commented 10 months ago

Many thanks for the great package I am using the same database, version, etc but I am getting the following error, Please let me know why I might be getting the following error? How I could change the pvalue cut-off to make it less stringent?

mySwitchList <- importRdata(
  isoformCountMatrix   = Txi_trans$counts,
  isoformRepExpression = Txi_trans$abundance,
  designMatrix         = targets.mod,
  removeNonConvensionalChr = TRUE,
  addAnnotatedORFs=TRUE,
  ignoreAfterPeriod=TRUE,
  isoformExonAnnoation = "Salmo_salar.Ssal_v3.1.110.gtf.gz",
  isoformNtFasta       = "Salmo_salar.Ssal_v3.1.cdna.all.fa.gz",
  showProgress = TRUE)
> mySwitchList
This switchAnalyzeRlist list contains:
 79631 isoforms from 31244 genes
 1 comparison from 2 conditions (in total 12 samples)

Feature analyzed:
[1] "ORFs, ntSequence"

mySwitchList <- isoformSwitchAnalysisCombined(
  switchAnalyzeRlist   = mySwitchList,
  pathToOutput = 'isoform_output') # directory must already exist
# now look at the directory that you just created above
# in case you missed the summary output from the function above

PART 1: EXTRACTING ISOFORM SWTICH SEQUENCES
Step 1 of 3 : Detecting isoform switches...
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01m 03s
Error in extractSwitchPairs(switchAnalyzeRlist, alpha = alpha, dIFcutoff = dIFcutoff,  : 
                              No genes were considered switching with the used cutoff values
                            In addition: Warning message:
                              Zero sample variances detected, have been offset away from zero 
> str(mySwitchList)
Formal class 'switchAnalyzeRlist' [package "IsoformSwitchAnalyzeR"] with 1 slot
  ..@ .Data:List of 11
  .. ..$ :'data.frame': 79631 obs. of  30 variables:
  .. .. ..$ iso_ref               : chr [1:79631] "isoComp_00000001" "isoComp_00000002" "isoComp_00000003" "isoComp_00000004" ...
  .. .. ..$ gene_ref              : chr [1:79631] "geneComp_00000001" "geneComp_00000002" "geneComp_00000002" "geneComp_00000003" ...
  .. .. ..$ isoform_id            : chr [1:79631] "ENSSSAT00000116955" "ENSSSAT00000069109" "ENSSSAT00000173444" "ENSSSAT00000054457" ...
  .. .. ..$ gene_id               : chr [1:79631] "13b2" "1433b" "1433b" "143g1" ...
  .. .. ..$ condition_1           : chr [1:79631] "M" "M" "M" "M" ...
  .. .. ..$ condition_2           : chr [1:79631] "V" "V" "V" "V" ...
  .. .. ..$ gene_name             : chr [1:79631] "13b2" "1433b" "1433b" "143g1" ...
  .. .. ..$ gene_biotype          : chr [1:79631] "protein_coding" "protein_coding" "protein_coding" "protein_coding" ...
  .. .. ..$ iso_biotype           : chr [1:79631] "protein_coding" "protein_coding" "protein_coding" "protein_coding" ...
  .. .. ..$ gene_overall_mean     : num [1:79631] 0.343 257.622 257.622 13.394 1.113 ...
  .. .. ..$ gene_value_1          : num [1:79631] 0.29 283.2 283.2 14.3 1.1 ...
  .. .. ..$ gene_value_2          : num [1:79631] 0.397 232.039 232.039 12.485 1.13 ...
  .. .. ..$ gene_stderr_1         : num [1:79631] 0.106 36.357 36.357 1.5632 0.0633 ...
  .. .. ..$ gene_stderr_2         : num [1:79631] 0.0896 16.5539 16.5539 1.7328 0.1149 ...
  .. .. ..$ gene_log2_fold_change : num [1:79631] 0.4423 -0.2875 -0.2875 -0.196 0.0444 ...
  .. .. ..$ gene_q_value          : logi [1:79631] NA NA NA NA NA NA ...
  .. .. ..$ iso_overall_mean      : num [1:79631] 0.343 132.672 124.95 13.394 1.113 ...
  .. .. ..$ iso_value_1           : num [1:79631] 0.29 145.1 138.1 14.3 1.1 ...
  .. .. ..$ iso_value_2           : num [1:79631] 0.397 120.242 111.797 12.485 1.13 ...
  .. .. ..$ iso_stderr_1          : num [1:79631] 0.106 21.8838 15.3557 1.5632 0.0633 ...
  .. .. ..$ iso_stderr_2          : num [1:79631] 0.0896 7.6358 9.8766 1.7328 0.1149 ...
  .. .. ..$ iso_log2_fold_change  : num [1:79631] 0.4423 -0.2711 -0.3048 -0.196 0.0444 ...
  .. .. ..$ iso_q_value           : logi [1:79631] NA NA NA NA NA NA ...
  .. .. ..$ IF_overall            : num [1:79631] 1 0.513 0.487 1 1 ...
  .. .. ..$ IF1                   : num [1:79631] 1 0.505 0.495 1 1 ...
  .. .. ..$ IF2                   : num [1:79631] 1 0.521 0.479 1 1 ...
  .. .. ..$ dIF                   : num [1:79631] 0 0.0157 -0.0157 0 0 ...
  .. .. ..$ isoform_switch_q_value: logi [1:79631] NA NA NA NA NA NA ...
  .. .. ..$ gene_switch_q_value   : logi [1:79631] NA NA NA NA NA NA ...
  .. .. ..$ PTC                   : logi [1:79631] FALSE FALSE FALSE FALSE FALSE FALSE ...
  .. ..$ :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
  .. .. .. ..@ seqnames       :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 1112 levels "1","9","10","13",..: 1 2 3 4 5 6 7 8 9 10 ...
  .. .. .. .. .. ..@ lengths        : int [1:29] 71895 59643 58373 53061 39713 41726 46723 42870 48854 41992 ...
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ ranges         :Formal class 'IRanges' [package "IRanges"] with 6 slots
  .. .. .. .. .. ..@ start          : int [1:1036504] 146258 149484 160680 165300 166561 169661 169661 170536 170536 170750 ...
  .. .. .. .. .. ..@ width          : int [1:1036504] 247 174 982 99 358 117 117 85 442 228 ...
  .. .. .. .. .. ..@ NAMES          : NULL
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ strand         :Formal class 'Rle' [package "S4Vectors"] with 4 slots
  .. .. .. .. .. ..@ values         : Factor w/ 3 levels "+","-","*": 1 2 1 2 1 2 1 2 1 2 ...
  .. .. .. .. .. ..@ lengths        : int [1:58] 34586 37309 29699 29944 30994 27379 26764 26297 20391 19322 ...
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ seqinfo        :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots
  .. .. .. .. .. ..@ seqnames   : chr [1:1112] "1" "9" "10" "13" ...
  .. .. .. .. .. ..@ seqlengths : int [1:1112] NA NA NA NA NA NA NA NA NA NA ...
  .. .. .. .. .. ..@ is_circular: logi [1:1112] NA NA NA NA NA NA ...
  .. .. .. .. .. ..@ genome     : chr [1:1112] NA NA NA NA ...
  .. .. .. ..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 1036504
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. .. .. ..@ listData       :List of 3
  .. .. .. .. .. .. ..$ isoform_id: chr [1:1036504] "ENSSSAT00000231599" "ENSSSAT00000231599" "ENSSSAT00000231599" "ENSSSAT00000231599" ...
  .. .. .. .. .. .. ..$ gene_id   : chr [1:1036504] "ENSSSAG00000088399.1" "ENSSSAG00000088399.1" "ENSSSAG00000088399.1" "ENSSSAG00000088399.1" ...
  .. .. .. .. .. .. ..$ gene_name : chr [1:1036504] NA NA NA NA ...
  .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. ..@ metadata       : list()
  .. ..$ :'data.frame': 2 obs. of  2 variables:
  .. .. ..$ condition   : chr [1:2] "M" "V"
  .. .. ..$ nrReplicates: int [1:2] 6 6
  .. ..$ :'data.frame': 12 obs. of  2 variables:
  .. .. ..$ sampleID : chr [1:12] "Md1" "Md2" "Md3" "Md4" ...
  .. .. ..$ condition: chr [1:12] "M" "M" "M" "M" ...
  .. ..$ : chr "data.frames"
  .. ..$ :'data.frame': 79631 obs. of  13 variables:
  .. .. ..$ isoform_id: chr [1:79631] "ENSSSAT00000076048" "ENSSSAT00000174258" "ENSSSAT00000195598" "ENSSSAT00000238455" ...
  .. .. ..$ Md1       : num [1:79631] 46.9 0 0 18.8 0 ...
  .. .. ..$ Md2       : num [1:79631] 113.01 0 0 9.72 0 ...
  .. .. ..$ Md3       : num [1:79631] 114.67 6.86 4.92 6.57 6.86 ...
  .. .. ..$ Md4       : num [1:79631] 33.52 3.51 4.06 0 3.51 ...
  .. .. ..$ Md5       : num [1:79631] 61.15 0 4.62 13.86 0 ...
  .. .. ..$ Md6       : num [1:79631] 311.11 6.88 24.66 0 6.88 ...
  .. .. ..$ Vd1       : num [1:79631] 198.6 3.21 6.93 14.02 3.21 ...
  .. .. ..$ Vd2       : num [1:79631] 216.2 0 28 27.5 0 ...
  .. .. ..$ Vd3       : num [1:79631] 93.28 0 14.04 4.68 0 ...
  .. .. ..$ Vd4       : num [1:79631] 210.63 6.35 4.52 4.97 6.35 ...
  .. .. ..$ Vd5       : num [1:79631] 111.5 0 24.1 16.5 0 ...
  .. .. ..$ Vd6       : num [1:79631] 183.03 9.64 0 0 9.64 ...
  .. ..$ :'data.frame': 79631 obs. of  13 variables:
  .. .. ..$ isoform_id: chr [1:79631] "ENSSSAT00000076048" "ENSSSAT00000174258" "ENSSSAT00000195598" "ENSSSAT00000238455" ...
  .. .. ..$ Md1       : num [1:79631] 3.49 0 0 1.4 0 ...
  .. .. ..$ Md2       : num [1:79631] 6.078 0 0 0.523 0 ...
  .. .. ..$ Md3       : num [1:79631] 6.082 0.364 0.261 0.348 0.364 ...
  .. .. ..$ Md4       : num [1:79631] 1.866 0.196 0.226 0 0.196 ...
  .. .. ..$ Md5       : num [1:79631] 4.098 0 0.31 0.929 0 ...
  .. .. ..$ Md6       : num [1:79631] 15.217 0.336 1.206 0 0.336 ...
  .. .. ..$ Vd1       : num [1:79631] 12.712 0.205 0.443 0.897 0.205 ...
  .. .. ..$ Vd2       : num [1:79631] 10.78 0 1.4 1.37 0 ...
  .. .. ..$ Vd3       : num [1:79631] 4.494 0 0.676 0.225 0 ...
  .. .. ..$ Vd4       : num [1:79631] 12.515 0.377 0.268 0.295 0.377 ...
  .. .. ..$ Vd5       : num [1:79631] 4.854 0 1.049 0.719 0 ...
  .. .. ..$ Vd6       : num [1:79631] 8.884 0.468 0 0 0.468 ...
  .. ..$ :List of 1
  .. .. ..$ IsoformSwitchAnalyzeR:List of 1
  .. .. .. ..$ version: chr "2.0.1"
  .. ..$ :'data.frame': 116123 obs. of  12 variables:
  .. .. ..$ isoform_id                : chr [1:116123] "ENSSSAT00000000040" "ENSSSAT00000000051" "ENSSSAT00000000053" "ENSSSAT00000000054" ...
  .. .. ..$ orfTransciptStart         : num [1:116123] 1 1 53 1 1 393 1 1 307 72 ...
  .. .. ..$ orfTransciptEnd           : num [1:116123] 3756 2319 757 786 1584 ...
  .. .. ..$ orfTransciptLength        : num [1:116123] 3756 2319 705 786 1584 ...
  .. .. ..$ orfStarExon               : int [1:116123] 18 1 9 1 1 7 1 1 1 1 ...
  .. .. ..$ orfEndExon                : int [1:116123] 1 18 1 10 7 1 9 9 17 2 ...
  .. .. ..$ orfStartGenomic           : int [1:116123] 2659980 2039949 1986440 2167336 2568804 54562109 2569038 2568911 541756 609233 ...
  .. .. ..$ orfEndGenomic             : int [1:116123] 2618503 2153165 1974054 2180220 2592366 54537971 2594199 2594224 587851 615251 ...
  .. .. ..$ stopDistanceToLastJunction: num [1:116123] -76 288 -92 -92 -645 ...
  .. .. ..$ stopIndex                 : num [1:116123] 1 19 1 10 7 1 9 9 19 2 ...
  .. .. ..$ PTC                       : logi [1:116123] FALSE TRUE FALSE FALSE FALSE FALSE ...
  .. .. ..$ orf_origin                : chr [1:116123] "Annotation" "Annotation" "Annotation" "Annotation" ...
  .. ..$ :'data.frame': 116123 obs. of  13 variables:
  .. .. ..$ isoform_id: chr [1:116123] "ENSSSAT00000076048" "ENSSSAT00000174258" "ENSSSAT00000195598" "ENSSSAT00000238455" ...
  .. .. ..$ Md1       : num [1:116123] 1 NaN NaN 1 NaN NaN 1 NaN NaN 1 ...
  .. .. ..$ Md2       : num [1:116123] 1 NaN NaN 1 NaN NaN 1 NaN NaN 1 ...
  .. .. ..$ Md3       : num [1:116123] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ Md4       : num [1:116123] 1 1 1 NaN 1 1 NaN 1 1 NaN ...
  .. .. ..$ Md5       : num [1:116123] 1 NaN 1 1 NaN 1 1 NaN 1 1 ...
  .. .. ..$ Md6       : num [1:116123] 1 1 1 NaN 1 1 NaN 1 1 NaN ...
  .. .. ..$ Vd1       : num [1:116123] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ Vd2       : num [1:116123] 1 NaN 1 1 NaN 1 1 NaN 1 1 ...
  .. .. ..$ Vd3       : num [1:116123] 1 NaN 1 1 NaN 1 1 NaN 1 1 ...
  .. .. ..$ Vd4       : num [1:116123] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. ..$ Vd5       : num [1:116123] 1 NaN 1 1 NaN 1 1 NaN 1 1 ...
  .. .. ..$ Vd6       : num [1:116123] 1 1 NaN NaN 1 NaN NaN 1 NaN NaN ...
  .. ..$ :Formal class 'DNAStringSet' [package "Biostrings"] with 5 slots
  .. .. .. ..@ pool           :Formal class 'SharedRaw_Pool' [package "XVector"] with 2 slots
  .. .. .. .. .. ..@ xp_list                    :List of 2
  .. .. .. .. .. .. ..$ :<externalptr> 
  .. .. .. .. .. .. ..$ :<externalptr> 
  .. .. .. .. .. ..@ .link_to_cached_object_list:List of 2
  .. .. .. .. .. .. ..$ :<environment: 0x0000022a4d83c6a0> 
  .. .. .. .. .. .. ..$ :<environment: 0x0000022a4d83c6a0> 
  .. .. .. ..@ ranges         :Formal class 'GroupedIRanges' [package "XVector"] with 7 slots
  .. .. .. .. .. ..@ group          : int [1:79631] 1 1 1 1 1 1 1 1 1 1 ...
  .. .. .. .. .. ..@ start          : int [1:79631] 1 328 676 982 1288 1636 1942 2248 2596 2902 ...
  .. .. .. .. .. ..@ width          : int [1:79631] 327 348 306 306 348 306 306 348 306 306 ...
  .. .. .. .. .. ..@ NAMES          : chr [1:79631] "ENSSSAT00000076048" "ENSSSAT00000174258" "ENSSSAT00000195598" "ENSSSAT00000238455" ...
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..@ elementType    : chr "DNAString"
  .. .. .. ..@ elementMetadata: NULL
  .. .. .. ..@ metadata       : list()
  ..$ names: chr [1:11] "isoformFeatures" "exons" "conditions" "designMatrix" ...
kvittingseerup commented 10 months ago

You can always try running the individual workflow functions instead of the high-level one (see the vignette for how to). You could also try the DEXSeq-based test - but in general, I would not recommend changing the FDR cutoff.

And just a reminder: It could also be true that your data has no switches.

marwa38 commented 9 months ago

@kvittingseerup

The depth of my (PE150; paired-end each have 150 reads) sequencing is 20M sequences per read (40M per sample), do you consider that is good enough for alternative splicing and/or soform switches analysis? I ran the analyses on 4 different comparisons (same above codes) and couldn't find any isoform switches, although Atlantic salmon (my model species) is quite known for gene duplication.

Having post-translational pathways enriched doesn't that imply that isoform switches is possibly present?

I used kallisto in the upstream analysis and transcriptome (cDNA) for annotation. In contrast to human and mouse, not all species have haplotype information available. The message notes that there is no alternative assembly, including haplotypes, for Atlantic Salmon (salmo_salar) in the Ensembl database.

Last but not least, could you please let me know what are the default FDR (alpha), and dIFcutoff? As I am writing currently and need to report that I ran the analysis (with details) and results showed no isoform switches. Anyway is it possible to run analysis only on Ss4R duplicates only in Atlantic salmon? as in this study https://onlinelibrary.wiley.com/doi/10.1111/mec.14533 Many thanks for your comment in advance.