jpromeror / EventPointer

R package for the identification and statistical analysis of alternative splicing events using junction arrays or RNASeq data

EventPointer_IGV error #11

Closed biointf closed 4 years ago

biointf commented 4 years ago

Dear Juan Pablo,

I got this error while running the function:

 EventPointer_IGV(Events = Events [ grep ("NRG3", Events$`Gene name`, fixed=TRUE),,drop=FALSE],
+                  input = "AffyGTF",
+                  inputFile = ClariomD.GTF,
+                  PSR = PSRProbes,
+                  Junc = JunctionProbes,
+                  PathGTF = getwd(),
+                  EventsFile = EventsFound,
+                  microarray = array)
Creating SG Information...Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK

Reading Information On Probes...Done
Indexing Genes and Probes...Done
  |=====================                                                                                                                                                 |  12%
Error in xj[i] : invalid subscript type 'list'

I see that the same bug was reported about a year ago and was related to an issue in the internal SG_info function. Is it still present? I am running R 3.6.1 with EventPointer 2.4.0 installed from Bioconductor.

Thanks! All the best, Luca

biointf commented 4 years ago

Sorry, I forgot to include the events I wanted to visualize:

> Events [ grep ("^NRG3$", Events$`Gene name`, perl=T),,drop=FALSE]
                  Gene name             Event Type     Genomic Position Splicing Z Value Splicing Pvalue Delta PSI
TC1000008253.hg_4      NRG3          Cassette Exon 10:82979120-82985098    -1.748535e+00      0.08037139 1.0425385
TC1000008253.hg_2      NRG3 Alternative First Exon 10:82738577-82865411    -1.117980e-01      0.91098359 0.7609727
TC1000008253.hg_3      NRG3          Cassette Exon 10:82951571-82958949    -9.199905e-02      0.92669879 0.7896287
TC1000008253.hg_8      NRG3          Cassette Exon 10:82166826-82358739    -6.025522e-06      0.99999519 0.8302705
TC1000008253.hg_5      NRG3          Complex Event 10:81878095-82358739    -4.983201e-10      1.00000000 0.3147105
TC1000008253.hg_7      NRG3          Complex Event 10:81878020-82166727     7.460721e-11      1.00000000 0.4116923
TC1000008253.hg_1      NRG3 Alternative First Exon 10:81875314-82358739    -2.149246e-12      1.00000000 0.8409849

> EventPointer_IGV(Events = Events [ grep ("^NRG3$", Events$`Gene name`, perl =TRUE),,drop=FALSE],
+                  input = "AffyGTF",
+                  inputFile = ClariomD.GTF,
+                  PSR = PSRProbes,
+                  Junc = JunctionProbes,
+                  PathGTF = getwd(),
+                  EventsFile = EventsFound,
+                  microarray = array)
Creating SG Information...Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK

Reading Information On Probes...Done
Indexing Genes and Probes...Done
  |========================                                                                                                                                              |  14%
Error in xj[i] : invalid subscript type 'list'

jpromeror commented 4 years ago

Hello @emacgene

Thanks for pointing out this issue! I'm going to look into it and will answer you ASAP.

Best,

Juan Pablo

jpromeror commented 4 years ago

@emacgene

I've checked both the code and the files and I'm pretty sure I've found the problem.

As you said, we updated the IGV function to solve that issue; however, the fix wasn't applied to the files in the Dropbox link. The problem is caused by an event that the current version of EventPointer identifies, while the EventsFound.txt file wasn't updated correctly.

I'm currently creating all the files and will let you know as soon as they are available for download.

You might need to re-run the aroma pipeline with the new CDF (hope this isn't a problem for you).

I'm very sorry for this issue, as it is a problem on our side.

Best regards, Juan Pablo

P.S. I will leave the issue open until the files are available and I will give you a direct link for download.

biointf commented 4 years ago

Thanks a lot, Juan Pablo, for your support. Let me say that EP is a great tool anyway. I look forward to receiving updates from you. All the best, Luca

jpromeror commented 4 years ago

Hello Luca,

Here is the link to download the correct CDF and EventsFound.txt files. https://we.tl/t-zsCpQXeNU5

I will update the dropbox as soon as you notify me that everything ran correctly.

Best regards!

Juan Pablo

biointf commented 4 years ago

Hello Juan Pablo, thanks a lot! I'm going to rerun it. It will take a while to run aroma, around 16 hrs, since there are quite a lot of cases in the dataset. I will let you know ASAP. Thanks again, Luca

biointf commented 4 years ago

Here I am. Something went wrong during the aroma procedure:

[2020-04-20 20:50:43] Exception: Range of argument 'units' is out of range [1,130249]: [1,171994]

  at #11. getNumerics.Arguments(static, ..., asMode = "integer", disallow = disallow)
          - getNumerics.Arguments() is in environment 'R.utils'

  at #10. getNumerics(static, ..., asMode = "integer", disallow = disallow)
          - getNumerics() is in environment 'R.utils'

  at #09. getIntegers.Arguments(static, x, ..., range = range, .name = .name)
          - getIntegers.Arguments() is in environment 'R.utils'

  at #08. getIntegers(static, x, ..., range = range, .name = .name)
          - getIntegers() is in environment 'R.utils'

  at #07. getIndices.Arguments(static, ...)
          - getIndices.Arguments() is in environment 'R.utils'

  at #06. getIndices(static, ...)
          - getIndices() is in environment 'R.utils'
          - originating from '<text>'

  at #05. Arguments$getIndices(units, max = nbrOfUnits(this))
          - Arguments$getIndices() is local of the calling function

  at #04. groupUnitsByDimension.AffymetrixCdfFile(cdf, units = unitsTT,
              verbose = less(verbose, 50))
          - groupUnitsByDimension.AffymetrixCdfFile() is in environment 'aroma.affymetrix'

  at #03. groupUnitsByDimension(cdf, units = unitsTT, verbose = less(verbose,
              50))
          - groupUnitsByDimension() is in environment 'aroma.affymetrix'

  at #02. fit.ProbeLevelModel(plmEx, verbose = verbose)
          - fit.ProbeLevelModel() is in environment 'aroma.affymetrix'

  at #01. fit(plmEx, verbose = verbose)
          - fit() is in environment 'aroma.core'

 Error: Range of argument 'units' is out of range [1,130249]: [1,171994]

And the whole matrix is effectively null ("0" everywhere). In fact, when I copied the file into the directory I wondered whether everything was OK with ClariomD.cdf, since my old CDF was ~150MB, more than 1.5 times the size of the new one. I do not know whether this is related to the issue, but it might help you. Thanks again. Luca

jpromeror commented 4 years ago

Hi Luca,

I've run the pipeline and generated all the files with no problem at all. It seems that the file wasn't transferred correctly (the CDF file is ~92MB).
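A quick way to rule out a truncated transfer before spending hours on aroma is to compare the downloaded file against a known size and, if the sender provides one, a checksum. The sketch below uses only base R; `check_download()` is a hypothetical helper, and no checksum was actually shared in this thread, so the MD5 argument is optional.

```r
# Hypothetical helper: compare a downloaded file with the size (and,
# optionally, the MD5 checksum) that the sender reports for it.
check_download <- function(path, expected_bytes, expected_md5 = NULL) {
  if (!file.exists(path)) return(FALSE)
  # A size mismatch is the cheapest sign of an incomplete transfer.
  ok <- file.size(path) == expected_bytes
  if (ok && !is.null(expected_md5)) {
    # tools::md5sum() returns a named, lower-case hex string.
    ok <- unname(tools::md5sum(path)) == tolower(expected_md5)
  }
  isTRUE(ok)
}

# Example call; 91972184 is the byte count that print(cdf) reports
# further down in this thread for the working ClariomD.cdf.
# check_download("annotationData/chipTypes/ClariomD/ClariomD.cdf", 91972184)
```

A matching size does not prove the content is intact, which is why a checksum from the sender would be the stronger test.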

Let's try it again: https://we.tl/t-5YEpKlmigh

It should work correctly this time.

Best,

Juan Pablo

biointf commented 4 years ago

Just to check that I am doing this correctly: I only have to replace ClariomD.cdf in the aroma annotationData directory and rerun the aroma procedure? This is the CDF structure:


> print(cdf)
AffymetrixCdfFile:
Path: annotationData/chipTypes/ClariomD
Filename: ClariomD.cdf
File size: 87.71 MiB (91972184 bytes)
Chip type: ClariomD
File format: v4 (binary; XDA)
Dimension: 2572x2680
Number of cells: 6892960
Number of units: 130249
Cells per unit: 52.92
Number of QC units: 0

Is it ok?

jpromeror commented 4 years ago

Yes! Just replace it.

However, it is better to delete all the files previously generated by the aroma procedure. As both CDFs have the same name, aroma may "think" that the analysis is already done.

Just start over and everything should run with no problem at all!

biointf commented 4 years ago

Sure, I did. Every time I run the analysis I delete all the files in the "plmData", "probeData", "qcData" and "annotationData" directories except the CDF; I keep only rawData. However, I got this again:

20200421 17:33:31|   Pathname: plmData/sMMclariomD,NEBC,rma,QN,RMA/ClariomD/PC2018-031,chipEffects.CEL
20200421 17:33:31|   Found indices cached on file
20200421 17:33:31|   Reading data for these 171994 cells...
Error in affxparser::readCel(...) : 
  Argument 'indices' is out of range [1,391250].

It looks like the CDF led to data with indices different from those expected. May I ask you to run the analysis with some CEL files from my experiment? You can find three of them here: https://unimibox.unimi.it/index.php/s/EXwjXzZdfDFeSd4 Just to understand whether the issue lies in my aroma package. Thanks again!

jpromeror commented 4 years ago

Luca,

I ran everything with your CEL files and everything went perfectly.

library(EventPointer)
library(aroma.affymetrix)

setwd("/Users/jpromero/Desktop/ClariomD/Clariom/CDF/aroma/")

# Verbose logging with timestamps
verbose <- Arguments$getVerbose(-8)
timestampOn(verbose)
projectName <- "Luca"
cdfGFile <- "ClariomD"
# Load the CDF and the CEL files of the project
cdfG <- AffymetrixCdfFile$byChipType(cdfGFile)
cs <- AffymetrixCelSet$byName(projectName, cdf=cdfG)
# Background correction
bc <- NormExpBackgroundCorrection(cs, method="mle", tag=c("*","r11"))
csBC <- process(bc, verbose=verbose, ram=20)
# Quantile normalization of the PM probes
qn <- QuantileNormalization(csBC, typesToUpdate="pm")
csN <- process(qn, verbose=verbose, ram=20)
# Fit the exon-level RMA probe-level model and extract the chip effects
plmEx <- ExonRmaPlm(csN, mergeGroups=FALSE)
fit(plmEx, verbose=verbose)
cesEx <- getChipEffectSet(plmEx)
> cesEx
ExonChipEffectSet:
Name: Luca
Tags: NEBC,mle,r11,QN,RMA
Path: plmData/Luca,NEBC,mle,r11,QN,RMA/ClariomD
Platform: Affymetrix
Chip type: ClariomD,monocell
Number of arrays: 3
Names: PAMSC17-103, PAMSC17-91, PAMSC17-98 [3]
Time period: 2020-04-21 18:43:06 -- 2020-04-21 18:43:06
Total file size: 11.20 MiB
Parameters: {}

There is one message in your output that might be the problem:

20200421 17:33:31| Found indices cached on file

aroma creates another set of files in the annotationData directory. Try to remove them too and start with a fresh copy of the CDF.
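For anyone repeating this cleanup, the steps could be scripted roughly as below. This is only a sketch: `clean_aroma_cache()` is a hypothetical helper, not part of aroma.affymetrix, and the directory layout is assumed from the paths shown in this thread (`annotationData/chipTypes/ClariomD`, `plmData`, `probeData`, `qcData`). It deletes files permanently, so check the paths before running it.

```r
# Hypothetical helper (not part of aroma.affymetrix): remove aroma's derived
# output and every file cached next to the CDF, keeping rawData and the CDF.
clean_aroma_cache <- function(root = ".", chip_type = "ClariomD",
                              derived = c("plmData", "probeData", "qcData")) {
  # Derived data sets are regenerated by the pipeline, so drop them whole.
  for (d in file.path(root, derived)) {
    if (dir.exists(d)) unlink(d, recursive = TRUE)
  }
  # Assumed location of the CDF and of the files aroma writes beside it.
  ann_dir <- file.path(root, "annotationData", "chipTypes", chip_type)
  cdf_name <- paste0(chip_type, ".cdf")
  cached <- list.files(ann_dir, all.files = TRUE, full.names = TRUE,
                       recursive = TRUE)
  # Keep only the CDF itself (compared case-insensitively: .cdf vs .CDF).
  stale <- cached[tolower(basename(cached)) != tolower(cdf_name)]
  unlink(stale)
  invisible(stale)
}

# Example call from the aroma working directory:
# removed <- clean_aroma_cache(".")
```

The function returns the paths it removed, so the result can be inspected after the fact.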

Best regards,

Juan Pablo

biointf commented 4 years ago

OK Juan Pablo, it works! Don't worry about the ClariomD annotations, I'll handle them post-procedure in the resulting file. Thank you very much for your promptness, you've been very kind! I hope our data will lead to interesting results. All the best, Luca (I deleted the last two messages because they were not useful, in case you want to keep the thread for future queries)

jpromeror commented 4 years ago

Great news!

Aren't the annotations in the latest EventsFound.txt I sent you? If not, let me know and I can send them easily.

Best,

Juan Pablo

biointf commented 4 years ago

Yes, they are. This is the header of the file that you sent me, followed by its first two rows:

Affy.Gene.Id	Gene.Name	Event.Number	Event.Type	Genomic.Position	Path.1	Path.2	Path.Reference	Probes.P1	Probes.P2	Probes.Ref
TC0100006432.hg	DDX11L1	1	Retained Intron	1:12057-12179	1-2,2-2,2-3	1-3	1-1,3-3,5-5,12-12	1029016,1962807,4779955,5459798,5635575,6508497	4956031,5234850,5375978,6705329	136147,2363351,2851978,4847126,5870089,6424166,2105030,2219973,2753304,4372233,4595488,4905409,330461,841343,1160423,3811058,5015729,6497650,1766954,5273608,5508009,5774072,6044336,6180535,1510543,1867376,3539395,3593140,6196802,6242444,2443267,3568009,3913420,4423936,4773931,5306321
TC0100006432.hg	DDX11L1	2	Alternative 3' Splice Site	1:12227-12613	3-4,4-4,4-5	3-5	1-1,3-3,5-5,12-12	2564693,4466835,5673946,6573418	1036074,1465214,2445458,6437862	136147,2363351,2851978,4847126,5870089,6424166,2105030,2219973,2753304,4372233,4595488,4905409,330461,841343,1160423,3811058,5015729,6497650,1766954,5273608,5508009,5774072,6044336,6180535,1510543,1867376,3539395,3593140,6196802,6242444,2443267,3568009,3913420,4423936,4773931,5306321

Unlike previous runs, however, the "Events" object looked like this:

> head(Events)
                         Gene name             Event Type      Genomic Position Splicing Z Value Splicing Pvalue Delta PSI
TC1000008253.hg_6  TC1000008253.hg          Cassette Exon  10:82979120-82985098       -1.8402854      0.06572635 1.0454342
TC0900007617.hg_3  TC0900007617.hg Alternative First Exon   9:75857437-75879241       -0.6192826      0.53573020 1.0399301
TC0900007617.hg_2  TC0900007617.hg  Alternative Last Exon   9:75857437-75879241       -0.6027946      0.54664536 0.9731911
TC0X00010060.hg_8  TC0X00010060.hg          Complex Event   X:73827567-73843358       -0.5637497      0.57292452 0.9100664
TC0600014321.hg_1  TC0600014321.hg  Alternative Last Exon   6:85505496-85507233        0.4699180      0.63841360 0.0000000
TC0500009072.hg_15 TC0500009072.hg          Complex Event 5:149845914-149847458       -0.4583252      0.64671884 1.0878907

That is, the TCxxx.hg IDs appear instead of the gene names. It is no problem for me to merge them post-analysis; it would just be convenient to have them already in the output.
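The post-analysis merge could look roughly like this in base R. It is a sketch under assumptions: `add_gene_names()` is a hypothetical helper, EventsFound.txt is assumed to be tab-delimited with the `Affy.Gene.Id` and `Gene.Name` columns shown in the header above, and the toy data only reuses IDs and symbols that already appear in this thread.

```r
# Hypothetical helper: replace transcript-cluster IDs in the "Gene name"
# column with the symbols listed in EventsFound.txt.
add_gene_names <- function(events, events_found) {
  # EventsFound has one row per event, so match() takes the first row per gene.
  idx <- match(events[["Gene name"]], events_found[["Affy.Gene.Id"]])
  symbols <- as.character(events_found[["Gene.Name"]][idx])
  # Keep the original ID when no symbol is available.
  missing <- is.na(symbols)
  symbols[missing] <- as.character(events[["Gene name"]][missing])
  events[["Gene name"]] <- symbols
  events
}

# Toy data reusing values quoted in this thread.
toy_events <- data.frame(
  `Gene name` = c("TC1000008253.hg", "TC0900007617.hg"),
  row.names = c("TC1000008253.hg_6", "TC0900007617.hg_3"),
  check.names = FALSE, stringsAsFactors = FALSE
)
toy_found <- data.frame(
  Affy.Gene.Id = c("TC0100006432.hg", "TC1000008253.hg"),
  Gene.Name = c("DDX11L1", "NRG3"),
  stringsAsFactors = FALSE
)
toy_annotated <- add_gene_names(toy_events, toy_found)

# With the real files (the read call assumes a tab-delimited file):
# events_found <- read.delim("EventsFound.txt", stringsAsFactors = FALSE)
# Events <- add_gene_names(Events, events_found)
```

Using `match()` rather than `merge()` keeps the row order and row names of `Events` unchanged, which matters because the row names carry the event IDs.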

Thanks a lot Luca
