wenweixiong / MARVEL


AssignModality produces many NaNs #18

Closed yeroslaviz closed 1 year ago

yeroslaviz commented 1 year ago

Hi,

Thanks for this great tool. It took a while to create the marvel object, but I managed to do so. Now, when trying to assign modalities, I get the warning "Warning: NaNs produced" many, many times.

I know it is just a warning, but I want to make sure that all is good with my data and with the way I created the marvel object. I also want to understand why this warning appears and where the NaNs are coming from.

thanks in advance for the info

Assa

[Screenshot: NaNs]
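For context, base R emits "NaNs produced" whenever a numeric function is evaluated outside its domain. The sketch below is plain base R, not MARVEL code, and the input vector is made up. It shows the warning arising from `log()` on a negative value, and how temporarily setting `options(warn = 2)` turns the warning into an error so that the offending call can be located (interactively, `traceback()` would then show the call stack).

```r
# Toy input: one value lies outside the domain of log()
x <- c(0.2, -0.5, 1.3)

# log() of a negative number yields NaN and raises "NaNs produced"
vals <- suppressWarnings(log(x))
is.nan(vals)

# Promote warnings to errors temporarily to find where they originate
old <- options(warn = 2)
err <- tryCatch(log(x), error = function(e) conditionMessage(e))
options(old)  # restore the previous warning level
err
```

Whether the NaNs in AssignModality come from a comparable out-of-domain evaluation is not established in this thread; the sketch only illustrates the general mechanism.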

yeroslaviz commented 1 year ago

I'm getting a second warning when running the differential expression step.

Here it warns about not being able to compute exact p-values with ties. What causes this warning?
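For reference, this is the message that base R's `wilcox.test()` emits when an exact p-value is requested for small samples that contain tied values. That the differential expression step calls `wilcox.test()` internally is an assumption here, and the two vectors below are made-up toy data. A minimal sketch of the behaviour:

```r
# Toy data with tied values (2, 3 and 4 each occur more than once)
g1 <- c(1, 2, 2, 3, 5)
g2 <- c(2, 3, 4, 4, 6)

# Default call: an exact test is attempted for small samples,
# which is not possible with ties, so a warning is raised
msg <- tryCatch(
  {
    wilcox.test(g1, g2)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
msg

# Requesting the normal approximation explicitly avoids the warning
res <- wilcox.test(g1, g2, exact = FALSE)
res$p.value
```

In this situation R falls back to the normal approximation, so the warning is informational rather than a sign of corrupted input.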

P.S. Is there a way to reduce the huge number of warnings? Seeing each one once would be enough.

[Screenshot 2023-07-05 at 11 53 00]
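One possible way to see each warning only once is to wrap the call in a calling handler that muffles the warnings and reports every distinct message with a count. This is a generic base R sketch; `collapse_warnings` is a hypothetical helper name and not part of MARVEL.

```r
# Hypothetical helper: evaluate an expression, muffle its warnings,
# and print each distinct warning message once with its count
collapse_warnings <- function(expr) {
  seen <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      seen <<- c(seen, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  # Named integer vector: message -> number of occurrences
  counts <- vapply(split(seen, seen), length, integer(1))
  for (m in names(counts)) {
    message(sprintf("%s (x%d)", m, counts[[m]]))
  }
  invisible(list(value = value, counts = counts))
}

# Toy usage: three square roots of negative numbers, reported once
out <- collapse_warnings(sapply(c(-1, -2, -3), sqrt))
```

`suppressWarnings()` would also silence the output, but it hides every warning, including ones that may matter.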

wenweixiong commented 1 year ago

Could you please share your R object and the code that generated these warnings via Google Drive?

yeroslaviz commented 1 year ago

This is strange. I have rerun the same command without changing anything, and now I don't get these warnings.

If you're still interested, I think I can provide you with the R object you asked for, but the problem seems to have vanished.

thanks anyway.

yeroslaviz commented 1 year ago

I have a different question now. When running the Differential expression analysis for both genes and splicing events I get a huge table with a lot of duplicated entries.

> head(marvel$DE$Exp.Spliced$Table)
             gene_id gene_name      gene_type n.cells.g1 n.cells.g2  mean.g1
1 ENSMUSG00000095595   Fam177a protein_coding         23         24 5.430945
2 ENSMUSG00000095595   Fam177a protein_coding         23         24 5.430945
3 ENSMUSG00000095595   Fam177a protein_coding         23         24 5.430945
4 ENSMUSG00000095595   Fam177a protein_coding         23         24 5.430945
5 ENSMUSG00000095595   Fam177a protein_coding         23         24 5.430945
6 ENSMUSG00000095595   Fam177a protein_coding         23         24 5.430945
   mean.g2     log2fc statistic        p.val    p.val.adj
1 4.905805 -0.5251404        NA 8.866898e-08 0.0004580672
2 4.905805 -0.5251404        NA 8.866898e-08 0.0004580672
3 4.905805 -0.5251404        NA 8.866898e-08 0.0004580672
4 4.905805 -0.5251404        NA 8.866898e-08 0.0004580672
5 4.905805 -0.5251404        NA 8.866898e-08 0.0004580672
6 4.905805 -0.5251404        NA 8.866898e-08 0.0004580672

Is this deliberate? I can filter out all these duplicates, but they make the R object really big and the session very slow.

When extracting the complete table with all the entries the file size is 49Gb, when removing all duplicated rows, I am left with 2.9Mb.

I'm attaching the script I'm using here, but it is mostly a copy of your plate-pipeline web page.

I have here the links to two different marvel objects.

The first one was created as described in the pipeline - 126Mb in size

The second one was made after the differential analysis on the gene level - 700Mb in size

Let me know if you need anything else

thanks

sharedScript.Rmd.zip

wenweixiong commented 1 year ago

Could you please share your R object as saved at line 783 of sharedScript.Rmd: save(marvel, file=paste(path, file, sep=""))?

yeroslaviz commented 1 year ago

I have shared it. It is in the first link I added above.

Here it is again - https://datashare.biochem.mpg.de/s/I7QxDqxqFRnC3H4

yeroslaviz commented 1 year ago

The next error appears when running the IsoSwitch command to assign dynamics to gene splicing events.

Here, I get this error:

Error in `$<-.data.frame`(`*tmp*`, "cor", value = NA) : 
  replacement has 1 row, data has 0

Can you please tell me how to interpret this one? Do I have something missing in the data, or don't I have any significant events?
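As background, base R raises exactly this message when a single value is assigned to a column of a data frame that has zero rows. The sketch below reproduces it with a made-up, empty table; that the table inside IsoSwitch is empty because no events passed a filter is an assumption and is not confirmed in this thread.

```r
# Hypothetical result table with zero rows,
# e.g. after a filter that matched nothing
res <- data.frame(gene_id = character(0), log2fc = numeric(0))

# Assigning a length-one value to a column of a zero-row data frame fails
msg <- tryCatch(
  {
    res$cor <- NA
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# A defensive pattern: only annotate when there are rows to annotate
if (nrow(res) > 0) {
  res$cor <- NA
} else {
  message("No rows passed the filter; nothing to classify.")
}
```

Checking `nrow()` on the intermediate tables before this step is therefore a reasonable first diagnostic.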

yeroslaviz commented 1 year ago

This is strange. I have rerun the same command without changing anything, and now I don't get these warnings.

If you're still interested, I think I can provide you with the R object you asked for, but the problem seems to have vanished.

thanks anyway.

I have re-run the command and got all the NaN warnings again.

Any ideas why this is happening?

thanks again

yeroslaviz commented 1 year ago

Did you have time to look at the strange behaviour of having many many duplicated entries in the marvel object?

This happens already, when running the CompareValues command:

> marvel$DE$Exp$Table[1:5, ]
             gene_id gene_name gene_type n.cells.g1 n.cells.g2 mean.g1  mean.g2
1 ENSMUSG00000110896   Gm48273       TEC         23         24 5.55221 7.396245
2 ENSMUSG00000110896   Gm48273       TEC         23         24 5.55221 7.396245
3 ENSMUSG00000110896   Gm48273       TEC         23         24 5.55221 7.396245
4 ENSMUSG00000110896   Gm48273       TEC         23         24 5.55221 7.396245
5 ENSMUSG00000110896   Gm48273       TEC         23         24 5.55221 7.396245
    log2fc statistic        p.val   p.val.adj
1 1.844035        NA 8.620796e-12 4.13087e-06
2 1.844035        NA 8.620796e-12 4.13087e-06
3 1.844035        NA 8.620796e-12 4.13087e-06
4 1.844035        NA 8.620796e-12 4.13087e-06
5 1.844035        NA 8.620796e-12 4.13087e-06

wenweixiong commented 1 year ago

Many apologies for the late reply, I was trying to get several conference abstracts off my desk!

Could you please re-share your R object and scripts? I will look into this now.

yeroslaviz commented 1 year ago

No worries about it, I guess you're as busy as I am. I know how it feels.

I have solved the duplication problem. For some reason my GeneFeature matrix contained duplicated entries, which were carried further downstream in the analysis.
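A quick check of this kind could look like the following base R sketch. The table `gene_feature` is a small made-up stand-in for the gene metadata, reusing two gene IDs from the output above; the column names are assumptions for illustration.

```r
# Hypothetical gene metadata table containing a repeated row
gene_feature <- data.frame(
  gene_id = c("ENSMUSG00000095595", "ENSMUSG00000095595", "ENSMUSG00000110896"),
  gene_name = c("Fam177a", "Fam177a", "Gm48273"),
  stringsAsFactors = FALSE
)

# Count how many rows repeat an earlier gene ID
n_dup <- sum(duplicated(gene_feature$gene_id))
n_dup

# Keep only the first occurrence of each gene ID
gene_feature_unique <- gene_feature[!duplicated(gene_feature$gene_id), ]

# anyDuplicated() returns 0 when no duplicates remain
stopifnot(anyDuplicated(gene_feature_unique$gene_id) == 0)
```

Running such a check before building the object avoids carrying the duplicates into every downstream table.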

But now I'm stuck on an error I can't figure out when calculating the gene splicing dynamics.

I can run the IsoSwitch command and get the results

          cor freq        pct
1 Coordinated   60  3.4149118
2    Opposing   49  2.7888446
3  Iso-Switch 1632 92.8856005
4     Complex   16  0.9106431

but when I try to plot them I get the following error:

> marvel <- PlotValues(MarvelObject=marvel,
+                        cell.group.list=cell.group.list,
+                        feature=gene_id,
+                        maintitle="gene_name",
+                        xlabels.size=7,
+                        level="gene"
+                        )
Error in xj[i] : invalid subscript type 'list'

The link to the R object after running the IsoSwitch() command is below (size ~12Gb):

https://datashare.biochem.mpg.de/s/mXN6GprNpQHChUw

thanks for the help

wenweixiong commented 1 year ago

Great that you managed to solve the gene duplication issue!

Could you please paste the section of your script that defines your cell.group.list and gene_id objects? I suspect the issue lies in how the cell groups to plot are defined in cell.group.list.

yeroslaviz commented 1 year ago

I think I have solved this problem as well.

For some reason the marvel$GeneFeatures subset was saved as a tibble and not as a data.frame. For that reason the gene_id object was empty. This caused the error message.
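To illustrate the difference in plain base R: selecting a single column from a data.frame drops to a vector by default, whereas a tibble keeps a one-column table, which is a list. The sketch mimics the tibble behaviour with `drop = FALSE` so that it runs without extra packages; `features` and `expr` are made-up toy objects, not the MARVEL structures.

```r
# Toy feature table and toy expression values named by gene ID
features <- data.frame(gene_id = c("g1", "g2"), gene_name = c("A", "B"),
                       stringsAsFactors = FALSE)
expr <- c(g1 = 5.4, g2 = 7.4)

# data.frame default: a single column drops to a character vector
id_vec <- features[1, "gene_id"]
expr[id_vec]

# Tibbles never drop; drop = FALSE mimics that on a data.frame.
# The result is a list, which cannot be used as a subscript.
id_tbl <- features[1, "gene_id", drop = FALSE]
msg <- tryCatch(expr[id_tbl], error = function(e) conditionMessage(e))
msg

# [[ ]] returns a plain vector for data.frames and tibbles alike
id_safe <- features[["gene_id"]][1]
id_safe
```

Converting the table with `as.data.frame()` before storing it is another option.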

Thanks for the patience and the help. I'll close this ticket and reopen it (or open a new one) if needed.

Assa