Open kate-stankiewicz opened 1 year ago
Dear @kate-stankiewicz ,
Thanks for reporting this. It has been on my TODO list for a while to catch these warnings early and give more informative feedback to the user. Depending on where the warnings occur, they can indicate different things, for instance that an event does not have a sufficient number of quantified events in a group, or that the gene expression or event fold change contains NaNs. You can ignore these for now, but I will leave the ticket open as a reference (and reminder) for me to improve this.
Best,
Andre
Hi Andre,
Thanks so much for the reply and explanation! I do note that in the test_results_C3gene_unique.tsv files, the column 'log2FC_event_count' contains both 'nan' and 'inf' values for some events. Also, if I look at the mere_graphsC3.confirmed.txt files, I do see that some samples show 'nan' for psi for some events.
In this case, can I still ignore these warnings for now, and perhaps simply exclude events that have NaN values from further analysis? (See https://github.com/ratschlab/spladder/issues/124 )
Thanks, Kate
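For readers who want to try the exclusion step asked about above, here is a minimal sketch that drops rows whose 'log2FC_event_count' value is 'nan' or 'inf'. Only that column name comes from this thread; the 'event_id' column, the example values, and the helper names are hypothetical, and this is not part of SplAdder itself.

```python
import csv
import io
import math


def is_finite_number(text):
    """Return True if text parses as a finite float (not nan, inf or -inf)."""
    try:
        return math.isfinite(float(text))
    except ValueError:
        return False


def drop_non_finite(rows, column="log2FC_event_count"):
    """Keep only the rows whose value in `column` is a finite number."""
    return [row for row in rows if is_finite_number(row[column])]


# Made-up table for illustration; a real results file has more columns.
example_tsv = (
    "event_id\tlog2FC_event_count\n"
    "event_1\t0.75\n"
    "event_2\tnan\n"
    "event_3\tinf\n"
    "event_4\t-1.2\n"
)

rows = list(csv.DictReader(io.StringIO(example_tsv), delimiter="\t"))
kept = drop_non_finite(rows)
print([row["event_id"] for row in kept])  # ['event_1', 'event_4']
```

To apply this to a real results file, the `io.StringIO(...)` object would be replaced by an opened file handle. Whether dropping such events is appropriate for a given analysis is a separate question that this sketch does not answer.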
Description
I am trying to run SplAdder on 83 samples using the instructions for Use on large cohorts ( https://spladder.readthedocs.io/en/latest/spladder_cohort.html ). When I get to the testing mode, I have several subgroups of samples to test based on different conditions. Some of the tests failed with the following error message:
raise ValueError(self.msg.format('endog'))
ValueError: NaN, inf or invalid value detected in endog, estimation infeasible.
These error messages were accompanied by these warnings:
/users/kstankie/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
/users/kstankie/anaconda3/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
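As background on the first warning, NumPy emits "Mean of empty slice" when a mean is taken over zero elements, and the result is NaN. The sketch below reproduces that in isolation. It assumes NumPy is installed and does not claim to show where in SplAdder the empty array arises.

```python
import warnings

import numpy as np

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.mean(np.array([], dtype=float))  # mean over zero values

messages = [str(w.message) for w in caught]
print(result)    # nan
print(messages)  # includes 'Mean of empty slice.'; the second message varies by NumPy version
```

A NaN produced this way would then be passed on to later steps, which is consistent with the "NaN, inf or invalid value detected in endog" error quoted above.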
Based on these warnings, I looked at the distribution of inserted events and found some samples that had very low or zero inserted events while all other samples in the condition group had hundreds or thousands. Example of two samples in the same condition group for a failed test:
Inserted: cassette_exon: 0 intron_retention: 0 intron_in_exon: 0 alt_53_prime: 1 exon_skip: 0 gene_merge: 0 new_terminal_exon: 0
Inserted: cassette_exon: 1357 intron_retention: 706 intron_in_exon: 3552 alt_53_prime: 12145 exon_skip: 17159 gene_merge: 0 new_terminal_exon: 41576
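The comparison above can be automated. The sketch below parses summary lines in the form quoted in this post and flags samples whose total inserted-event count is far below the group median. The sample names, the 1% threshold, and the helper names are hypothetical choices for illustration, and the actual log layout may differ from the single-line form shown here.

```python
import re
import statistics


def parse_inserted(line):
    """Parse an 'Inserted: name: count ...' summary line into a dict of ints."""
    body = line.split("Inserted:", 1)[1]
    return {name: int(count) for name, count in re.findall(r"(\w+):\s*(\d+)", body)}


def flag_low_samples(summaries, fraction=0.01):
    """Return per-sample totals and the samples below `fraction` of the median total."""
    totals = {name: sum(parse_inserted(line).values()) for name, line in summaries.items()}
    cutoff = fraction * statistics.median(totals.values())
    flagged = sorted(name for name, total in totals.items() if total < cutoff)
    return totals, flagged


# The two summary lines quoted in the post; the sample names are made up.
summaries = {
    "sample_A": "Inserted: cassette_exon: 0 intron_retention: 0 intron_in_exon: 0 "
                "alt_53_prime: 1 exon_skip: 0 gene_merge: 0 new_terminal_exon: 0",
    "sample_B": "Inserted: cassette_exon: 1357 intron_retention: 706 intron_in_exon: 3552 "
                "alt_53_prime: 12145 exon_skip: 17159 gene_merge: 0 new_terminal_exon: 41576",
}

totals, flagged = flag_low_samples(summaries)
print(totals)   # {'sample_A': 1, 'sample_B': 76495}
print(flagged)  # ['sample_A']
```

With 5-7 samples per condition, the median is a more robust reference than the mean, since a single near-empty sample barely shifts it.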
I removed the sample with almost no inserted events and re-ran spladder test. This time, no errors indicating "mean of empty slice" and I did receive output! However, for many of my tests I keep receiving this warning still (it was also present before I removed the offending samples causing the previous error):
users/kstankie/anaconda3/lib/python3.9/site-packages/spladder/spladder_test.py:742: RuntimeWarning: invalid value encountered in subtract
The run finishes and produces output that looks similar to tests that do not contain this RuntimeWarning, so I am not sure what is causing it or whether it should raise alarm bells. As mentioned above, for this one dataset, I run several tests with different groups of samples for different conditions and only receive this warning for some of the tests. I can't figure out why some tests receive this warning and not others (it is not just the tests where I had to remove some samples due to the previous "mean of empty slice" issue). For each of my tests, each condition has 5-7 samples. I do not receive any warnings or errors in any previous steps (for spladder build); it is only at spladder test where these RuntimeWarnings occur.
Are these warnings a concern, or can they be ignored as long as the run finishes and produces output?
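One general way NumPy produces "invalid value encountered in subtract" is by subtracting infinity from infinity, which yields NaN. The sketch below shows that mechanism in isolation, assuming NumPy is installed. Whether this is what happens at spladder_test.py:742 is not confirmed anywhere in this thread, and the input values are made up.

```python
import warnings

import numpy as np

# Made-up values: the first pair is inf - inf, the second is an ordinary subtraction.
a = np.array([np.inf, 2.0])
b = np.array([np.inf, 0.5])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with np.errstate(invalid="warn"):  # NumPy's default behaviour, set explicitly
        diff = a - b

messages = [str(w.message) for w in caught]
print(diff)      # first entry is nan, second is 1.5
print(messages)  # ['invalid value encountered in subtract']
```

Only the affected element becomes NaN and the other entries are computed normally, which fits a run that completes and writes output that otherwise looks fine.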
Thanks in advance for the help!
What I Did