Closed: MiguelCos closed this issue 3 years ago.
fit_cub2$count can't contain any zero values. Try adding a pseudocount of 1 to all values before calling the spectraCounteBayes function:
fit_cub2$count = fit_cub2$count + 1
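The following minimal base-R sketch shows why a zero count is a problem. It assumes, as the zero-count error suggests, that spectraCounteBayes relates the variance to the logarithm of the count. log2(0) is -Inf, which fitting functions generally cannot handle. A pseudocount of 1 maps 0 to log2(1) = 0 and keeps every value finite. The counts vector is made-up illustrative data.

```r
# Hypothetical per-protein minimum spectral counts; P2 has a zero.
counts <- c(P1 = 4, P2 = 0, P3 = 12, P4 = 1)

# On the log scale the zero becomes -Inf, which breaks a regression/loess fit.
log_raw <- log2(counts)
print(log_raw)

# Adding a pseudocount of 1 keeps every value finite (0 maps to log2(1) = 0).
counts_pc <- counts + 1
log_pc <- log2(counts_pc)
print(log_pc)
```

Adding the same constant to every protein preserves the ordering of the counts. Only the proteins with very low counts are noticeably shifted on the log scale.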
On the other hand, is there a specific reason that you use splines::ns before creating the model matrix? Is there a difference if you just create the design matrix like this:
design_cub <- model.matrix(~factors)
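The base-R sketch below makes the difference between the two designs concrete. It assumes a hypothetical layout of 5 time points with 2 replicates each. The time values and df = 3 are illustrative assumptions, not taken from the attached data. The factor design spends one coefficient per extra time level. The spline design uses a fixed number of smooth basis columns.

```r
library(splines)

# Hypothetical design: 5 time points (arbitrary units), 2 replicates each.
time <- rep(c(0, 1, 2, 4, 8), each = 2)

# Time as a categorical factor: intercept plus one coefficient per extra level,
# so the model can follow any pattern across the time points.
factors <- factor(time)
design_factor <- model.matrix(~factors)

# Time as a natural cubic spline with 3 degrees of freedom:
# intercept plus 3 basis columns, which forces a smooth fitted curve.
design_spline <- model.matrix(~ns(time, df = 3))

ncol(design_factor)  # intercept + 4 level coefficients
ncol(design_spline)  # intercept + 3 spline basis columns
```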
I also looked at what the variance ~ count fit looks like in your data using VarianceBoxplot. As far as I can see, the dependence is not very evident. Maybe this is due to the smaller number of proteins.
Hello, many thanks for looking into this.
The tool now runs through, but as you noticed, I don't seem to gain much detection power (for significantly affected genes) from this spectral-count-based adjustment.
I do indeed have many proteins in my dataset that were identified in most samples but not in one or two (I used MaxQuant with match between runs). This leads to many zeros in the spectral counts when taking the minimum per row. This, combined with the fact that I only have around 500 proteins to work with, could be the reason for the weak relationship between variance and spectral counts.
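The base-R sketch below illustrates how the per-row minimum collapses to zero. The count matrix is hypothetical: two proteins were quantified via match between runs in one sample each but have no MS/MS spectra there. A single missing sample is enough to pull a protein's minimum count to 0.

```r
# Hypothetical spectral-count matrix: 4 proteins x 4 samples.
# P2 and P4 have no MS/MS spectra (count 0) in one sample each.
count_mat <- matrix(
  c( 5,  6, 4,  7,
     9,  0, 8, 10,
     2,  3, 2,  2,
    15, 12, 0, 14),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("P", 1:4), paste0("S", 1:4))
)

# Minimum count per protein (row); one zero is enough to make it 0,
# even when the protein was well covered in the other samples.
min_count <- apply(count_mat, 1, min)
print(min_count)

# Fraction of proteins whose minimum collapses to zero.
frac_zero <- mean(min_count == 0)
print(frac_zero)

# Pseudocount of 1 so the values can be log-transformed.
min_count_pc <- min_count + 1
```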
The reason I used splines::ns is that I am modelling my data as a time course. I want to find proteins that show a trend of decreasing or increasing values over time. As I understand it, a cubic spline model helps with that while allowing for non-linear increases or decreases. So far, this model finds a handful of proteins following this behavior.
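For a single protein, the idea behind the spline model can be sketched with base R's lm() and anova(). The spline coefficients are tested jointly against an intercept-only model. This is roughly analogous to the moderated F-test limma reports when topTable() is given all spline coefficients, although limma additionally moderates the variances across proteins. The intensities and time points below are made-up illustrative values.

```r
library(splines)

# Hypothetical log2 intensities for one protein with a smooth decline over time.
time <- rep(c(0, 1, 2, 4, 8), each = 2)
y <- c(25.1, 25.3, 24.8, 24.9, 24.4, 24.6, 23.9, 24.1, 23.2, 23.4)

# Null model: no effect of time.
fit0 <- lm(y ~ 1)
# Smooth time trend: natural cubic spline with 3 degrees of freedom.
fit_spline <- lm(y ~ ns(time, df = 3))

# Joint F-test of the 3 spline coefficients (3 numerator degrees of freedom).
tab <- anova(fit0, fit_spline)
print(tab)
```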
When I use design_cub <- model.matrix(~factors), I treat my time points as factors (categorical variables). As I understand it, this model detects any kind of effect of time on the expression level. Its F-test tells me that all proteins are affected, which is not particularly insightful.
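The base-R sketch below shows why the factor-based F-test is so permissive. It uses a hypothetical protein whose level changes only at a single time point, with no increasing or decreasing trend. The 4-degree-of-freedom test of the factor coefficients still flags the protein, because it responds to any departure from a flat profile. The data are made up for illustration.

```r
# Hypothetical protein that changes only at one time point (no trend).
time <- rep(c(0, 1, 2, 4, 8), each = 2)
y_blip <- c(24.0, 24.1, 24.1, 24.0, 25.5, 25.6, 24.0, 24.1, 24.1, 24.0)

# Null model: no effect of time.
fit0 <- lm(y_blip ~ 1)
# Time as a factor: 4 coefficients beyond the intercept, one per extra level.
fit_factor <- lm(y_blip ~ factor(time))

# The 4-df F-test detects any difference between time points,
# including this transient spike, not only monotone trends.
tab_factor <- anova(fit0, fit_factor)
print(tab_factor)
```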
Hello,
I am testing the package to analyze a time-course proteomics data set produced by MaxQuant. The output was processed using MSstats for protein summarization and normalization and then converted into wide format to use as input in limma + DEqMS. The spectral count data was produced from the msms.txt file. I followed the steps in the vignette to modify the limma::lmFit output object for use as input to the function DEqMS::spectraCounteBayes(). After that, I am receiving the error written in the title. I am attaching the necessary files and the code to reproduce the error. Every row with missing values was filtered out of the expression data set, and the same set of proteins was used to create the count-data matrix. I would really appreciate any support in understanding what could be happening.
Attached files: annotation_liver.txt, logexpr_msstats_norm_liver_wide.txt, count_liver_msms.txt
Code below
I also tested:
- changing the fit.method argument, with similar results (errors referring to missing values);
- a design of the form design <- model.matrix(~0+time), with coef_col = 1;
- setting the trend and robust arguments in limma::eBayes() to FALSE.
All of these gave the same result.
Many thanks in advance!
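The errors discussed in this thread come from the count vector attached to the fit object: zeros in one case, messages about missing values in the other. A quick pre-check of that vector may therefore help. The base-R sketch below looks up counts by protein ID so that they follow the row order of the expression matrix. It then flags proteins that are missing from the count table (NA) or have zero counts. The matrices and IDs are hypothetical, and the check is an illustration, not part of the DEqMS API.

```r
# Hypothetical log2 expression matrix (3 proteins x 2 samples).
expr <- matrix(c(24.1, 24.3, 22.0, 22.4, 26.5, 26.1), nrow = 3, byrow = TRUE,
               dimnames = list(c("P1", "P2", "P3"), c("S1", "S2")))

# Hypothetical count table: different row order and one extra protein (P9).
count_df <- data.frame(count = c(0, 7, 3, 5),
                       row.names = c("P3", "P1", "P9", "P2"))

# Look counts up by protein ID so they follow the row order of expr.
count_aligned <- count_df[rownames(expr), "count"]
names(count_aligned) <- rownames(expr)
print(count_aligned)

# Sanity checks before attaching the counts to the fit object.
has_na   <- anyNA(count_aligned)     # TRUE if a protein is missing from the count table
has_zero <- any(count_aligned == 0)  # TRUE if a zero would break a log-scale fit
```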