jeffsocal / tidyproteomics

An S3 data object and framework for common quantitative proteomic analyses
https://jeffsocal.github.io/tidyproteomics/
MIT License

Expression analysis missing values cut off? #11

rzieg closed 1 year ago

rzieg commented 1 year ago

Hi Jeff, thanks for the package! When I run the expression analysis, it automatically drops proteins with missing values in more than 3 of the 13 samples in my data set (DIANN data that I would prefer not to impute), so I'm guessing there is a cut-off of roughly 75%? Is there a way to adjust this manually or turn it off? Or is something else happening? I'm fairly new to R and couldn't find anything in the documentation, but I would like a bit more control over the filtering. Thanks!

jeffsocal commented 1 year ago

Interesting. By default I return all the proteins with adequate values regardless of imputation status; filtering on that is left to you, and the option is available in the plot_volcano function. If you examine the example in the documentation [jeffsocal.github.io/tidyproteomics/reference/expression.html], proteins are accounted for with up to 0.83 (10/12) of their values imputed. However, 2226 proteins are unaccounted for in the expression analysis; these are proteins with 1 or fewer missing values on one side of the comparison. If you are still having an issue, please email me directly and I can take a look at your data to see what's going on.
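For readers who want to apply their own cut-off after the analysis, here is a minimal sketch of the idea in plain Python. It is not tidyproteomics code: the table layout, `missing_fraction`, `filter_by_missing`, and the 0.75 threshold are all hypothetical, chosen only to show how a per-protein missing (or imputed) fraction such as 0.83 can be computed and filtered on.

```python
# Hypothetical intensity table: protein -> one value per sample, None = missing.
intensities = {
    "P1": [10.0, 11.0, 9.5, 10.2, 9.9, 10.4],
    "P2": [None, None, 8.1, None, None, 7.9],
    "P3": [None, None, None, None, None, 12.0],
}


def missing_fraction(values):
    """Return the fraction of samples that have no measured value."""
    return sum(v is None for v in values) / len(values)


def filter_by_missing(table, max_missing):
    """Keep proteins whose missing fraction does not exceed max_missing."""
    return {p: v for p, v in table.items() if missing_fraction(v) <= max_missing}


# Per-protein missing fraction, rounded for display.
fractions = {p: round(missing_fraction(v), 2) for p, v in intensities.items()}

# Assumed user-chosen cut-off: tolerate at most 75% missing values.
kept = filter_by_missing(intensities, max_missing=0.75)

print(fractions)
print(sorted(kept))
```

Raising or lowering `max_missing` is the kind of manual control asked about above; the equivalent step in R would be an ordinary filter on the analysis results.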

jeffsocal commented 1 year ago

Thank you for sharing your data and pointing out the issue. I tracked it down to an improper check on values missing completely from one group. This is fixed in version 1.6.0, which also prints a message reporting how many observations were dropped due to completely missing data.
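As an illustration only, the kind of check described here can be sketched in plain Python. This is not the package's implementation: the function name `split_by_group_coverage`, the `min_values` parameter, the group labels, and the message wording are assumptions made for the example.

```python
def split_by_group_coverage(table, groups, min_values=1):
    """Split proteins into those measured in every group and those that are not.

    table  : protein -> list of values, one per sample (None = missing)
    groups : list of group labels, one per sample, same order as the values
    """
    kept, dropped = {}, []
    for protein, values in table.items():
        # Count measured (non-missing) values per group for this protein.
        counts = {}
        for value, group in zip(values, groups):
            counts[group] = counts.get(group, 0) + (value is not None)
        if all(n >= min_values for n in counts.values()):
            kept[protein] = values
        else:
            dropped.append(protein)
    return kept, dropped


# Hypothetical two-group design with three samples per group.
groups = ["ctrl", "ctrl", "ctrl", "ko", "ko", "ko"]
table = {
    "A": [1.0, 1.1, 0.9, 2.0, 2.1, 1.9],
    "B": [None, None, None, 2.0, 2.2, 2.1],  # completely missing in ctrl
    "C": [1.0, None, 1.2, 2.0, 2.1, None],
}

kept, dropped = split_by_group_coverage(table, groups)
message = f"{len(dropped)} protein(s) dropped: no values in at least one group"
print(message)
```

Reporting the dropped count, as the fix does, makes it visible when proteins disappear from the results because one side of the comparison has no data at all.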