Closed atakanekiz closed 5 years ago
Hi, @atakanekiz. The data downloaded by GEOquery are taken directly from the GEO repository without manipulation. In general, the data at GEO come directly from the submitters and are often normalized, but there is no guarantee that this is the case. Also, the normalization methods (if any were applied) vary from study to study. Often, the data processing approaches are detailed at GEO, and that is the easiest way to determine what has been done. In some cases, you may have to go back to the publication or even email the authors.
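As a rough illustration of checking the processing description recorded at GEO, the sketch below collects the submitter-supplied notes from a sample annotation table. It assumes the table has columns named `data_processing`, `data_processing.1`, and so on, which is how these fields usually appear in the phenotype data of an `ExpressionSet` returned by `getGEO`; the helper name `processing_notes` is made up for this example.

```r
# Collect the submitter-supplied processing notes from a sample annotation table.
# `pheno` is assumed to be a data.frame such as Biobase::pData(eset).
# `processing_notes` is a hypothetical helper, not part of GEOquery.
processing_notes <- function(pheno) {
  # Keep every column whose name starts with "data_processing"
  cols <- grep("^data_processing", colnames(pheno), value = TRUE)
  # Flatten to a character vector and drop repeated descriptions
  unique(unlist(lapply(pheno[cols], as.character), use.names = FALSE))
}

# Possible usage with GEOquery (needs network access plus GEOquery and Biobase):
# library(GEOquery)
# gse  <- getGEO("GSE65218", GSEMatrix = TRUE)  # a list of ExpressionSet objects
# eset <- gse[[1]]
# processing_notes(Biobase::pData(eset))
```

If the function returns nothing, the submitter may not have filled in that field, and the publication is the next place to look.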
Where raw data are available at GEO, you can also access those and process them yourself if you want to thoroughly control the preprocessing.
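A minimal sketch of fetching the raw supplementary files with GEOquery's `getGEOSuppFiles` follows. The wrapper name `fetch_raw_files` and the choice of a temporary directory are assumptions for illustration; whether a given series has raw files at all depends on what the submitter deposited, and how to process them depends on the platform.

```r
# Download the supplementary (often raw) files attached to a GEO accession.
# `fetch_raw_files` is a hypothetical wrapper; the download itself is done by
# GEOquery::getGEOSuppFiles, which needs network access.
fetch_raw_files <- function(accession, dest = tempdir()) {
  if (!requireNamespace("GEOquery", quietly = TRUE)) {
    stop("The GEOquery package is required for this sketch.")
  }
  # Files are written to a sub-directory of `dest` named after the accession
  GEOquery::getGEOSuppFiles(accession, baseDir = dest)
}

# Possible usage (not run here because it downloads data):
# files <- fetch_raw_files("GSE65218")
# rownames(files)  # paths of the downloaded files, if any were available
```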
Hope that helps.
That's very helpful, thank you very much.
Best, Atakan
Hello,
Thanks for this very helpful package. I have a question about the nature of the data being downloaded by the `getGEO` function. I've analyzed a few datasets, and when I plot the expression values of all the genes per sample I usually see similar distributions, suggesting the data were normalized, as seen below (example from GSE65218 gene expression microarray):
But this isn't always true, as seen here (example from protein microarray GSE25755):
Median expression values are still not far from each other, which might indicate that some sort of normalization was applied in the latter case as well. I just want to make sure that the data I download using the `getGEO` function are always the normalized, ready-to-analyze data. I'm using the default arguments to prepare the `ExpressionSet` object:
Thanks so much for the help.
Best, Atakan