sr320 / LabDocs

Roberts Lab Documents
http://sr320.github.io/LabDocs/

Trouble using aggregate function in R #639

Closed: yaaminiv closed this issue 7 years ago

yaaminiv commented 7 years ago

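I have a spreadsheet (http://owl.fish.washington.edu/spartina/DNR_Skyline_20170524/2017-06-10-protein-areas-only-error-checked.csv) with protein names and associated peak areas, but each protein transition has its own row: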

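[image: screen shot 2017-06-13 at 12 42 17 am] https://user-images.githubusercontent.com/22335838/27071313-3adc834c-4fd1-11e7-9f2e-52103e638944.png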

I want to average the peak areas across each protein's transitions, so that I have one row per protein in my spreadsheet. I'm using the aggregate function in R to do this, with na.rm and na.action to handle all of the N/A values in my spreadsheet, based on this Stack Overflow thread: https://stackoverflow.com/questions/17737174/blend-of-na-omit-and-na-pass-using-aggregate-in-r

averageProteinAreas <- aggregate(proteinAreas[-1], proteinAreas[1], mean, na.action = na.omit, na.rm = TRUE)

However, I just get a data frame filled with NAs and the warning message "1: In mean.default(X[[i]], ...) : argument is not numeric or logical: returning NA"

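[image: screen shot 2017-06-13 at 12 45 21 am] https://user-images.githubusercontent.com/22335838/27071391-9d789ed2-4fd1-11e7-8b61-5db1d7146cb9.png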

Not really sure how to fix this/what the error message is actually saying. Any help would be much appreciated!
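The warning comes from mean() itself: when it receives a character vector instead of numbers, it returns NA with exactly this message, and na.rm = TRUE does not help because the string "#N/A" is not a real NA. A minimal sketch reproducing it, using a small hypothetical vector (peakArea is an invented name, not a column from the spreadsheet):

```r
# Hypothetical column as it looks after import when "#N/A" is not
# recognized as missing: the whole column becomes character.
peakArea <- c("1520.3", "#N/A", "980.1")

# Capture the warning that mean() raises on non-numeric input.
warningText <- NULL
result <- withCallingHandlers(
  mean(peakArea, na.rm = TRUE),  # na.rm only drops real NA values
  warning = function(w) {
    warningText <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(class(peakArea))  # "character"
print(result)           # NA
print(warningText)      # "argument is not numeric or logical: returning NA"
```

aggregate() calls mean() once per group and column, which is why every cell of the result comes back NA.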

seanb80 commented 7 years ago

Just want to make sure I understand what you want: average peak area by protein, right? So the mean of each row? If so, you can use the rowMeans() function, passing the data frame without the protein-label column (i.e., all rows and every column except the first).

Off the cuff it would look something like proteinAverages <- rowMeans(proteinAreas[, 2:ncol(proteinAreas)], na.rm = TRUE)

I'll look into why aggregate wasn't working once I get my laptop out.
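Note that rowMeans() and aggregate() average in different directions: rowMeans() averages across the numeric columns within each row (one value per transition), whereas aggregate() averages across all rows that share a protein name (one row per protein). A small sketch on a hypothetical data frame; the column names protein, sample1, and sample2 are invented for illustration:

```r
# Hypothetical peak areas: two transitions for protein A, one for protein B.
proteinAreas <- data.frame(
  protein = c("A", "A", "B"),
  sample1 = c(10, 20, 5),
  sample2 = c(30, NA, 7)
)

# rowMeans(): one value per row (transition), averaged across samples.
# 2:ncol() skips the non-numeric protein label column.
transitionMeans <- rowMeans(proteinAreas[, 2:ncol(proteinAreas)], na.rm = TRUE)
print(transitionMeans)  # 20 20 6

# aggregate(): one row per protein, averaged across that protein's transitions.
proteinMeans <- aggregate(proteinAreas[-1], proteinAreas[1], mean, na.rm = TRUE)
print(proteinMeans)
```

For collapsing transitions into one row per protein, the aggregate() form is the one that matches the goal in the original question.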


seanb80 commented 7 years ago

Ah, I looked into it a little further and realized it's because your NAs are being read in as strings, which makes those columns character rather than numeric. R expects missing values to look like "NA", not "#N/A" as you have them in your sheet.
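One quick way to confirm this diagnosis is to check the class of each column after import: a peak-area column that shows up as character instead of numeric contains a non-numeric value such as "#N/A". A sketch using base R's read.csv() on a small inline CSV (hypothetical data standing in for the real spreadsheet):

```r
# Hypothetical CSV with Excel-style "#N/A" placeholders.
csvText <- "protein,sample1,sample2
A,10,30
A,20,#N/A
B,5,7"

# Default import: "#N/A" is not treated as missing, so sample2 is read as text.
# stringsAsFactors = FALSE keeps it character on older R versions too.
rawAreas <- read.csv(text = csvText, stringsAsFactors = FALSE)
print(sapply(rawAreas, class))

# Data columns (everything but the label) that did not come in as numeric:
badColumns <- names(rawAreas)[-1][!sapply(rawAreas[-1], is.numeric)]
print(badColumns)  # "sample2"
```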

You can read them in with something like

library(readr)
proteinAreas <- read_csv("~/Documents/RobertsLab/work/2017-06-10-protein-areas-only-error-checked.csv", na = "#N/A")

and it looks like your aggregate command works fine.
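Putting the fix together: a sketch that declares "#N/A" as a missing-value marker at import time and then runs the same aggregate() call. It uses base R's read.csv() with na.strings instead of readr's read_csv() with na so it runs without extra packages; the inline CSV is hypothetical. The formula variant at the end passes na.action = na.pass, in the spirit of the linked Stack Overflow thread: with the formula method's default na.omit, any row with an NA in any column would be dropped before averaging, whereas na.pass keeps those rows and lets na.rm = TRUE handle the gaps inside mean().

```r
csvText <- "protein,sample1,sample2
A,10,30
A,20,#N/A
B,5,7"

# Treat both "NA" and Excel's "#N/A" as missing so the peak-area columns stay numeric.
proteinAreas <- read.csv(text = csvText, na.strings = c("NA", "#N/A"),
                         stringsAsFactors = FALSE)

# Data frame method: group by the first column, average every other column.
averageProteinAreas <- aggregate(proteinAreas[-1], proteinAreas[1], mean, na.rm = TRUE)
print(averageProteinAreas)

# Formula method: na.pass keeps rows containing NAs; mean() then skips them via na.rm.
averageFormula <- aggregate(. ~ protein, data = proteinAreas, FUN = mean,
                            na.rm = TRUE, na.action = na.pass)
print(averageFormula)
```

Both calls give protein A a sample1 mean of 15; with the default na.omit, the formula version would instead drop the A,20,#N/A row and report 10.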


yaaminiv commented 7 years ago

ahh whoops. thanks @seanb80!