hurlbertlab / core-transient

Data and code for NSF funded research on core vs transient species

re-cleaning d207, removing non-quadrats #88

Closed: ahhurlbert closed this issue 8 years ago

ahhurlbert commented 8 years ago

@ssnell6

Do a quick check of the unique values of 'quadrat' and how many times each occurs: table(dataset$quadrat). You will notice several things. First, each block-quadrat combination occurs twice, but the second time it has 'gm2' appended to the end. I re-checked the metadata, and a spot check indicates that the rows with 'gm2' in the quadrat name are duplicates in which the biomass value was converted from grams per 0.04 m^2 to grams per m^2 (a factor of 25). If this is true, all of these gm2 records can be removed. (However, the propOcc file shows differences between, e.g., B1Q1 and B1Q1gm2: the occupancy values differ, and even the species lists differ, so we probably need to better understand what's going on here.)
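A minimal R sketch of the check described above, run on a small hypothetical stand-in for d207. Only the 'quadrat' column and the 'gm2' suffix come from this issue; the species, year, and count column names and all values are assumptions. It pairs each gm2 row with its base quadrat and tests whether the gm2 value is 25 times the original (1 m^2 / 0.04 m^2 = 25). Records that exist under only one of the two labels come out with NA on one side, which is one way the species lists for, e.g., B1Q1 and B1Q1gm2 could differ.

```r
# Hypothetical toy data standing in for d207; only 'quadrat' is a real column name
dataset <- data.frame(
  quadrat = c("B1Q1", "B1Q1gm2", "B1Q2", "B1Q2gm2"),
  species = c("sp1", "sp1", "sp2", "sp2"),
  year    = c(2001, 2001, 2001, 2001),
  count   = c(0.4, 10, 1.2, 30),
  stringsAsFactors = FALSE
)

# Frequency of each quadrat label
print(table(dataset$quadrat))

# Split into original rows (g per 0.04 m^2) and converted rows (g per m^2)
is_gm2 <- grepl("gm2$", dataset$quadrat)
orig <- dataset[!is_gm2, ]
conv <- dataset[is_gm2, ]
conv$quadrat <- sub("gm2$", "", conv$quadrat)  # strip suffix so labels match

# Pair each gm2 record with its original by quadrat, species and year;
# all = TRUE keeps records that exist under only one label
paired <- merge(orig, conv, by = c("quadrat", "species", "year"),
                suffixes = c(".orig", ".gm2"), all = TRUE)

# Converted values should be 25x the originals (tolerance is an assumption)
tol <- 1e-6
paired$is_converted_dup <- abs(paired$count.gm2 - 25 * paired$count.orig) < tol

# Records present under only one label (NA on one side)
only_one_label <- is.na(paired$count.orig) | is.na(paired$count.gm2)
print(paired[only_one_label, ])
```

If the gm2 values in the raw file were rounded, the tolerance tol may need to be loosened before trusting the comparison.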

Second, there are quadrats labeled "Average", "Count", and "Std..Err.", which presumably need to be removed.
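A minimal R sketch of dropping those summary rows, assuming the labels appear in the quadrat column exactly as spelled above. The toy data and the count column are hypothetical.

```r
# Hypothetical toy data; only the quadrat labels come from the issue
dataset <- data.frame(
  quadrat = c("B1Q1", "B1Q2", "Average", "Count", "Std..Err."),
  count   = c(3, 5, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Summary rows that are not real quadrats
summary_labels <- c("Average", "Count", "Std..Err.")
dataset <- dataset[!dataset$quadrat %in% summary_labels, ]

# Confirm no summary rows remain, then re-tabulate
stopifnot(!any(summary_labels %in% dataset$quadrat))
print(table(dataset$quadrat))
```

Using %in% against an explicit list, rather than a pattern, avoids accidentally dropping a real quadrat whose name happens to contain one of these words.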

ssnell6 commented 8 years ago

I thought we wanted the dried biomass values attached to each B1Q1 label as the count data, rather than one of the original columns. Do you mean that, in addition to the columns, there are also rows with averages, counts, and standard errors in the quadrat field?

ahhurlbert commented 8 years ago

Yes, exactly. Type this: table(dataset$quadrat)

ssnell6 commented 8 years ago

Removed the Average/Count/Std..Err. quadrats. I will go through the metadata to figure out what's going on with gm2.

ssnell6 commented 8 years ago

Removed all gm2 quadrats; I couldn't find any differences in the raw datasets besides the unit conversion.
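A minimal R sketch of that final removal, assuming every converted record is marked by a quadrat label ending in 'gm2'. The toy data and the count column are hypothetical.

```r
# Hypothetical toy data; gm2 values are 25x the originals
dataset <- data.frame(
  quadrat = c("B1Q1", "B1Q1gm2", "B2Q3", "B2Q3gm2"),
  count   = c(0.4, 10, 2, 50),
  stringsAsFactors = FALSE
)

# Drop every quadrat label ending in 'gm2' (the g per m^2 copies)
dataset <- dataset[!grepl("gm2$", dataset$quadrat), ]

# Confirm no gm2 labels remain, then re-tabulate
stopifnot(!any(grepl("gm2", dataset$quadrat)))
print(table(dataset$quadrat))
```

Anchoring the pattern with $ only matches the suffix, so a quadrat name that merely contains "gm2" elsewhere would be kept.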