melanostomias / rstuff


Randy New Task/Specimen level analysis #3

Open melanostomias opened 8 years ago

melanostomias commented 8 years ago

Randy needs a new task for next steps! Also need the graph with individual specimens for each collection.

(I hope opening an issue rather than sending an email is the right workflow.)

kevinlove commented 8 years ago

We need to decide how to handle the data that comes back from "dwc:individualCount":

How do you want to represent NA's?

melanostomias commented 8 years ago

There were NAs for individualCount? If so, then I guess omit them....

kevinlove commented 8 years ago

Do you think it's appropriate to exclude a specimen record, in the count of total specimens, if the "dwc:individualCount" field is empty?
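To make the trade-off concrete, here is a minimal sketch comparing two possible policies on a made-up vector. The values are hypothetical, and treating an empty count as one specimen is only an assumption for illustration, not something Darwin Core or iDigBio prescribes.

```r
# Hypothetical values standing in for the "individualcount" field;
# NA marks a record where the field is empty.
individualcount <- c(3, NA, 12, 1, NA)

# Policy 1: omit records with an empty count from the specimen total.
total_omit <- sum(individualcount, na.rm = TRUE)
records_kept_omit <- sum(!is.na(individualcount))

# Policy 2 (assumption): an empty count means a single specimen.
filled <- ifelse(is.na(individualcount), 1, individualcount)
total_as_one <- sum(filled)
records_kept_as_one <- length(filled)
```

Under the first policy a collection whose records all have empty counts contributes nothing; under the second it contributes one specimen per record.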

melanostomias commented 8 years ago

Check this record out: https://www.idigbio.org/portal/records/fdee39cd-9e7f-4fdc-85ad-e1ef7ca56c92

Maybe we need to also include "individualcount" (no caps) because the RAW data shows that as a field too.

On another note, we somehow missed MCZ, which I suppose might have combined their collections. Maybe other collections did this as well (AMNH?):

https://www.idigbio.org/portal/recordsets/271a9ce9-c6d3-4b63-a722-cb0adc48863f

kevinlove commented 8 years ago

Your first example is of an iDigBio "Index Field" and should solve some of the problems that I've found in the data:

unique(dd$individualcount)
"ca. 50" "3+HEAD" "<400" "21+5" "SERIES" "4+" "1`" "90 + 3" "ca.100" "12 + 7" "ca 237" "35+35" "17+24" "35+1"

The process of indexing these data should make the values actual numeric fields....
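If the raw values ever need to be cleaned by hand, something along these lines might work. This is only a sketch: `parse_count` is a hypothetical helper, not part of any iDigBio package, and the rules it applies (drop a leading "ca.", add the two sides of "a+b", return NA for everything else) are assumptions about what the collectors meant.

```r
# Hypothetical helper: coerce free-text counts to numbers where the intent
# seems clear, and return NA otherwise.
parse_count <- function(x) {
  x <- trimws(x)
  out <- rep(NA_real_, length(x))

  # Plain integers, optionally prefixed by "ca" or "ca.":
  # "ca. 50", "ca.100", "ca 237".
  approx <- grepl("^(ca\\.? *)?[0-9]+$", x)
  out[approx] <- as.numeric(sub("^ca\\.? *", "", x[approx]))

  # Sums of two integers (assumed to mean a total): "21+5", "90 + 3".
  sums <- grepl("^[0-9]+ *\\+ *[0-9]+$", x)
  parts <- strsplit(x[sums], " *\\+ *")
  out[sums] <- vapply(parts, function(p) sum(as.numeric(p)), numeric(1))

  out
}

# Values copied from the unique() output in this thread.
raw <- c("ca. 50", "3+HEAD", "<400", "21+5", "SERIES", "4+", "1`",
         "90 + 3", "ca.100", "12 + 7", "ca 237", "35+35", "17+24", "35+1")
parsed <- parse_count(raw)
```

Values such as "3+HEAD", "<400", "SERIES", "4+" and "1`" stay NA on purpose, since guessing a number for them would invent data.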

kevinlove commented 8 years ago

Ok... I was able to re-run the query and use the index field "individualcount" to make a barplot:

asih-by-specimens.pdf

This plot ignores NA's.... so UF doesn't make it on the list. I'm gonna sort out the Y axis scaling and see if I can come up with a method to handle NA's that's reasonable....
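For reference, here is one way a collection can drop out of a per-collection summary in R. This is a sketch with made-up numbers, and it is not necessarily how the plot above was built: the formula interface of `aggregate()` removes rows containing NA before summing, while `tapply()` keeps every group.

```r
# Hypothetical records: UF has only empty counts.
dd <- data.frame(
  collection = c("MCZ", "MCZ", "UF", "UF", "AMNH"),
  individualcount = c(4, NA, NA, NA, 10)
)

# aggregate() with a formula uses na.action = na.omit by default,
# so the UF rows are dropped before sum() is ever called.
by_formula <- aggregate(individualcount ~ collection, data = dd, FUN = sum)

# tapply() keeps every collection; with na.rm = TRUE an all-NA group sums to 0.
by_tapply <- tapply(dd$individualcount, dd$collection, sum, na.rm = TRUE)
```

With the second approach UF stays in the result with a specimen count of 0, which at least keeps it visible on the axis.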

kevinlove commented 8 years ago

I felt bad that I couldn't get the count plot to you before lab meeting, so I've been working on it today. Here is what I have:

This plot flips the x axis and colors the record count values: asih-by-specimens-COLOR.pdf

This plot stacks the variables "RecordCount" and "SpecimenCount": asih-by-specimens-STACKED.pdf
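A stacked plot of this kind can be sketched in base R roughly as follows. The numbers and the output file name are made up, and the attached PDFs may well have been produced with a different plotting package.

```r
# Hypothetical per-collection totals; the real values come from the iDigBio query.
counts <- data.frame(
  collection = c("AMNH", "MCZ", "UF"),
  RecordCount = c(120, 80, 60),
  SpecimenCount = c(450, 300, 0)
)

# barplot() stacks the rows of a matrix and draws one bar per column.
m <- t(as.matrix(counts[, c("RecordCount", "SpecimenCount")]))
colnames(m) <- counts$collection

# Write to a temporary PDF so the sketch also runs non-interactively.
out_file <- file.path(tempdir(), "stacked-sketch.pdf")
pdf(out_file)
mids <- barplot(m, horiz = TRUE, las = 1,
                col = c("grey40", "steelblue"),
                legend.text = rownames(m), xlab = "Count")
dev.off()
```

Each bar's total length is RecordCount plus SpecimenCount, so the two segments should be read separately rather than as a combined total.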

Let me know what you think