aazaff closed this issue 1 year ago.
As a tentative design, I have drafted the following preliminary list of Required, Encouraged, Inappropriate, and Optional fields for COLLECTIONS.
Required:
- Stratigraphic Unit -- the most highly resolved stratigraphic unit; e.g., if your section has a known Group, Formation, Member, Submember, and Bed, you would put the Bed name here.
- Latitude -- in WGS84 decimal degrees
- Longitude -- in WGS84 decimal degrees
- Name -- a free-text name (required for PBDB)
- Early Interval -- the earliest geologic interval
- Late Interval -- the latest geologic interval
- Protected -- Boolean; is this considered a legally protected site?
- Paleoenvironment -- the paleoenvironmental interpretation (indeterminate allowed)
Encouraged:
- Paleobiology Database Link -- link to an equivalent PBDB collection
- Numeric Max Age -- age in Ma, if associated with a specific geochronologic measurement
- Numeric Min Age -- age in Ma, if associated with a specific geochronologic measurement
- Group -- geologic group
- Formation -- geologic formation
- Member -- geologic member
- Bed -- geologic bed
- Geography Comments -- notes on how to find the location
- Stratigraphy Comments -- notes on the stratigraphic nomenclature used
- Geologic Comments -- notes on the lithological context of the section
- Preservation Comments -- notes on the taphonomic situation
- Collectors -- names of the actual field collectors
Inappropriate:
- collection_no -- would be assigned by PBDB, not our issue
- record_type -- is automatically a collection, so this field is pointless
- collection_subset -- although we could support this, it is so rarely used in PBDB that I don't think it is worth it
- collection_aka -- no need for this
- n_occs -- implicit
- ref_author -- implicit in the references node
- ref_pubyr -- implicit in the references node
- reference_no -- implicit in the references node
- cc -- country code, implicit in lat/long
- state -- implicit in lat/long
- county -- implicit in lat/long
- paleomodel -- handled by PBDB
- paleolng -- handled by PBDB
- paleolat -- handled by PBDB
- geoplate -- handled by GPlates
- taxonomy comments -- redundant with the rest of PBot
Optional: everything else tracked by PBDB but not listed above.
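One way to make the four tiers actionable is a validation pass at entry time. The sketch below is a minimal illustration, assuming snake_case property names (hypothetical, not the actual PBot or PBDB identifiers): missing Required fields and any Inappropriate field are errors, unfilled Encouraged fields are warnings, and anything else is treated as Optional. It also checks the WGS84 coordinate ranges and that the numeric maximum age is not younger than the minimum.

```python
from typing import Any

# Field tiers transcribed from the preliminary lists above. The snake_case
# property names are assumptions, not the actual PBot/PBDB identifiers.
REQUIRED = {
    "stratigraphic_unit", "latitude", "longitude", "name",
    "early_interval", "late_interval", "protected", "paleoenvironment",
}
ENCOURAGED = {
    "pbdb_link", "numeric_max_age", "numeric_min_age", "group", "formation",
    "member", "bed", "geography_comments", "stratigraphy_comments",
    "geologic_comments", "preservation_comments", "collectors",
}
INAPPROPRIATE = {
    "collection_no", "record_type", "collection_subset", "collection_aka",
    "n_occs", "ref_author", "ref_pubyr", "reference_no", "cc", "state",
    "county", "paleomodel", "paleolng", "paleolat", "geoplate",
    "taxonomy_comments",
}


def check_collection(record: dict[str, Any]) -> dict[str, list[str]]:
    """Sort problems in a collection record into errors and warnings.

    Fields outside the three sets above are treated as optional and ignored.
    """
    fields = set(record)
    errors = [f"missing required field: {f}" for f in sorted(REQUIRED - fields)]
    errors += [f"field should not be entered: {f}" for f in sorted(INAPPROPRIATE & fields)]
    warnings = [f"encouraged field not filled: {f}" for f in sorted(ENCOURAGED - fields)]

    # Coordinates are WGS84 decimal degrees, so they must fall in the valid ranges.
    lat, lng = record.get("latitude"), record.get("longitude")
    if lat is not None and not -90 <= lat <= 90:
        errors.append("latitude out of range")
    if lng is not None and not -180 <= lng <= 180:
        errors.append("longitude out of range")

    # Numeric ages: the maximum (older) age should not be smaller than the minimum.
    max_age, min_age = record.get("numeric_max_age"), record.get("numeric_min_age")
    if max_age is not None and min_age is not None and max_age < min_age:
        errors.append("numeric_max_age is younger than numeric_min_age")
    return {"errors": errors, "warnings": warnings}
```

For example, a record containing only name, latitude, longitude, and state would be flagged for the five missing Required fields and for state, which is implicit in the coordinates.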
The Current Design described above is implemented in API commits ef0a0936d206d41000df98aabdada7c3190ae38a through 33e5f2d2bf68fdbede9e3af298127828b4337831, and client commits 8982749adf9c86e6775e57821ad3037bca1be56b through 612d9db87a7e024df335681fe3eb7b8faaac7ec0.
I'm leaving the properties at pbotID and name for now.
I think collection_subset could be useful. Taking multiple subsamples (even of a single bed) at a single field site is a fairly common collection strategy in paleobotany.
For example, I collected an in-situ flora from a laterally continuous tuff layer and took 26 subsamples of the flora across the exposure. In publications I report the entire flora together because it was a living community, but knowing the subsampling is helpful for spatial analyses and various diversity/heterogeneity metrics. So in PBot/PBDB, I would prefer to enter the flora as a collection and record the subsets.
Everything else in your list looks good to me!
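To make the subsetting case concrete, here is a minimal sketch (the Collection class and its fields are hypothetical, not PBot's schema) in which the whole flora is one collection and each subsample is a subset: the flora is reported by pooling taxa across subsets, while per-subset richness stays available for spatial or heterogeneity analyses.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection that can hold subset collections (subsamples)."""
    name: str
    taxa: set[str] = field(default_factory=set)  # taxa recorded directly here
    subsets: list["Collection"] = field(default_factory=list)

    def all_taxa(self) -> set[str]:
        """The flora as published: taxa pooled across this node and all subsets."""
        pooled = set(self.taxa)
        for sub in self.subsets:
            pooled |= sub.all_taxa()
        return pooled


def subset_richness(collection: Collection) -> dict[str, int]:
    """Number of taxa in each subset, e.g. as input to heterogeneity metrics."""
    return {sub.name: len(sub.all_taxa()) for sub in collection.subsets}
```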
"...we DO required Linnaean terms of some kind with specimens, which is equivalent to PBDB occurrence"
About requiring Linnaean terms for specimens: do you mean just at some level (e.g., Plantae, or some higher-order clade name)? We definitely don't want to require family, genus, or species, right?
Yes... Linnaean at SOME level, not necessarily genus and species.
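A sketch of the "Linnaean at SOME level" rule, assuming specimens carry hypothetical taxon_name and rank properties: a name at any accepted rank passes, so genus and species are not required. The accepted rank list is illustrative only.

```python
# Ranks accepted as "Linnaean at some level". This list is an illustrative
# assumption, not a PBot vocabulary.
LINNAEAN_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def has_linnaean_name(specimen: dict) -> bool:
    """True if the specimen is named at any accepted rank; genus/species not required."""
    name = (specimen.get("taxon_name") or "").strip()
    return bool(name) and specimen.get("rank") in LINNAEAN_RANKS
```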
Did we talk previously about having a "Reason for Collection" field? Might be important to know if something was a quantitative census vs. taxonomic collection vs. biostratigraphic investigation (and other categories, too).
@ecurrano I didn't actually know how to interpret "reason for collection"! I would like to record the investigation method/purpose info you listed here. We would need to provide clear instructions on what to put in that field, and what not to put in it! I could see someone typing their whole study rationale into that field.
Less specifically than the last couple of comments: it sounds like the flexibility for a collection to mean a myriad of things is necessary to stay in line with PBDB, since our collection term would port to PBDB. However, within PBot, the highest level of collection could be broken down into sub-collections at the discretion of the enterer (we would provide a set of training guidelines).
It does raise a concern: what happens when a new collection is added at the highest level but later turns out to be part of an even higher-level collection? Something tells me PBDB has already had experience with this and there is a solution or workaround.
Ellen, Claire, and Rebecca discussed this more on Wednesday. Here is a summary of our discussion, and we look forward to feedback!
Collection node = the fossils collected from a particular hole at a particular point in time by a particular team or individual (inherent in this is that different teams might have different reasons for collection) => we will encourage everyone to enter their data at this resolution.
Super Collection node = amalgamation of all the collections taken from a specific locality (e.g., Colwell Creek Pond, Big Cedar Ridge)
The choice between collection & super collection vs. sub-collection & collection should match PBDB's usage. We are unsure whether that means PBDB gets the "Collection node" or the "Super Collection node."
Anything above the level of "Super Collection" will be achieved through querying, e.g., at the formation level or for a specific geographic region.
Allow duplication of nodes so that one can make minimal adjustments to a copy. Command-D (duplicate) is our friend!
Req./Rec./etc. for collection properties: pause until we meet with the full team and confirm collection terminology
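The duplicate-and-adjust workflow amounts to copying a node and overriding only what differs. A minimal sketch, assuming a hypothetical CollectionDraft record with just a few of the properties listed earlier; Python's dataclasses.replace returns a modified copy and leaves the original untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CollectionDraft:
    """Hypothetical record holding a few collection properties for illustration."""
    name: str
    formation: str
    early_interval: str
    late_interval: str
    collectors: str


def duplicate(node: CollectionDraft, **changes: str) -> CollectionDraft:
    """Copy a node, overriding only the properties passed in `changes`.

    A real duplicate in the graph would also need a fresh node ID (not modeled here).
    """
    return replace(node, **changes)
```

For example, duplicate(quarry_1, name="Quarry 2") would keep the formation, intervals, and collectors of quarry_1 and change only the name.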
Also, replying to Dori's comment above: yes! Maybe we can have a drop-down menu for investigation method/purpose, with a clear explanation of each category we put in it.
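A drop-down maps naturally onto a fixed set of allowed values. This sketch assumes hypothetical category labels taken from the examples listed earlier in this thread (quantitative census, taxonomic collection, biostratigraphic investigation) plus an "other" placeholder for categories still to be agreed on; free-text study rationales are rejected rather than stored.

```python
from enum import Enum


class CollectionPurpose(Enum):
    """Hypothetical drop-down options for the investigation method/purpose field."""
    QUANTITATIVE_CENSUS = "quantitative census"
    TAXONOMIC = "taxonomic collection"
    BIOSTRATIGRAPHIC = "biostratigraphic investigation"
    OTHER = "other"  # placeholder until the remaining categories are agreed on


def parse_purpose(value: str) -> CollectionPurpose:
    """Map a submitted drop-down value to a category.

    Anything outside the list (e.g. a pasted study rationale) raises ValueError.
    """
    return CollectionPurpose(value.strip().lower())
```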
Great, here are some preliminary responses.
"Collection node = the fossils collected from a particular sample [originally "hole"] at a particular point in time by a particular team or individual (inherent in this is that different teams might have different reasons for collection) => we will encourage everyone to enter their data at this resolution."
"Super Collection node = amalgamation of all the collections taken from a specific locality (e.g., Colwell Creek Pond, Big Cedar Ridge)"
The definition of a collection node, as currently written, seems very specific to a certain collecting style and material. Namely, it makes perfect sense for the discrete quarry-style sampling of compression/impression fossils (currently used by all of us and others in more quantitative paleobotany circles!). I am not sure if that definition is inclusive enough of the range of plant materials/modes of preservation/collection styles for the entire community.
For example, nodules containing plants are not collected from holes or quarries but are often surface collected, so what constitutes a sample will, by necessity, be determined by the researcher based on the realities of the site and their study aims. There are even many situations where plant compressions/impressions are not collected from distinct, separately recorded holes. For example, when I did some field work with Nacho and Ruben in Argentina, they simply collected all the plants found at the site/locality by many people working on different spots on the hillsides; this is common for groups that sample for taxonomic diversity and systematics and are not heavily into quantitative paleoecology/paleobotany. There are also many collections of specimens obtained from float. I am also not sure how palynology would work with this description: for example, should a core be divided into time or measured increments as collections? And then there are historical collections, of course, which are rarely recorded as holes or quarries and were likely collected from large areas. There are more examples I am not touching on, like collections from coal balls, from lignite mines, and so on.
All of this is to say that the definition of a collection has to accommodate the many types of preservation/specimens, researcher aims, and messy realities of collecting. I think that ultimately means relying on researchers to determine the smallest meaningful and realistic partition of their specimens/data, beyond the basic guideline of being collected by a particular individual/group at a particular time (this speaks a bit to Andrew's question #3).
A possible modification (incomplete, and it needs work): Collection node = the fossils collected as a sample at a particular place and point in time by a team or individual. The sample should represent the smallest reasonable division of your data (e.g., a single quarry sample, a surface collection of nodules from a single site, ... [need other examples, but you get the gist!]).
I like the concept of a "super collection" node for internal PBot use! I am guessing that the smaller-partitioned "collection" concept is what would pipe to/track better with PBDB.
Dori articulated my thoughts so much more clearly. Yes, there are so many different methods of collection out there. We really want to err on the side of inclusivity.
Relevant notes on this topic from meeting with Mark:
- "Collections" are the biggest sticky issue in PBDB.
- Mark's view is that a collection is "temporally-ecologically bound" [note: I like this wording of the concept].
- People should conceptualize them as small as they can.
- PBDB does not differentiate by collection date, so why make a new collection for different dates? Mark's perspective is that there is no point. But all of this depends on things like: is it the same spot? Different collectors can make a difference. [note: my opinion would be that different collection methods should warrant making new collections]
- PBDB is very flexible, which is a blessing and a curse.
- "You can never go wrong making more collections, only by making too few."
- "When in doubt, part it out!!"
Andrew: think about the goal of the database. If it is a constrained scientific goal/aim, then what a collection is should be constrained; if the use of the database is broad/unconstrained, then the definition should be unconstrained.
Regarding specimens and/or names not clearly in a collection, or names without attached specimens: Mark's rule of thumb when entering data (for global-scale analysis) is that if he can get down to a Cenozoic epoch and a county, he will enter it.
- The data can be in the database at the level at which it exists (sometimes crappy), but depending on your analysis and query, it gets filtered out when not relevant!
- For PBot, for filtering based on collection methods, see the collections entry form, last tab, called "collecting info". Our system could require that you answer these questions! PBDB can add checkboxes or whatnot for us.
- From Andrew about nesting collections: our graph system lets us have an unlimited hierarchy of collections, and the most finely divided ones go to PBDB. Mark fully agrees here.
- Use Darwin Core for terms! There is a conversion table from PBDB to Darwin Core; Mark will try to find it.
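Mark's rule of thumb (enter data at whatever resolution exists, then filter per analysis) can be sketched as a resolution threshold applied at query time. The level names and their ordering below are assumptions for illustration, and the epoch-plus-county entry threshold is his Cenozoic rule of thumb, not a PBDB constraint.

```python
# Resolution levels ordered coarse -> fine. These names are assumptions
# for illustration, not PBDB or PBot vocabularies.
TIME_LEVELS = ["era", "period", "epoch", "age"]
GEO_LEVELS = ["country", "state", "county", "coordinates"]


def meets(record: dict, min_time: str, min_geo: str) -> bool:
    """True if the record is resolved at least as finely as both thresholds."""
    return (TIME_LEVELS.index(record["time_level"]) >= TIME_LEVELS.index(min_time)
            and GEO_LEVELS.index(record["geo_level"]) >= GEO_LEVELS.index(min_geo))


def worth_entering(record: dict) -> bool:
    """Entry threshold from Mark's (Cenozoic) rule of thumb: epoch and county."""
    return meets(record, "epoch", "county")


def query(records: list[dict], min_time: str, min_geo: str) -> list[dict]:
    """Everything stays in the database; each analysis filters by resolution."""
    return [r for r in records if meets(r, min_time, min_geo)]
```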
We will allow unbounded nesting of collections of collections (as we do with states and characters). The only open question is whether we separate out the Darwin Core location concept (collection_location) as its own node type, or keep it as a collection and specify it with a property, similar to how we distinguish between OTU and Specimen descriptions. I am leaning towards the latter, but we have not decided yet.
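Unbounded nesting with the property-based option might look like the sketch below: a single hypothetical CollectionNode type whose kind property marks location-style nodes, and an export that sends only the most finely divided collections (the leaves) to PBDB. It also addresses the earlier concern about a collection later turning out to belong to a larger one: wrapping it in a new parent node does not change what gets exported. All names and kind values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionNode:
    """Hypothetical graph node; `kind` stands in for the property-based option."""
    name: str
    kind: str = "collection"  # assumed values: "collection" or "location"
    children: list["CollectionNode"] = field(default_factory=list)

    def add(self, child: "CollectionNode") -> "CollectionNode":
        """Nest a child node (any depth is allowed) and return it."""
        self.children.append(child)
        return child


def pbdb_export(node: CollectionNode) -> list[str]:
    """Names of the most finely divided collections (leaf nodes) under `node`."""
    if not node.children:
        # A childless location-style node holds no fossils, so skip it.
        return [node.name] if node.kind == "collection" else []
    leaves: list[str] = []
    for child in node.children:
        leaves.extend(pbdb_export(child))
    return leaves
```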
We have decided that we need to add a new type of node called a COLLECTION.
The purposes of this are twofold:
Current Design
Outstanding issues
A collection in the Paleobiology Database is primarily a group of what we would call OTUs, with secondary support for specimens, but a collection in PBot is primarily a group of specimens with secondary support for OTUs. Is this mismatch a problem? (I realised I was being stupid here: this is not a problem, because we DO require Linnaean terms of some kind with specimens, which is equivalent to a PBDB occurrence.)
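The equivalence can be sketched as collapsing a collection's specimens into one occurrence-like row per distinct Linnaean name. This is a minimal illustration under assumed field names (taxon_name, n_specimens); it does not reproduce PBDB's actual occurrence fields.

```python
from collections import Counter


def specimens_to_occurrences(collection_name: str, specimens: list[dict]) -> list[dict]:
    """Collapse specimens into one occurrence-like row per distinct taxon name.

    Every specimen must carry a Linnaean name at some rank; unnamed ones are rejected.
    """
    names = []
    for spec in specimens:
        name = (spec.get("taxon_name") or "").strip()
        if not name:
            raise ValueError(f"specimen {spec.get('id')!r} has no Linnaean name")
        names.append(name)
    counts = Counter(names)
    # Sorted by name so the output order is stable.
    return [{"collection": collection_name, "taxon_name": n, "n_specimens": c}
            for n, c in sorted(counts.items())]
```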