jackba / arctos

Automatically exported from code.google.com/p/arctos
We suck at recording spatiotemporal data #303

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Collect >1 specimens simultaneously. Enter them into Arctos.
2. Allow their locality_id or collecting_event_id values to drift apart, as
they tend to do.
3. Try to prove that these things have any spatiotemporal relationship to
one another.
4. Recognize that this is only the most obvious example, and that there are
many more ways in which our current model disallows effective sharing of
spatiotemporal data between specimens, observations, and media. A host and
a parasite may differ only by collecting_method, and we cannot accommodate
that.

We need to reconsider how we relate specimens to spatial and temporal data.
Specimens from the same place and/or time should be findable and
identifiable as such even when they do not share primary keys, collecting
methods, coordinate determiners, coordinate remarks, previous coordinate
determinations, coordinate precision, or any of the dozens of other things
that must separate them in our model.

I don't believe we can adequately accomplish this without adding GIS
capabilities to Arctos, either as an external service or from within.
Additionally, we need to repartition "geospatial metadata." I suggest the
following partitions as a starting point:

1) Geospatial data (shapes)
2) Attributes of geospatial data (determiner, reference, etc.)
3) Temporal collecting data
4) Event attributes (method, habitat description, etc.)
5) Verbatim assertions (Curator's assigned geography string, collector's
locality description, etc.)
6) Geologic data (Formation, Period, etc.)

I believe solving this issue should be highest priority, perhaps second
only to taxonomy. Issues 13, 89, 115, 193, 225, 240, 260, 263, 268, and 280
would be solved or made solvable by this issue.

Original issue reported on code.google.com by dust...@gmail.com on 11 Jun 2009 at 9:02

GoogleCodeExporter commented 9 years ago
What do you mean by "repartition 'geospatial metadata'"?

Original comment by gordon.jarrell on 17 Jun 2009 at 8:15

GoogleCodeExporter commented 9 years ago
This entire issue is about repartitioning geospatial metadata. We should be
able to find and share when-and-where data without needing to memorize random
integers, poke holes in our security, or be fluent in historical placenames.
We can't do that with the current structure.

We have a completely nonsensical mix of geospatial data (dec_lat), descriptive
data (continent_ocean), untransformed data (lat_deg), and stuff about
specimens (collecting_method) scattered around our 4 spatiotemporal tables,
and we can't keep track of the important parts without bringing along lots of
stuff we don't want.

Original comment by dust...@gmail.com on 17 Jun 2009 at 9:11

GoogleCodeExporter commented 9 years ago
adding social label - this isn't getting fixed without a proposal & more
discussion

Original comment by dust...@gmail.com on 24 Jul 2009 at 9:56

GoogleCodeExporter commented 9 years ago
From an old email exchange...

Arctos has locality data in 4 tables:

geog_auth_rec
locality
collecting_event
lat_long

That structure, along with being difficult to write code to, doesn't jibe very
well with how data are collected and used. For example, there are times when
it would be hugely beneficial to share everything about a collecting event
between specimens, but where collecting method differs - rat caught in Museum
Special, worm from that rat "caught" in 100-mesh sieve, etc. Since
collecting_method is in collecting_event, we can't do that with one event,
meaning we just doubled our chances of mucking up the event/locality
connection.

Higher geog forces us to make arbitrary choices, and those end up being
taxon-dependent. I'm pretty sure you could use Arctos to prove that there are
no moose in the "Yukon-Tanana uplands" or no plants in "Game Management Unit
20," never mind that those are largely the same thing. It's even goofier when
you consider things like language (Russia or Россия?) or dynamic political
boundaries (maybe it should really be Soviet Union - or maybe now Belarus!).

The way in which we store shapes (Point-Radius) is fairly primitive. The circles
around "New Mexico, 8000 feet" and "Yukon River, Alaska" both contain a LOT of
unlikely acres.

We have no way of spatially querying error. I have a fake spatial query widget
implemented, but it only considers points - irrespective of whether error is a
meter or a light-year, and it returns an all-or-nothing result set.
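
To make the "querying error" point concrete, here is a minimal sketch in Python of a query that treats each georeference as a point plus an error radius and returns three answers instead of all-or-nothing. Everything in it is an assumption for illustration: the PointRadius type, the error_m field, and the classify function are hypothetical names, not anything in Arctos, and the Earth is treated as a sphere.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical approximation)


@dataclass(frozen=True)
class PointRadius:
    """Hypothetical georeference: a center point plus an uncertainty radius."""
    dec_lat: float
    dec_long: float
    error_m: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def classify(georef: PointRadius, query: PointRadius) -> str:
    """Compare a georeference's error circle to a query circle.

    'inside'  - the whole error circle lies within the query circle
    'overlap' - the circles intersect, so the specimen might be inside
    'outside' - the circles do not touch
    """
    d = haversine_m(georef.dec_lat, georef.dec_long, query.dec_lat, query.dec_long)
    if d + georef.error_m <= query.error_m:
        return "inside"
    if d - georef.error_m <= query.error_m:
        return "overlap"
    return "outside"
```

A points-only query would treat a 50 m georeference and a 500 km georeference with nearly the same center identically; here the first comes back "inside" and the second "overlap", which is the distinction the current widget cannot make.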

Those aren't the only problems, but they're exemplary.

I think the solution is obvious, even if the details aren't: replace the
"coordinates as an afterthought" model with a "coordinates as the data" model,
and let machines figure out if a given georeference is in Russia, or what was
once Russia, or has a chance of being in Russia, or just someplace I can see
from here, or whatever. (We can still keep any number of strings describing
the coordinates and of course the collector's notation, we just don't have to
locate specimens using only those.)

I think the breakdown for storing data is roughly the following:

1) Geospatial data (shapes)
2) Attributes of geospatial data (determiner, reference, as_defined_on_date,
etc.)
3) Temporal collecting data
4) Event attributes (method, habitat description, etc.)
5) Verbatim assertions (Curator's assigned geography string, collector's
locality description, etc.)
6) Geologic data (Formation, Period, etc.)

Those need more consideration, and can be refined along the way.
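
As a rough illustration of that breakdown, here is a minimal sketch in Python of how the six partitions might hang together so that a host and its parasite share place and time while differing in method and determiner. None of this is a proposed schema: the class and field names are hypothetical, the sample values are made up, and comparing WKT strings for equality is a crude stand-in for a real geometric comparison.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Shape:
    """Partition 1: the geometry itself, held here as a WKT string (an assumption)."""
    wkt: str


@dataclass(frozen=True)
class ShapeAttributes:
    """Partition 2: who determined the shape, and from what."""
    determiner: str
    reference: str
    as_defined_on_date: Optional[date] = None


@dataclass(frozen=True)
class SpecimenEvent:
    """One specimen's link to a place and time, plus everything that may differ."""
    catalog_number: str
    shape: Shape                       # partition 1
    shape_attributes: ShapeAttributes  # partition 2
    began_date: date                   # partition 3
    ended_date: date                   # partition 3
    collecting_method: str             # partition 4
    verbatim_locality: str = ""        # partition 5
    formation: Optional[str] = None    # partition 6


def same_place_and_time(a: SpecimenEvent, b: SpecimenEvent) -> bool:
    """True when two records share geometry and dates, ignoring all other fields."""
    return (a.shape, a.began_date, a.ended_date) == (b.shape, b.began_date, b.ended_date)


# Made-up sample records: a host and its parasite from one trap site on one day.
trap_site = Shape("POINT(-147.72 64.84)")
day = date(2011, 7, 1)
rat = SpecimenEvent("HOST-1", trap_site, ShapeAttributes("collector", "GPS"),
                    day, day, "Museum Special")
worm = SpecimenEvent("PARA-1", trap_site, ShapeAttributes("curator", "map"),
                     day, day, "100-mesh sieve")
```

With this split, same_place_and_time(rat, worm) holds even though the two records differ in collecting method and coordinate determiner and share no key.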

It's not clear to me what we can do about storing the actual locality data - I
think GIS shapes are straightforward now, and an improvement over our current
(point-radius) method. Probability Surfaces are better still, and at least
storing/serving them, if not creating them, may be immediately available.

Original comment by dust...@gmail.com on 10 Mar 2010 at 8:39

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 5 Aug 2011 at 7:06

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 5 Aug 2011 at 7:13

GoogleCodeExporter commented 9 years ago
Just trying to bring this back to the top of the pile. I'm not going to jump
all the way in by myself, and MSB Parasites has an immediate funded need to
separate collecting event (place and time) and collecting method (and source)
data. If y'all don't help me come up with a viable model that I can write code
to like about now-ish I'm going to do something drastic - maybe flatten
locality data out into something like our "temporary" taxonomy structure....

Original comment by dust...@gmail.com on 31 Oct 2011 at 4:01

GoogleCodeExporter commented 9 years ago
This sounds like a major change that is best dealt with in person. Can we 
organize an Arctos 'pow wow' ???

Original comment by carla...@gmail.com on 1 Nov 2011 at 4:56

GoogleCodeExporter commented 9 years ago
We've been ignoring this formally since Jun 11, 2009, and in various other 
capacities since at least the ABQ powwow in 2006, when I think we all realized 
that we needed to do things differently. This isn't something that's going to 
be solved in an afternoon, so it isn't a good candidate for a meeting.

Original comment by dust...@gmail.com on 1 Nov 2011 at 2:58

GoogleCodeExporter commented 9 years ago
I recall talking about this at the ABQ meeting, or maybe a meeting at MVZ, and
agreeing that collecting method and source are better at the level of cataloged
item rather than collecting event. Do you have the same recollection? What
exactly does MSB need to do that is different from how we do things now?

Original comment by carla...@gmail.com on 2 Nov 2011 at 6:01

GoogleCodeExporter commented 9 years ago
I'm not sure cataloged item is the right place for that stuff either, although 
a bit of denormalization here wouldn't bother me too much and that's an easy 
partial solution. But locality code is incredibly difficult to write and 
maintain, so I'd really like a more complete solution.

MSB needs to collect hosts and parasites from the same event.

Original comment by dust...@gmail.com on 2 Nov 2011 at 2:36

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 31 Jan 2012 at 9:33

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 26 Jun 2012 at 9:06

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 26 Jun 2012 at 9:07

GoogleCodeExporter commented 9 years ago

Original comment by dust...@gmail.com on 28 Jun 2012 at 4:47

GoogleCodeExporter commented 9 years ago
closing this thing for the psychological benefits - re-open with whatever we 
miss in the v5.2 update

Original comment by dust...@gmail.com on 2 Jul 2012 at 4:37