jackba / arctos

Automatically exported from code.google.com/p/arctos

Higher Geography is goofy. #193

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. replace "Co." with "County"
2. Fix any other strange abbreviations
3. Merge with UAM's geography

Original issue reported on code.google.com by dust...@gmail.com on 23 Jan 2009 at 11:39
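
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration, assuming higher geography is stored as a comma-separated string and that abbreviations appear as whole tokens. The `ABBREVIATIONS` mapping and the `expand_abbreviations` function are hypothetical names, not part of Arctos; only the "Co." to "County" pair comes from the issue itself.

```python
import re

# Hypothetical mapping. Only "Co." -> "County" is taken from the issue;
# any further entries would need to be agreed on by the curators.
ABBREVIATIONS = {
    "Co.": "County",
}


def expand_abbreviations(higher_geog, mapping=ABBREVIATIONS):
    """Replace whole-token abbreviations in a higher geography string."""
    result = higher_geog
    for abbr, full in mapping.items():
        # Match the abbreviation only when it stands alone: preceded by
        # start-of-string or whitespace, followed by whitespace, a comma,
        # or end-of-string. This avoids touching words like "Colorado".
        pattern = r"(?<!\S)" + re.escape(abbr) + r"(?=\s|,|$)"
        result = re.sub(pattern, full, result)
    return result


if __name__ == "__main__":
    print(expand_abbreviations("North America, United States, California, Alameda Co."))
```

Matching whole tokens rather than doing a blind substring replace is a deliberate choice here: it keeps the replacement from rewriting names that merely start with the same letters.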

GoogleCodeExporter commented 9 years ago
I think this is done, right? Can this be closed?

Original comment by carla...@gmail.com on 16 Apr 2009 at 6:15

GoogleCodeExporter commented 9 years ago
How about we just remove the MVZ, switch some labels, and leave it open? Higher
Geog is still a mess.

Prov. is prevalent, as in Africa, Angola, Benguela Prov.

Same for Dept. and Depto, as in Africa, Ivory Coast, Dept. Abidjan or Central
America, El Salvador, Depto. Cabanas

There is still lots of crazy not-geography "averaging": 
Africa, Cameroon, Nord Prov.
Africa, Cameroon, Nord-Ouest Prov.
Africa, Cameroon, Ouest Prov.

There are (perhaps appropriately?) African Counties. This one caught my eye
because of the slashie: Africa, Guinea, B/kama County

I don't know what this is: 
Central America, Honduras, Depto. Islas de la Bahia, Islas de la Bahia
Central America, Honduras, Depto. Islas de la Bahia, Islas de la Bahia, Isla de Roatan

Or this:
South America, Chile, Metropolitan Region (=Region Metropolitana de Santiago)
South America, Chile, Region I (=Region de Tarapaca)
South America, Chile, Region II (=Region de Antofagasta)
South America, Chile, Region III (=Region de Atacama)
South America, Chile, Region IV (=Region de Coquimbo)
South America, Chile, Region IX (=Region de la Araucania)

Don't forget the non-geological continents (along with some political changes,
which we have no actual way of handling):
Eurasia, Russia
Eurasia, U.S.S.R.

....and so on and so forth.

OK, one more, but an easy one:
no higher geography recorded
no specific locality
unknown

Original comment by dust...@gmail.com on 17 Apr 2009 at 12:11
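
The kinds of problems listed in this comment could be flagged automatically before anyone decides how to fix them. The sketch below is a rough illustration only: the `audit` function, the `PLACEHOLDERS` set, and the choice of checks are assumptions drawn from the examples in the comment, not an existing Arctos tool.

```python
import re

# Placeholder values quoted in the comment above.
PLACEHOLDERS = {
    "no higher geography recorded",
    "no specific locality",
    "unknown",
}

# Abbreviations mentioned in the thread; the list is illustrative, not complete.
ABBREV_TOKEN = re.compile(r"\b(?:Prov|Depto|Dept|Co)\.")


def audit(higher_geog):
    """Return a list of human-readable issues found in one geography string."""
    issues = []
    value = higher_geog.strip()
    if value.lower() in PLACEHOLDERS:
        issues.append("placeholder")
    for match in ABBREV_TOKEN.finditer(value):
        issues.append("abbreviation: " + match.group(0))
    if "/" in value:
        issues.append("slash")
    if "(=" in value:
        issues.append("parenthetical alias")
    return issues


if __name__ == "__main__":
    for sample in [
        "Africa, Angola, Benguela Prov.",
        "Africa, Guinea, B/kama County",
        "South America, Chile, Region I (=Region de Tarapaca)",
        "unknown",
    ]:
        print(sample, "->", audit(sample))
```

A report like this only finds candidates; deciding what each flagged string should become is still a curatorial call.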

GoogleCodeExporter commented 9 years ago
agreed, still needs mega work. this is a better representation of the issue. 
thanks.

Original comment by carla...@gmail.com on 17 Apr 2009 at 12:29

GoogleCodeExporter commented 9 years ago
A noble cause, but how will we know when this issue is closed? For years, I put
lots of effort into cleaning UAM's higher geography. More inconsistencies (at
best) keep getting added. I want to sweep lots of this legacy noise into
verbatim_locality, put serious effort into georeferencing, and abandon string
matches against hopeless vocabulary ASAP. Some of this is original data, but, in
the quest for consistent (and even particular bureaucratic) search criteria, an
unknown amount is interpreted after the fact.

Original comment by gordon.jarrell on 17 Apr 2009 at 5:31

GoogleCodeExporter commented 9 years ago
This is one of those things that we as a community need to prioritize. Is it
worth trying to clean up what we have, or cleaning up the obvious parts (the
{abbr.} bits would be fairly easy to get), or ignoring this altogether in the
hope that we'll have a locality service in the future?

Perhaps we should revisit who's allowed to alter geography - these things didn't
magic themselves in. Current users with manage_geography are:

uam> select GRANTEE from DBA_ROLE_PRIVS where GRANTED_ROLE='MANAGE_GEOGRAPHY';

GRANTEE
------------------------------
BRANDY
DLM
VOLEGUY
PDRUCKEN
ANDRES_LOPEZ
MKOO
PATTON
ATROX
JMALANEY
CCICERO
CINDY
GORDON
JLDUNNUM
LAM
TUCO
AHOPE

Original comment by dust...@gmail.com on 17 Apr 2009 at 6:34

GoogleCodeExporter commented 9 years ago
We will have some leverage on operators who require "Bureau of Land Management
Soggy Meadows Caterpillar Refuge and Management Area" when we can tell them to
tell BLM to give us GIS-shape files for their singular view of the planet's
surface.  In the meantime, our users need what they need, or they need what they
think they need.

I see names there I'd love to subtract, but there would be at least hard
feelings. On the other hand, with the addition of MVZ's relatively cosmopolitan
records, we could start to stabilize as we approach global coverage.

Original comment by gordon.jarrell on 17 Apr 2009 at 7:35

GoogleCodeExporter commented 9 years ago
Then I propose eliminating everything except higher_geog from table
geog_auth_rec. We should allow people anything they think they need, rather than
allowing them anything they think they need as long as they can cram it into our
arbitrary categories, if that is the goal. We're currently pretending to
maintain some sort of authority while not actually doing so, and that confuses
users and limits access to data. I think users and operators would be happier if
we dropped the pretenses. We're demonstrably unable to maintain actual authority
given the current table structure.

Original comment by dust...@gmail.com on 17 Apr 2009 at 7:55

GoogleCodeExporter commented 9 years ago
Adding Social tag - AC needs to prioritize this.

Original comment by dust...@gmail.com on 9 Feb 2010 at 1:37

GoogleCodeExporter commented 9 years ago
I'm copying Michelle on this thread. In the absence of a locality service,
could we make a link from the "Create Higher Geography" form
(http://arctos.database.museum/Locality.cfm?action=newHG) - also find
geography? - to the document that Michelle put together for standardized names?
I for one have no idea where to find that, and it would be useful for users
entering new names, and also for cleaning up some of the messy ones. I know that
the funky Chile names are ours, but not sure what those should be. Looking up
Chile's subdivisions on that doc would be helpful.

Original comment by carla...@gmail.com on 8 Mar 2010 at 10:22

GoogleCodeExporter commented 9 years ago
We seem to have all lost interest in this, and it may not matter in light of 
2012 locality changes. AC?

Original comment by dust...@gmail.com on 3 Jul 2012 at 2:55

GoogleCodeExporter commented 9 years ago
I haven't lost interest in it, but I might have despaired of it.  We will still 
be searching a lot of (maybe most) geography by string matches against strings 
applied by a hodge-podge of operators, correct?  This is not just Arctos's 
problem.  I would be willing to explore a proposal for development of a 
community-wide fix.

Original comment by gordon.jarrell on 5 Jul 2012 at 3:44

GoogleCodeExporter commented 9 years ago
--We will still be searching a lot of (maybe most) geography by string matches 
against strings applied by a hodge-podge of operators, correct? 

Maybe. We could (pending the Google proposal) search against, or also against, 
service-supplied strings now.

--This is not just Arctos's problem.  I would be willing to explore a proposal 
for development of a community-wide fix.

No idea what that means - who else can access Service data but has no 
geospatial capability? Maybe geospatial capability is irrelevant - not really 
sure. My only interest in higher geog is for use as a "standard" in my cleaning 
service, and to be useful for that we need a singular assertion for any place 
(eg, not with and without island_group, etc.).

Original comment by dust...@gmail.com on 8 Jul 2012 at 3:46

GoogleCodeExporter commented 9 years ago
Questions:
By "service-supplied strings," you mean your higher-geog service, or do you
mean an external service, like that Berkeley thing?
Don't get the "maybe."  Yes or no would be clearer.
"Google proposal" is fuzzy in my mind.  I thought Link was asking Google to
extend access to maps.  If there's more, it went by me.

- What I meant, and I'm not certain I'm correct, is that most or all other
higher-geographic queries on biodiversity data rely on string-matches to
essentially collector-supplied strings.  Or in other words, the first line
of my message applies to more than Arctos.  If so, a solution might be
sought as a supplement to VertNet, or be a stand-alone proposal on its own
merits.
- At one point, the singular-assertion standard would never have passed
political muster: different collections, and perhaps different disciplines
had (or still have) their own ideas about what their users want to match in
higher geography.  We could push for a singular-assertion standard, but we
would need a huge clean-up of the legacy.  And, even if there was agreement
in principle, the particulars could still inflame passion and become
protracted.  On top of that, there are all the difficulties we've
experienced with standardizing other vocabularies; namely that the
vocabulary turns out to be unexpectedly vague in the first place.
- My take on standardization is, been there, done that.  It didn't even
approximately work.  Shape-assigned tags are more scalable, if we have a
service to which we can comfortably add shapes, especially bureaucratic
constructions.

Original comment by gordon.jarrell on 8 Jul 2012 at 6:19

GoogleCodeExporter commented 9 years ago
service-supplied strings = data available from something like 
http://maps.googleapis.com/maps/api/geocode/json?latlng=23,-82&sensor=false

maybe==it's a decision we need to make

the proposal is to extend our access to google services - it's been submitted

the nice thing about using service-supplied data for query is that the curators 
can keep on doing whatever ridiculous thing makes them happy, but at the same 
time we can give users tools with which to find specimens

not really that interested in solving problems for anyone else

I think controlling what's acceptable for geography is well within the mission 
of the AC

I see no evidence that anyone's attempted to standardize anything about 
geography, at least not with some firm goal in mind, and here's another place 
where "do everything" ends up doing nothing. I have clear functional 
requirements, and this is a tractable problem.

Original comment by dust...@gmail.com on 8 Jul 2012 at 6:33
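
To make "service-supplied strings" concrete, here is a minimal sketch of building the reverse-geocode URL quoted above and pulling the political names out of a response. It makes no network call. The `sample_payload` is a hand-written stand-in with invented place names, shaped like the service's JSON (`results` containing `address_components` with `long_name` and `types`), not a real response; `build_reverse_geocode_url` and `political_names` are hypothetical helper names. The service's present-day requirements (for example an API key) may differ from the 2012 URL in the thread.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the thread.
GEOCODE_URL = "http://maps.googleapis.com/maps/api/geocode/json"

# Component types treated as "higher geography" in this sketch (an assumption).
WANTED_TYPES = (
    "country",
    "administrative_area_level_1",
    "administrative_area_level_2",
)


def build_reverse_geocode_url(lat, lng):
    """Build the reverse-geocode URL in the same form as the one in the thread."""
    query = {"latlng": f"{lat},{lng}", "sensor": "false"}
    # safe="," keeps the comma in latlng unescaped, matching the quoted URL.
    return GEOCODE_URL + "?" + urlencode(query, safe=",")


def political_names(payload):
    """Return the first long_name seen for each wanted component type."""
    names = {}
    for result in payload.get("results", []):
        for component in result.get("address_components", []):
            for kind in component.get("types", []):
                if kind in WANTED_TYPES and kind not in names:
                    names[kind] = component["long_name"]
    return names


# Hand-written sample with invented names, NOT a real service response.
sample_payload = {
    "status": "OK",
    "results": [
        {
            "address_components": [
                {"long_name": "Example Province", "short_name": "EP",
                 "types": ["administrative_area_level_1", "political"]},
                {"long_name": "Example Country", "short_name": "EC",
                 "types": ["country", "political"]},
            ]
        }
    ],
}

if __name__ == "__main__":
    print(build_reverse_geocode_url(23, -82))
    print(political_names(sample_payload))
```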

GoogleCodeExporter commented 9 years ago
Okay.  You wanted to close the issue, right?  Okay by me.  Wait and see
what falls out of everything else, then make new issues.  Can't see what
googleapis does, but am curious to know if we can get or provide shapes
such as "Pacific International Fishing Zone IV," etc.

Original comment by gordon.jarrell on 8 Jul 2012 at 9:16

GoogleCodeExporter commented 9 years ago
The amended issue (geog==mess) is still valid even if the reasons for cleaning 
up may have changed. I'm happy to keep this around if there's still a chance of 
suckering someone into fixing the data.

Googleapis returns strings that, at various levels, describe a point. It tends 
to be political - country, county, etc. You'll probably have to write your own 
service to find your fishing zone. 

Original comment by dust...@gmail.com on 9 Jul 2012 at 5:17

GoogleCodeExporter commented 9 years ago
Either way.  The goofy data will be there as a reminder!  If the AC agrees
that higher-geog strings with the same meaning can be fixed across
collections without row-by-row consultations, folks like me might slay them
as we find them.  On the other hand, we could build a look-up/replace
spreadsheet to run against the whole mess.

The political designations are probably the easiest.  Sounds like there
might be room for our own service, but I can wait and see.

Original comment by gordon.jarrell on 9 Jul 2012 at 5:43
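
The look-up/replace spreadsheet idea might look something like this. It is a sketch under assumptions: the spreadsheet is exported as CSV with two columns, here called `old_value` and `new_value` (hypothetical names), and replacement is by exact match on the whole higher geography string. The sample row is illustrative, not an agreed correction.

```python
import csv
import io


def load_lookup(csv_text):
    """Read old_value -> new_value pairs from CSV text into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["old_value"].strip(): row["new_value"].strip() for row in reader}


def apply_lookup(values, lookup):
    """Replace exact matches; return (cleaned values, values with no match)."""
    cleaned = []
    unmatched = []
    for value in values:
        key = value.strip()
        if key in lookup:
            cleaned.append(lookup[key])
        else:
            # Leave unknown strings untouched and report them for review.
            cleaned.append(value)
            unmatched.append(value)
    return cleaned, unmatched


# Illustrative spreadsheet content; fields are quoted because they contain commas.
SAMPLE_CSV = '''old_value,new_value
"Africa, Angola, Benguela Prov.","Africa, Angola, Benguela Province"
'''

if __name__ == "__main__":
    lookup = load_lookup(SAMPLE_CSV)
    cleaned, unmatched = apply_lookup(
        ["Africa, Angola, Benguela Prov.", "Eurasia, U.S.S.R."], lookup
    )
    print(cleaned)
    print(unmatched)
```

Returning the unmatched strings separately keeps the run non-destructive: anything the spreadsheet does not cover is left as-is and surfaced for a person to look at.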

GoogleCodeExporter commented 9 years ago
Here's my take:

Everything should be georeferenced, no matter how crudely/imprecisely. 
Otherwise you're stuck with curatorial assertions, and those are mostly useless.

Once it's georeferenced, we can use the service to get standardized geography 
strings. People who can't figure out how to draw a box on a map will use those.

The rest of us will draw boxes on maps when we want specimens from somewhere 
special.

A body of unique higher descriptors can be used in a data cleaning service, and 
those cleaned data can then be used for things like semi-automated 
georeferencing.

Original comment by dust...@gmail.com on 9 Jul 2012 at 5:50
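
"Drawing a box on a map" amounts to a bounding-box filter over georeferenced records. A minimal sketch, assuming each record carries decimal `lat` and `lng` values (hypothetical field names) and ignoring boxes that cross the antimeridian:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A latitude/longitude bounding box in decimal degrees."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lng):
        # Simple comparison; does not handle boxes spanning longitude 180.
        return self.south <= lat <= self.north and self.west <= lng <= self.east


def records_in_box(records, box):
    """Return the records whose coordinates fall inside the box."""
    return [r for r in records if box.contains(r["lat"], r["lng"])]


if __name__ == "__main__":
    # Invented records for illustration only.
    records = [
        {"id": 1, "lat": 23.0, "lng": -82.0},
        {"id": 2, "lat": 64.8, "lng": -147.7},
    ]
    print(records_in_box(records, Box(south=20, west=-85, north=25, east=-80)))
```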

GoogleCodeExporter commented 9 years ago
We then have to reduce the many ways in which the coordinates may be
interpreted as a point location, and I fear there are many.

I'm more ambitious about shapes and their descriptive strings, but when we
get this far, we can get more specific.

I'm happy.

Original comment by gordon.jarrell on 9 Jul 2012 at 6:36

GoogleCodeExporter commented 9 years ago
That's one problem with the google service - knowing when to stop. It usually 
gives you a street address, but it's hard to tell (as a computer - pretty 
obvious to people) when you've got too much precision. The addition of an error 
("show me names of things that this circle is entirely within") would be a huge 
improvement. In the meantime, I suppose the real "collecting point" _could_ be 
the street address, so I'm keeping it (potentially for search - now, it's doing 
nothing). 

Original comment by dust...@gmail.com on 9 Jul 2012 at 6:42
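
The "show me names of things that this circle is entirely within" idea can be approximated without a service if named places are available as bounding boxes. The sketch below is an assumption-heavy illustration: places are plain `(south, west, north, east)` tuples rather than real shapes, the error radius is converted to degrees with a flat approximation of about 111.32 km per degree of latitude, and boxes crossing the antimeridian are not handled. All names are hypothetical.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude


def circle_within_box(lat, lng, radius_km, south, west, north, east):
    """Approximate test that a point plus error radius lies inside a box."""
    dlat = radius_km / KM_PER_DEGREE_LAT
    cos_lat = math.cos(math.radians(lat))
    if cos_lat <= 0:
        return False  # at or beyond a pole the approximation breaks down
    # A degree of longitude shrinks with the cosine of the latitude.
    dlng = radius_km / (KM_PER_DEGREE_LAT * cos_lat)
    return (
        south <= lat - dlat
        and lat + dlat <= north
        and west <= lng - dlng
        and lng + dlng <= east
    )


def names_containing(lat, lng, radius_km, named_boxes):
    """Return names of boxes that entirely contain the error circle."""
    return [
        name
        for name, box in named_boxes.items()
        if circle_within_box(lat, lng, radius_km, *box)
    ]


if __name__ == "__main__":
    # Invented boxes for illustration only.
    named_boxes = {
        "large region": (19.0, -85.0, 24.0, -74.0),
        "small district": (22.95, -82.05, 23.05, -81.95),
    }
    print(names_containing(23.0, -82.0, 1.0, named_boxes))
    print(names_containing(23.0, -82.0, 10.0, named_boxes))
```

With a small error radius both boxes qualify; with a larger one only the larger box does, which is the "knowing when to stop" behavior described above.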