tubackkhoa / gbif-dataportal

Automatically exported from code.google.com/p/gbif-dataportal

Make State / Province searchable #64

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
This request was discussed over email; the exchange is copied at the bottom.

This is a significant amount of work and should be incorporated into the medium-term plan for the portal. Answering it efficiently is likely to require a full-text indexing technology; one option is a full-text occurrence index built with Lucene and deployed with Katta across several blades. This is a good research topic: the index could potentially be built with Hadoop MapReduce, possibly with the data stored in HBase. Hadoop would certainly be a good technology for building the index.
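
To make the idea of a full-text occurrence index concrete, here is a minimal in-memory sketch of an inverted index over stateProvince values. It is not Lucene or Katta, only an illustration of the lookup structure such tools maintain; the record ids, sample values and function names are assumptions made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def build_index(records):
    """Map each token to the set of record ids whose stateProvince contains it.

    `records` is an iterable of (record_id, state_province) pairs;
    a missing stateProvince (None) is simply skipped.
    """
    index = defaultdict(set)
    for record_id, state_province in records:
        for token in tokenize(state_province or ""):
            index[token].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data, not taken from the portal.
records = [
    (1, "Madrid"),
    (2, "M"),
    (3, "A Coruña"),
    (4, "Madrid (Comunidad de)"),
    (5, None),
]
index = build_index(records)
madrid_ids = search(index, "Madrid")
```

Building the index is a single pass over the records, which is why it maps naturally onto a batch job; the lookup afterwards costs only a few set intersections.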

"Yes, I know all those troubles. But the fact of being free text  
content does not imply that it is unreliable in all countries: most of  
the Spanish providers are already using a standardized way of naming  
the StateProvince field (the way used by Flora Iberica project).

So most of the Spanish records provided by Spanish providers are
already VERY useful for filtering on that field (so, for some
purposes, they are MUCH MORE useful than filtering on the
latitude-longitude fields, which are missing in a much higher
percentage of records than the StateProvince field).

Also, GBIF.ORG is not responsible for what content providers serve
in our StateProvince fields (just as you are not responsible for the
accuracy of the ScientificName field, which is also free-text
content that is not always written the same way for the same concept).
And I am pretty sure this would make sense in many other countries.

And it is also the responsibility of the client to take care of using
several filter strings ("M", "Madrid" and perhaps "Madrid*") when
looking for plants collected in Madrid province.

Would it be possible for you to make stateprovince searchable in the GBIF
web services, even if you mark this field "non official", "non documented"
or something like that? (so you don't give the impression of making
that field concept "reliable" or "gbif.org trusted")

Thank you very much in advance and thanks for your answer on my other  
questions indeed!!

David

--
David García San León
(digitization & loans control)
Herbario SANT
Facultade de Farmacia
Universidade de Santiago
15782 - Santiago de Compostela
http://www.usc.es/herbario/
Tel. +34 981594488 ext.15022
Fax  +34 981594912"
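
The client-side approach described in the email, trying several filter strings such as "M", "Madrid" and "Madrid*", could be sketched roughly as follows. This assumes the values have already been fetched and that `*` is a shell-style wildcard; the function names and sample records are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(value, patterns):
    """Return True if the stateProvince value matches any filter string.

    Matching is case-insensitive and ignores surrounding whitespace;
    `*` in a pattern matches any run of characters.
    """
    if value is None:
        return False
    normalized = value.strip().lower()
    return any(fnmatchcase(normalized, pattern.lower()) for pattern in patterns)


def filter_records(records, patterns):
    """Keep the ids of records whose stateProvince matches a pattern."""
    return [record_id for record_id, value in records if matches_any(value, patterns)]


# Filter strings taken from the email; the records are made-up examples.
patterns = ["M", "Madrid", "Madrid*"]
records = [
    (1, "Madrid"),
    (2, " M "),
    (3, "Murcia"),
    (4, "Madrid (Comunidad de)"),
    (5, None),
]
madrid_ids = filter_records(records, patterns)
```

Note that "M" only matches the exact abbreviation, so "Murcia" is not picked up; only a pattern such as "M*" would widen the match.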

Original issue reported on code.google.com by timrobertson100 on 13 Oct 2009 at 11:37

GoogleCodeExporter commented 9 years ago
Issue 63 has been merged into this issue.

Original comment by josecua...@gmail.com on 13 Oct 2009 at 11:46

GoogleCodeExporter commented 9 years ago
Additionally, we should try a MySQL full-text index on the ROR stateProvince and see how it performs. It won't scale well as we grow, but it might be acceptable for the time being. Perhaps it could be exposed only as a web service, as the initial user requested.
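
A rough sketch of what the MySQL experiment might look like, written as Python helpers that only build the SQL text. The table name `raw_occurrence_record`, the column name `state_province` and the index name are assumptions for illustration and may not match the real schema; the `%s` placeholders follow the parameter style used by the common MySQL drivers for Python.

```python
# Hypothetical table and column names; the real ROR schema may differ.
TABLE = "raw_occurrence_record"
COLUMN = "state_province"


def create_index_sql(table=TABLE, column=COLUMN):
    """Build the statement that adds a FULLTEXT index on the column."""
    return f"ALTER TABLE {table} ADD FULLTEXT INDEX ft_{column} ({column})"


def search_sql(table=TABLE, column=COLUMN):
    """Build a parameterised boolean-mode full-text query.

    The search term and row limit are passed separately as parameters,
    so user input is never interpolated into the SQL text.
    """
    return (
        f"SELECT id FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s IN BOOLEAN MODE) "
        f"LIMIT %s"
    )


def search_params(term, limit=100):
    """Pair a search term (e.g. 'madrid*') with a row limit."""
    return (term, limit)


ddl = create_index_sql()
query = search_sql()
params = search_params("madrid*")
```

One caveat worth checking in any trial: MyISAM full-text indexes skip words shorter than `ft_min_word_len` (4 unless changed), so a one-letter value such as "M" would not be found without adjusting that setting.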

Original comment by timrobertson100 on 13 Oct 2009 at 11:53

GoogleCodeExporter commented 9 years ago
Issue 63 has been merged into this issue.

Original comment by josecua...@gmail.com on 29 Jun 2010 at 3:19