srvarey / gbif-occurrencestore

Automatically exported from code.google.com/p/gbif-occurrencestore
0 stars 0 forks source link

Add common_name #20

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Common name joins to the nub and comes from CLB

Original issue reported on code.google.com by timrobertson100 on 22 Apr 2011 at 8:27

GoogleCodeExporter commented 9 years ago
For reference: common_name is a table in the portal that links taxon concepts 
with names in multiple languages. One taxon_concept can have multiple 
common_names.

iso_language_code is NOT NULL but it has about 50% empty strings. Sometimes 
only language is filled in (e.g. "Spanish") but no ISO code.

Original comment by lars.fra...@gmail.com on 6 May 2011 at 12:59

GoogleCodeExporter commented 9 years ago
SELECT COUNT(*) FROM name_usage nu JOIN vernacular_name vn ON 
(nu.id=vn.usage_fk) WHERE nu.checklist_fk=1;

This should select all vernacular names (which I presume is the source of the 
common names) that are associated to checklist 1 (which is the nub) but it 
returns 0 results.

Original comment by lars.fra...@gmail.com on 11 May 2011 at 2:09

GoogleCodeExporter commented 9 years ago
Ok, so you then need to join through name_usage again to the source checklist 
usages.

So... something along the lines of:

SELECT
  nub.id AS taxon_concept_id,
  vns.vernacular_name AS common_name
FROM
  name_usage nub
    JOIN name_usage source ON source.nub_fk=nub.id
    JOIN vernacular_name vn ON vn.usage_fk=source.id
    JOIN vernacular_name_string vns ON vns.id=name_string_fk
 WHERE
   vns.vernacular_name like '%Basil%'
 GROUP BY nub.id, vns.vernacular_name;

One thing of great concern is that there are only 372k vernacular records - not 
an issue of the processing though, but rather low.

Original comment by timrobertson100 on 11 May 2011 at 2:29

GoogleCodeExporter commented 9 years ago
Turns out there is already a export_common_name view in CLB. Going to use that.

Original comment by lars.fra...@gmail.com on 12 May 2011 at 10:27

GoogleCodeExporter commented 9 years ago
It does look a little suspicious though:

CREATE OR REPLACE VIEW export_common_name AS 
 SELECT DISTINCT ON (nub.id, upper(vns.vernacular_name::text)) vn.id, nub.id AS taxon_concept_id, vns.vernacular_name AS name, vn.language AS iso_language_code, vn.language
   FROM name_usage nub
   JOIN name_usage nu ON nu.nub_fk = nub.id
   JOIN vernacular_name vn ON nu.id = vn.usage_fk
   JOIN vernacular_name_string vns ON vns.id = vn.name_string_fk
  WHERE nub.checklist_fk = 1 AND vns.blacklisted = false
  GROUP BY vn.id, nub.id, vns.vernacular_name, vn.language, vn.language;

There are 2 vn.language in the group by...

Original comment by timrobertson100 on 12 May 2011 at 11:17

GoogleCodeExporter commented 9 years ago
Incorporated into workflow. Pending testing.

Original comment by lars.fra...@gmail.com on 12 May 2011 at 1:12

GoogleCodeExporter commented 9 years ago
Added in r586.

Note: We needed to drop the NOT NULL constraint but this is properly handled by 
the Portal.

Original comment by lars.fra...@gmail.com on 17 May 2011 at 8:11