lter / LTER-core-metabase

RDB model, based on the GCE LTER Metabase, with adaptations from other LTER sites
https://lter.github.io/LTER-core-metabase/
GNU General Public License v3.0
11 stars 4 forks source link

Patch 41 causes problems in an EML view (mb2eml_r.vw_eml_attributecodedefinition) #97

Closed gremau closed 3 years ago

gremau commented 3 years ago

Patch 41 causes an issue in which the code and definition columns in mb2eml_r.vw_eml_attributecodedefinition were not getting populated with values from DataSetAttributeEnumeration. @gastil and I found that two things helped:

  1. Doing this:

    update lter_metabase."DataSetAttributeEnumeration"
    set "CodeID" = "Code"
    where "CodeID" is null
  2. Changing the view query in mb2eml_r.vw_eml_attributecodedefinition to

CREATE OR REPLACE VIEW mb2eml_r.vw_eml_attributecodedefinition
AS SELECT d."DataSetID" AS datasetid,
    d."EntitySortOrder" AS entity_position,
    d."ColumnName" AS "attributeName",
    d."Code" AS code,
    d."Definition" AS definition
   FROM lter_metabase."DataSetAttributeEnumeration" d
     LEFT JOIN lter_metabase."ListCodes" l ON d."CodeID"::text = l."CodeID"::text
  ORDER BY d."DataSetID", d."EntitySortOrder";

I'm not sure if 1 should be something that is done in the patch itself? I think the changes in the patch mean that all codes for nominalEnum attributes need to be listed in the ListCodes table, but that doesn't seem to be the case in the LTER-core-metabase as it is currently specifed.

Probably this is for @atn38 but I will see if I can get a pull request together

atn38 commented 3 years ago

Hey Greg, thanks for this! Your stab at a pull request would be great. I'm no SQL whiz ;)

As to your question, yes, I'm pretty sure all Enum types should be in ListCodes.

On Thu, Mar 4, 2021, 07:55 Greg Maurer notifications@github.com wrote:

Patch 41 https://github.com/lter/LTER-core-metabase/blob/migration/sql/41_consolidate_missing_enumeration_codes.sql causes an issue in which the code and definition columns in mb2eml_r.vw_eml_attributecodedefinition were not getting populated with values from DataSetAttributeEnumeration. Two things helped:

  1. Doing this:

update lter_metabase."DataSetAttributeEnumeration" set "CodeID" = "Code" where "CodeID" is null

  1. Changing the view query in mb2eml_r.vw_eml_attributecodedefinition to

CREATE OR REPLACE VIEW mb2eml_r.vw_eml_attributecodedefinition AS SELECT d."DataSetID" AS datasetid, d."EntitySortOrder" AS entity_position, d."ColumnName" AS "attributeName", d."Code" AS code, d."Definition" AS definition FROM lter_metabase."DataSetAttributeEnumeration" d LEFT JOIN lter_metabase."ListCodes" l ON d."CodeID"::text = l."CodeID"::text ORDER BY d."DataSetID", d."EntitySortOrder";

I'm not sure if 1 should be something that is done in the patch itself? Should all codes for nominalEnums therefore be need to be listed in ListCodes table now?

Probably this is for @atn38 https://github.com/atn38 but I will see if I can get a pull request together

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lter/LTER-core-metabase/issues/97, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKAZD5TIOWQPSTBXQ5DXISTTB6GNVANCNFSM4YTJPGRA .

gastil commented 3 years ago

I propose we add this diagnostic below for sites with pre-existing installations for them to find any re-used Codes with different Definitions. In the example data that comes with metabase, there is one such example. So you can test this diagnostic on a fresh installation. Or, prior to running patch41, you can edit the SQL to be specific to your own such duplicates. Or you can manually adjust the duplicates, as there are likely to be few.

select "Code", max("Definition"), min("Definition"), count(*) from
(
select "Code", "Definition", count(*)
from lter_metabase."DataSetAttributeEnumeration"
group by "Code", "Definition"
    ) as multiused
group by "Code"
having count(*) > 1
order by "Code"
;

See the Arroyo Quemado has a period in one and not the other. Screen Shot 2021-03-05 at 11 12 06 AM That is a nested query of the same table twice. Good suggestion @atn38

atn38 commented 3 years ago

Your code works Gastil, although I was sorta thinking of this:

select distinct a."Code" from
lter_metabase."DataSetAttributeEnumeration" a, 
lter_metabase."DataSetAttributeEnumeration" b 
where a."Code" = b."Code" AND a."Definition" != b."Definition"
;

No matter. All you need is to come up with a list of re-used codes with different definitions.

gremau commented 3 years ago

So... Patch 41 has now been updated with commit 357eb41146a. On my installation of LTER-core-metabase that had the earlier, failed version of the patch, I have now rerun Patch 41 with the new commit. It seems to work well - ListCodes gets populated with CodeID values, columns in DataSetAttributeEnumeration are dropped, primary/foreign keys are created, and the view (vw_eml_attributecodedefinition) gets rebuilt. For the one duplicate code in the example data that has 2 definitions (AQUE), 2 new CodeIDs are created in the ListCodes table - enum.AQUE14 and enum.AQUE15 - one for the shorter, and one for the longer definition. This seems to be the intended behavior and as long as users know how to interpret similar duplicate codes when they run the patch on an existing metabase installation it should work fine.

I move that we close the issue.

gastil commented 3 years ago

Thank you for that summary Greg. I agree we close the issue, unless An has last minute additions. An?

On Tue, Mar 30, 2021 at 5:54 AM Greg Maurer @.***> wrote:

So... Patch 41 has now been updated with commit 357eb41146a https://github.com/lter/LTER-core-metabase/commit/357eb41146af38b3e35c089dd15955149820674a. On my installation of LTER-core-metabase that had the earlier, failed version of the patch, I have now rerun Patch 41 with the new commit. It seems to work well - ListCodes gets populated with CodeID values, columns in DataSetAttributeEnumeration are dropped, primary/foreign keys are created, and the view (vw_eml_attributecodedefinition) gets rebuilt. For the one duplicate code in the example data that has 2 definitions (AQUE), 2 new CodeIDs are created in the ListCodes table - enum.AQUE14 and enum.AQUE15 - one for the shorter, and one for the longer definition. This seems to be the intended behavior and as long as users know how to interpret similar duplicate codes when they run the patch on an existing metabase installation it should work fine.

I move that we close the issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lter/LTER-core-metabase/issues/97#issuecomment-810206748, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGTM6FQ3VHU553LMCGF7ESDTGHCYHANCNFSM4YTJPGRA .

atn38 commented 3 years ago

Yea, I don't think I have anything to add!

On Tue, Mar 30, 2021, 14:06 Gastil @.***> wrote:

Thank you for that summary Greg. I agree we close the issue, unless An has last minute additions. An?

On Tue, Mar 30, 2021 at 5:54 AM Greg Maurer @.***> wrote:

So... Patch 41 has now been updated with commit 357eb41146a < https://github.com/lter/LTER-core-metabase/commit/357eb41146af38b3e35c089dd15955149820674a . On my installation of LTER-core-metabase that had the earlier, failed version of the patch, I have now rerun Patch 41 with the new commit. It seems to work well - ListCodes gets populated with CodeID values, columns in DataSetAttributeEnumeration are dropped, primary/foreign keys are created, and the view (vw_eml_attributecodedefinition) gets rebuilt. For the one duplicate code in the example data that has 2 definitions (AQUE), 2 new CodeIDs are created in the ListCodes table - enum.AQUE14 and enum.AQUE15 - one for the shorter, and one for the longer definition. This seems to be the intended behavior and as long as users know how to interpret similar duplicate codes when they run the patch on an existing metabase installation it should work fine.

I move that we close the issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/lter/LTER-core-metabase/issues/97#issuecomment-810206748 , or unsubscribe < https://github.com/notifications/unsubscribe-auth/AGTM6FQ3VHU553LMCGF7ESDTGHCYHANCNFSM4YTJPGRA

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lter/LTER-core-metabase/issues/97#issuecomment-810505834, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKAZD5TQQ3JSAOKW3EOLOJTTGIONBANCNFSM4YTJPGRA .