ArthurChapman closed this issue 4 years ago.
I have to admit that the thought of a field for parameters occurred to me in passing as well. I think it would help make things clear. The field could contain a good descriptive name for the parameter(s) and the default value(s). Default values for vocabularies may be tough in some cases, as they do not exist, or are not vetted community wide, or they are not apt for inclusion as they are (e.g., TGN). What do we do in those cases? I definitely do not think we are over-using parametrization. I think it is super important to make the tests flexible. Note that test suites will have to include parameter values as well.
OK. This is what I was trying to get at with my comment on #63 about the correlation between vocabularies and parameterized tests.
OK, in checking the first few Parameterized tests with the long-suffering @ArthurChapman, I found syntax and content issues that we need to standardize before I feel comfortable making more changes (42 tests all up at the moment). So far, the Parameter(s) field has been edited in the table for these examples:
This raises
--
There has been some discussion around default values for parameterized tests
To answer @Tasilee's question of whether we should use a link to a web address, a name ("The Getty Thesaurus of Geographic Names" or "TGN"), or a link to an API: in the Parameter field, I think the default should be an API if possible. Then, in the References, give the full name and a web link to the vocabulary.
Thanks @ArthurChapman. I agree that we should supply a default even if it is 'best guess' as that will be helpful for implementers as a starting position.
Regarding the default minimum year, I think you mean '1753' and not '1953'? With my limited taxonomic experience, '1600' would seem a reasonable 'flag-raising' point, but my reservation is that I tend to err toward false positives rather than false negatives. That is, I would rather raise a flag for those below 1753 than fail to flag those between 1600 and 1753.
The 'XXX-YYY' was to cover terms in the 'Expected response' such as 'NOT_EMPTY', 'NOT_COMPLIANT', 'NO_REPORT' etc. I will check that these are in the vocab, as they have grown with the implementation of the 'Expected responses'.
I agree that the Parameter defaults should ideally point to an API, but a) some don't exist, b) some exist but may not be tightly coupled to a 'standard' and c) some are hard to find.
My note about References in Parameters means that in some cases, we use the references as a link to defaults. In other cases, I have taken info from the 'Expected response', for example if there is a mention of 'authority'.
Yes, 1753. There is no logical reason to select 1753 for collections; there is for taxonomy. I am not sure where we got 1700 or what the logic for it was. 1600 predates the years of major scientific exploration (Spanish, Portuguese, British and French).
Tests should only be parameterized when we have identified user stories in the areas that TG3 examined that clearly have different parts of the community wishing to use different parameters. The only two valid cases that come to mind offhand are the application of a particular national taxonomic authority for tests involving scientific names, and the specification of the earliest valid date for identifications or eventDates, where particular data sets are known by their users to have earliest valid dates.
Parameters must not point to hypothetical resources that are not available to implementors.
@ArthurChapman, yes, if we specify that a test is parameterized, we must specify a default value.
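As a minimal sketch of that rule, assuming a parameter is modelled as a name plus a mandatory default, the default is used whenever no runtime value is supplied. The `TestParameter` class and its methods are hypothetical, not part of any bdq specification; the name `bdq:earliestDate` and the value 1600 are taken from the table of candidate tests (#84).

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical declaration of a test parameter: a name plus a mandatory default value. */
final class TestParameter {
    private final String name;
    private final String defaultValue;

    TestParameter(String name, String defaultValue) {
        this.name = Objects.requireNonNull(name, "name");
        // A parameterized test must specify a default value, so reject a missing one.
        if (defaultValue == null || defaultValue.isBlank()) {
            throw new IllegalArgumentException("parameter " + name + " has no default value");
        }
        this.defaultValue = defaultValue;
    }

    String name() {
        return name;
    }

    String defaultValue() {
        return defaultValue;
    }

    /** Returns the caller-supplied value for this parameter if present, otherwise the default. */
    String resolve(Map<String, String> suppliedValues) {
        return suppliedValues.getOrDefault(name, defaultValue);
    }
}
```

For example, `new TestParameter("bdq:earliestDate", "1600").resolve(Map.of())` yields "1600", while supplying `bdq:earliestDate` as "1753" overrides it.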
I suspect that the identifiers (GUIDs) for tests should only apply to implementations of those tests that use the default parameter values, and that implementations which take other values should use different GUIDs to allow for machine comparison of results. However, as the intent of parameters is to change the test behavior at runtime, that might significantly complicate implementation. One alternative (thinking in terms of annotated Java methods à la the FilteredPush implementations) would be to have one identifier refer to the test with the default parameter, and another identifier refer to the same test with any other value for the parameter. A Java implementation would be along the lines of:

```java
// Annotations and result types from the FilteredPush FFDQ API.
import org.datakurator.ffdq.annotations.ActedUpon;
import org.datakurator.ffdq.annotations.Provides;
import org.datakurator.ffdq.api.DQResponse;
import org.datakurator.ffdq.api.result.AmendmentValue;

// Default form: carries the GUID currently specified for the test.
@Provides("baf2a90b-af45-4f1a-839f-47126743a48a")
public DQResponse<AmendmentValue> amendmentYearStandardized(
        @ActedUpon("dwc:year") String year)
{
    Integer minimumYear = 1753;  // default parameter value
    return amendmentYearStandardized(year, minimumYear);
}

// Parameterized form: needs a GUID of its own.
@Provides("ab37fd2a-fe95-4ab6-8a0c-e40ea3f97bb4")
public DQResponse<AmendmentValue> amendmentYearStandardized(
        @ActedUpon("dwc:year") String year, Integer minimumYear)
{
    // actual test implementation
}
```

Here the first method uses the GUID currently specified for the test, and the second method uses a GUID that we would need to specify for parameterized implementations.
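The same delegation pattern can be sketched in plain Java without the FFDQ annotations, here as a range check in the spirit of VALIDATION_YEAR_OUTOFRANGE (#84), using the defaults from the table of candidate tests (bdq:earliestDate = 1600, bdq:latestDate = current year). The class name, the placeholder identifier constants and the simplified result enum (which collapses response status and result into one value) are hypothetical; this is not the FilteredPush implementation.

```java
import java.time.Year;

/** Hypothetical sketch: one test exposed as a default form and a parameterized form. */
final class YearRangeCheck {
    // Simplification: response status and result are collapsed into a single enum.
    enum Result { COMPLIANT, NOT_COMPLIANT, INTERNAL_PREREQUISITES_NOT_MET }

    // Placeholder identifiers; real tests would carry GUIDs from the specification.
    static final String DEFAULT_FORM_ID = "hypothetical-guid-default-parameters";
    static final String PARAMETERIZED_FORM_ID = "hypothetical-guid-other-parameters";

    /** Default form: bdq:earliestDate = 1600, bdq:latestDate = current year. */
    static Result validateYear(String year) {
        return validateYear(year, 1600, Year.now().getValue());
    }

    /** Parameterized form: the caller supplies both range bounds. */
    static Result validateYear(String year, int earliestYear, int latestYear) {
        if (year == null || year.isBlank()) {
            return Result.INTERNAL_PREREQUISITES_NOT_MET;  // nothing to evaluate
        }
        int value;
        try {
            value = Integer.parseInt(year.trim());
        } catch (NumberFormatException e) {
            return Result.INTERNAL_PREREQUISITES_NOT_MET;  // not an integer year
        }
        return (value >= earliestYear && value <= latestYear)
                ? Result.COMPLIANT
                : Result.NOT_COMPLIANT;
    }
}
```

With the defaults, a dwc:year of 1650 is COMPLIANT, whereas `validateYear("1650", 1753, 2020)` flags it: the trade-off between 1600 and 1753 discussed above becomes a difference in which identifier (and which parameter values) produced the result.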
@ArthurChapman and I have been discussing 'Needs work' tagged tests and resolved a few, but there are three remaining. Also, a question to the rest of you about the Expected Response regarding specified source authority. Should we
@Tasilee, the updates to make the parameter values structured and consistent are great.
Significant remaining problem: a very large number of the tests that take parameters should not be parameterized. As I've noted on #20, only tests for which we have use cases where different user communities will expect the tests to behave in different ways should be parameterized (such as a country wishing to validate scientific names against a national list rather than a global one). We must not specify parameters that point implementors to a resource from which the controlled vocabulary for a particular test can be found, that is something for the notes. When the specification says, e.g. compliant if matching ISO vocabulary x, then the implementor must use that vocabulary, and where they get it and how they get it is an implementation detail, not a parameter.
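A minimal sketch of that position, assuming a hypothetical VALIDATION_OCCURRENCESTATUS_NOTSTANDARD that the specification defines as compliant only for the two values Darwin Core currently recommends ("present" and "absent"): the behavior is fixed by the specification, and where the list comes from sits behind an interface chosen when the implementation is built rather than being passed in as a test parameter. All type names are hypothetical, and case-sensitive matching is an assumption.

```java
import java.util.Set;

/** Hypothetical source of a controlled vocabulary; how it is obtained is an implementation detail. */
interface VocabularySource {
    Set<String> terms();
}

/** One possible implementation: a list bundled with the code. Another could fetch it remotely. */
final class BundledOccurrenceStatus implements VocabularySource {
    @Override
    public Set<String> terms() {
        return Set.of("present", "absent");
    }
}

final class OccurrenceStatusValidation {
    private final VocabularySource source;

    // The source is chosen when the implementation is assembled, not supplied as a test parameter.
    OccurrenceStatusValidation(VocabularySource source) {
        this.source = source;
    }

    /** True only if the value is exactly one of the specified terms (case-sensitive, an assumption). */
    boolean isStandard(String occurrenceStatus) {
        return occurrenceStatus != null && source.terms().contains(occurrenceStatus);
    }
}
```

Swapping `BundledOccurrenceStatus` for a remote source changes nothing about which values are compliant, which is the distinction being drawn between an implementation detail and a parameter.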
All of the tests that have parameters need careful review to see if there is a clear use case for different users to expect different behaviors of the test for different uses, not whether or not there are multiple possible sources that could be used for some vocabulary.
We have 41 tests that specify parameters. It looks to me like only 18 of those are actually candidates for parameterization, and each of these needs careful consideration and identification of the use cases that require the test to be parameterized.
No. | Name | Parameter |
---|---|---|
84 | VALIDATION_YEAR_OUTOFRANGE | bdq:earliestDate = 1600, bdq:latestDate = current year |
107 | VALIDATION_MINDEPTH-MAXDEPTH_OUTOFRANGE | bdq:minimumValidDepthInMeters = 0, bdq:maximumValidDepthInMeters = 11000 |
112 | VALIDATION_MAXELEVATION_OUTOFRANGE | bdq:minimumValidElevationInMeters = -423, bdq:maximumValidElevationInMeters = 8850 |
122 | VALIDATION_GENUS_NOTFOUND | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
123 | VALIDATION_CLASSIFICATION_AMBIGUOUS | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
22 | VALIDATION_PHYLUM_NOTFOUND | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
28 | VALIDATION_FAMILY_NOTFOUND | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
45 | AMENDMENT_POLYNOMIAL_STANDARDIZED | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
46 | VALIDATION_POLYNOMIAL_NOTSTANDARD | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
57 | AMENDMENT_TAXONID_FROM_TAXON | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
70 | VALIDATION_TAXON_AMBIGUOUS | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
71 | AMENDMENT_SCIENTIFICNAME_FROM_TAXONID | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
77 | VALIDATION_CLASS_NOTFOUND | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
81 | VALIDATION_KINGDOM_NOTFOUND | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
83 | VALIDATION_ORDER_NOTFOUND | bdq:sourceAuthority (default = https://www.gbif.org/en/developer/species) |
76 | VALIDATION_DATEIDENTIFIED_OUTOFRANGE | Default values: bdq:earliestDate = 1753-01-01, bdq:latestDate = current day |
36 | VALIDATION_EVENTDATE_OUTOFRANGE | Default values: bdq:earliestValidDate = 1600, bdq:latestValidDate = current year |
39 | VALIDATION_MINELEVATION_OUTOFRANGE | Default values: bdq:minimumValidElevationInMeters = -428, bdq:maximumValidElevationInMeters = 8850 |
102 | AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT | (but not: bdq:sourceAuthority (default = http://epsg.io/)) |
The following tests have parameters and look to me like they very unambiguously must not be parameterized. The resources mentioned should be moved either into the specification or the notes, and not specified as a parameter.
No. | Name | Parameter |
---|---|---|
106 | AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON | bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#identificationQualifier) |
59 | VALIDATION_GEODETICDATUM_NOTSTANDARD | bdq:sourceAuthority (default = http://epsg.io/) |
60 | AMENDMENT_GEODETICDATUM_STANDARDIZED | bdq:sourceAuthority (default = http://epsg.io/) |
51 | VALIDATION_COORDINATES_TERRESTRIALMARINE | bdq:sourceAuthority (default = http://irmng.org) |
162 | VALIDATION_TAXONRANK_NOTSTANDARD | bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml) |
163 | AMENDMENT_TAXONRANK_STANDARDIZED | bdq:sourceAuthority (default = http://rs.gbif.org/vocabulary/gbif/rank.xml) |
104 | VALIDATION_BASISOFRECORD_NOTSTANDARD | bdq:sourceAuthority (default = http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord) |
63 | AMENDMENT_BASISOFRECORD_STANDARDIZED | bdq:sourceAuthority (default = http://rs.tdwg.org/dwc/terms/index.htm#basisOfRecord) |
133 | AMENDMENT_LICENSE_STANDARDIZED | bdq:sourceAuthority (default = https://creativecommons.org/) |
38 | VALIDATION_LICENSE_NOTSTANDARD | bdq:sourceAuthority (default = https://creativecommons.org/) |
97 | VALIDATION_IDENTIFICATIONQUALIFIER_DETECTED | bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#identificationQualifier) |
115 | AMENDMENT_OCCURRENCESTATUS_STANDARDIZED | bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#occurrenceStatus) |
116 | VALIDATION_OCCURRENCESTATUS_NOTSTANDARD | bdq:sourceAuthority (default = https://dwc.tdwg.org/terms/#occurrenceStatus) |
20 | VALIDATION_COUNTRYCODE_NOTSTANDARD | bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes) |
48 | AMENDMENT_COUNTRYCODE_STANDARDIZED | bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes) |
62 | VALIDATION_COUNTRY_COUNTRYCODE_INCONSISTENT | bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes) |
73 | AMENDMENT_COUNTRYCODE_FROM_COORDINATES | bdq:sourceAuthority (default = https://restcountries.eu/#api-endpoints-list-of-codes) |
50 | VALIDATION_COORDINATES_COUNTRYCODE_INCONSISTENT | bdq:sourceAuthority (default = https://www.iso.org/obp/ui) |
118 | AMENDMENT_GEOGRAPHY_STANDARDIZED | bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html)) |
139 | VALIDATION_GEOGRAPHY_NOTSTANDARD | bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html)) |
21 | VALIDATION_COUNTRY_NOTSTANDARD | bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html)) |
95 | VALIDATION_GEOGRAPHY_AMBIGUOUS | bdq:sourceAuthority (default = The Getty Thesaurus of Geographic Names (TGN: http://www.getty.edu/research/tools/vocabularies/tgn/index.html)) |
@chicoreus I will look at this in detail when I get back home (away at the moment), but the Geodetic Datum ones (#102, #59, #60) should be Parameterized, as different jurisdictions use different defaults (some by legislation, e.g. Brazil) and WGS84 may not always be the best default. In Brazil, for example, if no datum is specified, you can be nearly certain that the default is either SAD69(96) or SIRGAS2000 (depending on the date). Also, many jurisdictions are using Coordinate Reference Systems (CRS) rather than datums, as these are more often than not what is given on GPS units. I will check their wording later. Like you, I think we have unnecessarily made too many tests Parameterized. @tucotuco may have good reasons for some of these, but I think we need to justify each test. Perhaps there are comments with justifications under the individual tests; I will check later.
@ArthurChapman looks like #102 should be parameterized, while #59 and #60 should not. Added notes in those issues.
I've updated the tables in the comments above accordingly, moving #102 into should be parameterized.
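A minimal sketch of why #102 behaves differently from #59 and #60: the amendment only proposes a value when dwc:geodeticDatum is empty, and the value it proposes is the parameter. The default EPSG:4326 (WGS 84) and the example override EPSG:4674 (SIRGAS 2000) are illustrative assumptions, not values agreed in this thread; the class and method names are hypothetical.

```java
import java.util.Optional;

/** Hypothetical sketch of an "assumed default" amendment with a jurisdiction-specific parameter. */
final class GeodeticDatumAssumedDefault {
    // Illustrative default only; a jurisdiction could supply a different value as the parameter.
    static final String DEFAULT_ASSUMED_DATUM = "EPSG:4326";

    /** Default form of the amendment. */
    static Optional<String> amend(String geodeticDatum) {
        return amend(geodeticDatum, DEFAULT_ASSUMED_DATUM);
    }

    /**
     * Parameterized form: proposes assumedDatum only when no datum was supplied.
     * An empty Optional means "no amendment proposed".
     */
    static Optional<String> amend(String geodeticDatum, String assumedDatum) {
        if (geodeticDatum == null || geodeticDatum.isBlank()) {
            return Optional.of(assumedDatum);
        }
        // A supplied datum is left to the NOTSTANDARD/STANDARDIZED tests (#59, #60).
        return Optional.empty();
    }
}
```

A Brazilian data publisher could call `amend(value, "EPSG:4674")`. As noted above, the right assumption there can also depend on the date, which a single string parameter does not capture.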
Having looked at your list, @chicoreus, of tests that "shouldn't" be Parameterized, I have the following comments:
No. | Name | Parameter |
---|---|---|
106 | AMENDMENT_IDENTIFICATIONQUALIFIER_FROM_TAXON | I think this was so people could add characters that they could look for "?", "cf." "aff." or could add others. I'd be happy either way with this one. |
102 | AMENDMENT_GEODETICDATUM_ASSUMEDDEFAULT | As I noted in my previous comment, this should be Parameterized |
59 | VALIDATION_GEODETICDATUM_NOTSTANDARD | Should not be Parameterized |
60 | AMENDMENT_GEODETICDATUM_STANDARDIZED | Should not be Parameterized |
51 | VALIDATION_COORDINATES_TERRESTRIALMARINE | This one was parameterized because there are two ways of checking for isMarine: 1) using GIS/Google Maps to determine whether or not the point is on land; 2) using a list of marine species and checking whether or not the taxon is in that list. We could decide to use only one method and then remove it from Parameterized |
162 | VALIDATION_TAXONRANK_NOTSTANDARD | I would be happy for us to decide to go with the GBIF Rank Vocabulary (there is no real alternative) and remove Parameterization |
163 | AMENDMENT_TAXONRANK_STANDARDIZED | I would be happy for us to decide to go with the GBIF Rank Vocabulary (there is no real alternative) and remove Parameterization |
104 | VALIDATION_BASISOFRECORD_NOTSTANDARD | I would be happy for us to decide to go with the DwC recommended vocabulary (it can always be formalised later) and remove Parameterization |
63 | AMENDMENT_BASISOFRECORD_STANDARDIZED | I would be happy for us to decide to go with the DwC recommended vocabulary (it can always be formalised later) and remove Parameterization |
133 | AMENDMENT_LICENSE_STANDARDIZED | The problem I see here is that we are following dcterms:license, which could be broader than just Creative Commons. Do we wish to restrict to Creative Commons, or allow other license conditions to be valid, and thus allow someone to choose a different vocabulary? |
38 | VALIDATION_LICENSE_NOTSTANDARD | The problem I see here is that we are following dcterms:license, which could be broader than just Creative Commons. Do we wish to restrict to Creative Commons, or allow other license conditions to be valid, and thus allow someone to choose a different vocabulary? |
97 | VALIDATION_IDENTIFICATIONQUALIFIER_DETECTED | I think this was so people could add characters that they could look for "?", "cf." "aff." or could add others. I'd be happy either way with this one. |
115 | AMENDMENT_OCCURRENCESTATUS_STANDARDIZED | Currently, DwC only recommends "present" and "absent". I understand some would like this broadened, but as it stands, with only two options, I don't see why it should be Parameterized unless a community (invasives?) wants to use a different vocabulary. @tucotuco parameterized this; what was the thinking? A paper currently in press recommends a modification to include a third term, "doubtful", but whether or not this is accepted, I only see the one vocabulary that we would be using, and hopefully it will eventually be formalised beyond a mere DwC recommendation. I thus don't see a strong justification for Parameterization |
116 | VALIDATION_OCCURRENCESTATUS_NOTSTANDARD | See comment above. |
20 | VALIDATION_COUNTRYCODE_NOTSTANDARD | As noted in a comment under #20, I see no reason for Parameterization |
48 | AMENDMENT_COUNTRYCODE_STANDARDIZED | As noted in a comment under #20, I see no reason for Parameterization |
62 | VALIDATION_COUNTRY_COUNTRYCODE_INCONSISTENT | As noted in a comment under #20, we refer in the description to an ISO code, so I see no reason for Parameterization |
73 | AMENDMENT_COUNTRYCODE_FROM_COORDINATES | This might be a more difficult one, as the ISO Standard doesn't have geographic boundaries, so there may need to be some variation in what one chooses as the method for determining boundaries. We still have to decide on this. |
50 | VALIDATION_COORDINATES_COUNTRYCODE_INCONSISTENT | Similar to #73 |
118 | AMENDMENT_GEOGRAPHY_STANDARDIZED | The geography ones I am not sure about; we need further discussion on these and on what we should use. TGN may be OK for some, Google Maps for others? There is a discussion somewhere under an issue that I can't find at the moment. |
139 | VALIDATION_GEOGRAPHY_NOTSTANDARD | See comment above under #118 |
21 | VALIDATION_COUNTRY_NOTSTANDARD | See comment above under #118 |
95 | VALIDATION_GEOGRAPHY_AMBIGUOUS | See comment above under #118 |
Agreed, @chicoreus, re #102, #59 and #60: #102 Parameterized, #59 and #60 not, with bdq:sourceAuthority=http://epsg.io/
Copied from #102, as this comment applies to more than just that test: With all tests (especially NOTSTANDARD and STANDARDIZED tests) that use an external standard (ISO, DCMI, EPSG, or any vocabulary), the vocabulary, standard, etc. is the bdq:sourceAuthority, and you are checking whether the value in the record is a valid value in the bdq:sourceAuthority (in the case of Validations) or can be amended to conform with a value in the bdq:sourceAuthority (in the case of Amendments). In nearly all cases there is only one sourceAuthority (except, as @chicoreus mentions, with taxon names), so no choice of sourceAuthority is needed, only the choice of a value from that sourceAuthority. In those few cases where there is a choice of sourceAuthority (taxon names), you require both 1) a choice of bdq:sourceAuthority, and 2) a choice of value within that source authority. Thus, I agree with @chicoreus that we don't need as many Parameterized tests as we previously tagged, unless @tucotuco has justifications for them that we have not thought of.
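The Validation/Amendment distinction against a single bdq:sourceAuthority can be sketched as follows. The source authority is passed in as a plain set of values; the three basisOfRecord values used in the example are a small subset of the Darwin Core recommendations chosen only for illustration, and matching that ignores case and whitespace is an assumed amendment rule, not one agreed in this thread. The class name is hypothetical.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical NOTSTANDARD/STANDARDIZED pair checked against one source authority. */
final class VocabularyTests {
    private final Set<String> sourceAuthority;

    VocabularyTests(Set<String> sourceAuthority) {
        this.sourceAuthority = Set.copyOf(sourceAuthority);
    }

    /** Validation: is the value, exactly as given, a value in the source authority? */
    boolean isStandard(String value) {
        return value != null && sourceAuthority.contains(value);
    }

    /** Amendment: propose the authority's value that matches ignoring case and whitespace, if any. */
    Optional<String> standardize(String value) {
        if (value == null || isStandard(value)) {
            return Optional.empty();  // nothing to amend
        }
        String key = normalize(value);
        return sourceAuthority.stream()
                .filter(term -> normalize(term).equals(key))
                .findFirst();
    }

    private static String normalize(String s) {
        return s.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }
}
```

With `Set.of("PreservedSpecimen", "HumanObservation", "MachineObservation")` as the authority, "preserved specimen" fails the validation but is amended to "PreservedSpecimen". A value such as "herbarium sheet" gets no proposal: mapping it would need a thesaurus entry rather than normalization, which is the point @tucotuco raises below.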
Thanks @chicoreus and @ArthurChapman. Reading through the table and your comments Arthur, here is my take on it. Maybe after a Pinot Noir or two, I would think differently.
@tucotuco : We would value your discerning eye (or two) on this lot. I'll hold off edits for a response. I hope all is ok over there.
I think you missed a few, @Tasilee.
Parameterized
Not Parameterized
@tucotuco might particularly like to comment on #51, #115, #116, #73, #50, #118, #139, #21 and #95 (see my table and comments above).
@ArthurChapman: I was only using the table, so I will add the missing ones here. And BTW, you also missed #39 (Parameterised); #79 isn't parameterised:
I am presuming that, for the Not parameterised tests above, we move any reference to a default source authority to the References section? That is, the Parameter field is EMPTY.
@Tasilee, I guess that would make sense; however, it doesn't distinguish the default or target source authority from any other reference. Perhaps we should put them in the References, but as "bdq:sourceAuthority=xxxxxxx", followed by the other references.
@ArthurChapman - that seems like a good strategy. I'll tackle the updates on Monday to give @tucotuco and @pzermoglio a chance to comment.
Sorry folks, though I think there are a couple of good catches in this discussion, I am afraid that some of it will take us into circular reasoning. I think most of the tests that were tagged to be parametrized were correctly so. A big part of my stance on this is hidden in a comment to issue #63 (https://github.com/tdwg/bdq/issues/63#issuecomment-491877591). Basically, Darwin Core is not a source authority for values. But that is only part of the issue. The other is that we can't make standardizations without a thesaurus (or at least a simple lookup table) - controlled vocabularies are not enough. This is the reason we brought TG4 into existence, recognizing this fundamental need to develop the tests in tandem with the vocabularies that allow them to actually function.
Some specific comments...
I would like to challenge this statement by @chicoreus: "Tests should only be parameterized when we have identified user stories in the areas that TG3 examined that clearly have different parts of the community wishing to use different parameters."
Why? Can't it be evident aside from the work in TG3? Are the results of TG3 exhaustive for all time?
I would also like to propose an amendment to the statement by @chicoreus:
"Parameters must not point to hypothetical resources that are not available to implementors."
Instead of "Parameters", this should be "Default sources".
@Tasilee asked "Should we
I vote for bdq:sourceAuthority. For example, change "using a specified source authority service" to "using the bdq:sourceAuthority".
I would like to challenge this statement by @chicoreus:
"We must not specify parameters that point implementors to a resource from which the controlled vocabulary for a particular test can be found, that is something for the notes. When the specification says, e.g. compliant if matching ISO vocabulary x, then the implementor must use that vocabulary, and where they get it and how they get it is an implementation detail, not a parameter."
I agree for VALIDATION tests where the vocabulary is written in stone. This is not true of most Darwin Core terms, which make recommendations, not requirements. The philosophy has always been to decouple requirements from definitions wherever possible. All of the AMENDMENT_ tests need a parameter to point to a source for the lookups. If we only used controlled vocabularies, we couldn't do any standardization, because only the standard values would be found, not the values from which the standard values would be determined. I do agree that there is a subset of tests that we currently have as parametrized that need not be. To me, these are only #20 (TG2-VALIDATION_COUNTRYCODE_NOTSTANDARD), #21 (TG2-VALIDATION_COUNTRY_NOTSTANDARD), #59 (TG2-VALIDATION_GEODETICDATUM_NOTSTANDARD), #79 (TG2-VALIDATION_DECIMALLATITUDE_OUTOFRANGE) and #162 (TG2-VALIDATION_TAXONRANK_NOTSTANDARD). #21 and #59 will need to be explicit about the expectations. For example, for #21, it must be explicit whether the preferred name is the standard name, or whether any of the names or codes are acceptable standard names. For #59, it will need to be made explicit whether the EPSG code is the only standard (because it's the only thing that is unambiguous), or whether any of the names in Geodetic CRS, Datum, or Ellipsoid are also acceptable.
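To illustrate why #59 needs to be explicit, the sketch below contrasts the two readings: only an EPSG code is standard, or names of the CRS, datum or ellipsoid are also acceptable. The lookup table is an illustrative assumption covering only WGS 84 (EPSG:4326 for the geographic CRS, EPSG:6326 for the datum, EPSG:7030 for the ellipsoid); a real implementation would need the full EPSG registry. The class name is hypothetical, and the boolean switch exists only to show the contrast: the specification would fix one reading rather than expose it as a parameter.

```java
import java.util.Set;

/** Hypothetical sketch of the two readings of VALIDATION_GEODETICDATUM_NOTSTANDARD (#59). */
final class GeodeticDatumNotStandard {
    // Illustrative subset only: WGS 84 as geographic CRS (4326), datum (6326) and ellipsoid (7030).
    private static final Set<String> EPSG_CODES = Set.of("EPSG:4326", "EPSG:6326", "EPSG:7030");
    private static final Set<String> NAMES = Set.of("WGS 84", "World Geodetic System 1984");

    /**
     * codesOnly = true: only an EPSG code counts as standard (the unambiguous reading).
     * codesOnly = false: names of the CRS, datum or ellipsoid are also accepted.
     */
    static boolean isStandard(String geodeticDatum, boolean codesOnly) {
        if (geodeticDatum == null) {
            return false;
        }
        String value = geodeticDatum.trim();
        if (EPSG_CODES.contains(value)) {
            return true;
        }
        return !codesOnly && NAMES.contains(value);
    }
}
```

The two readings disagree on a value such as "WGS 84", which is exactly the case the test text needs to settle.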
Again, sorry, especially that it took this long to respond, but it was unavoidable.
One issue that @tucotuco's comments bring up is the urgent need for Vocabularies of Values to be created for all the Darwin Core terms that are currently referred to in the tests. Perhaps TG4 (at Leiden?) needs to establish a working group under the TG with the remit to create as many Vocabularies of Values for those terms as possible in the short term (beginning with the easy ones). Some, I think, only have a limited number of terms, but we will need to formalise them in the format that TG4 is proposing to develop. I guess a first step is to make a list, with an assessment of what is required, and a work program. @pzermoglio, something for the agenda in Leiden; perhaps discuss informally on the Sunday.
Thanks @tucotuco. Good to have your insights again, but I am struggling. I will repeat a comment I made somewhere among the tests. We have two scenarios for Parameterised tests:
- Genuine options for bdq:sourceAuthority (e.g., #28), and
- Options for a default value (e.g., #133).
Your comment "we can't make standardizations without a thesaurus (or at least a simple lookup table) - controlled vocabularies are not enough" focuses on the second scenario. But surely we can't anticipate every possible misspelling or incorrectly interpreted 'value' to look up? I guess I am assuming that, in at least some of the AMENDMENTS, we are using pattern matching in the test code to have a stab at interpreting a potential target. Take the example in #133:
dc:license="CCZero" becomes dc:license="https://creativecommons.org/publicdomain/zero/1.0/", following the Creative Commons vocabulary.
@tucotuco: You are implying that we have a thesaurus that contains "CCZero"?
As usual, I am probably missing something.
Also, I have to bow to your Darwin Core philosophy: "Darwin Core is not a source authority for values". Our tests are Darwin Core based (and hence scenario 1 above is not applicable), but scenario 2 is. We are indeed stuffed in terms of vocabs (let alone thesauri), hence TG4, but we need to grab onto any straw we currently have, and DwC 'values' are a 'port in a storm'?
@Tasilee I think we do need vocabularies/thesauri. License is a difficult one, but CCZero could equal CC0 (1.0) or CC0 (1.0) Universal, etc., and then link to https://creativecommons.org/publicdomain/zero/1.0/. Also, many of the earlier Creative Commons licenses had many ports (versions in different languages; see, for example, https://creativecommons.org/tag/porting/). Version 4.0 is supposed to be a universal set without the need for porting, and that is encouraged for all new uses. A thesaurus would hopefully list these and (maybe) synonymise many.
@tucotuco has extracted the licensing records from GBIF. Many (majority) are in the form of "ex coll.
I am saying explicitly, not implying, that we have thesauri for vocabularies of terms that need to be cleaned. So yes, a license lookup that says 'CCzero' is a synonym of the unequivocally preferred term 'https://creativecommons.org/publicdomain/zero/1.0/'.
My point is that values alone don't help us do any lookups, whether taken from the examples given in Darwin Core (examples are no longer even canonical) or elsewhere. Pattern matching is an implementation solution, not a community data-driven one, which means we would rely on tech people to make the mappings, not on the people who know (and are even responsible for) the state of the domain.
I do not see two scenarios. Both examples need a source authority, and we decided that all tests that take a parameter should have a default value for that parameter. To me, it is best to be able to specify the source authority when there isn't a single definitive option. This is in order to decouple the test from the data used for the test, so that tests are less likely to be implementation dependent. Imagine certifying an implementation
We can't effectively anticipate every possible nonsense that might come along, I agree. We don't need to. But we can certainly create a lookup of every bit of nonsense that has been seen so far, and we can strive for an infrastructure that accumulates new nonsense as it arises and lets us provide the lookups for those as we move forward.
I hope that helps explain where I am coming from.
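A minimal sketch of the thesaurus-driven approach described above: the variant-to-preferred mapping is data supplied as the source authority rather than pattern-matching logic in code, and values with no entry are collected so that people who know the domain can add mappings later. The class name, the case- and whitespace-insensitive key normalization, and the 'CC0 1.0 Universal' entry used in the example are assumptions for illustration; only the 'CCZero' mapping comes from this discussion.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical thesaurus-backed amendment: variant value -> preferred term. */
final class ThesaurusAmendment {
    private final Map<String, String> thesaurus = new HashMap<>();
    private final Set<String> preferredTerms;
    private final Set<String> unmatched = new LinkedHashSet<>();

    /** The thesaurus is the source authority: its entries are data, not code. */
    ThesaurusAmendment(Map<String, String> variantToPreferred) {
        variantToPreferred.forEach((variant, preferred) -> thesaurus.put(key(variant), preferred));
        this.preferredTerms = new HashSet<>(variantToPreferred.values());
    }

    /** Proposes the preferred term for a value, or records the value for later curation. */
    Optional<String> standardize(String value) {
        if (value == null || value.isBlank() || preferredTerms.contains(value)) {
            return Optional.empty();  // nothing to look up, or already the preferred term
        }
        String preferred = thesaurus.get(key(value));
        if (preferred == null) {
            unmatched.add(value);  // new "nonsense" awaiting a mapping from domain experts
            return Optional.empty();
        }
        return Optional.of(preferred);
    }

    /** Values seen so far that the thesaurus could not map. */
    Set<String> unmatchedValues() {
        return Set.copyOf(unmatched);
    }

    // Assumed key normalization: trim, collapse internal whitespace, lower-case.
    private static String key(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
```

Growing the thesaurus, rather than the code, is what lets new variants be handled as they are seen; the accumulated `unmatchedValues()` are the candidates for the next round of mappings.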
@tucotuco: "Pattern matching is an implementation solution". I agree. I was unaware of the relevance of thesauri to our issues; they are a more 'standard' solution that is openly accessible and hopefully understandable.
This reminds me of the eureka moment aeons ago in TDWG (TIP days) when I realized that we needed an effective environment for the creation and management of ontologies. We needed an environment created by 'programmers' that made it easy to add terms, definitions and relationships. As far as I am aware, such a user (application domain specialist)-centric environment still doesn't exist (but I could be wrong as I have not recently researched it).
I think such an environment for biodiversity informatics-related thesauri (term -> preferred standard term, definition, comments and links etc) would be nice. A wiki style of management? A list by itself is a start, but when isolated and without provenance, is less than optimal. Governance is a key issue. If there is an 'authority', grand, but the system still needs to be open to public comment for efficient improvements.
I totally agree. I think ontology management has progressed well and has viable environments and tools. Some of our vocabs would be best accommodated by ontologies, especially basisOfRecord. For the rest, I think it is high time we dive in and play with what Tim has to offer,
We have a quorum to CLOSE.
Having a look at the tests, we now seem to have added Parameterized to (virtually) every test where we have a vocabulary - even where (e.g. #62) the Vocabulary is an ISO Standard.
I am not sure that we have thought this through for each case.