Open smroecker opened 1 year ago
I think item 1 is not that misleading as the areasymbol, areatypename etc. are all together and are corresponding to the "State or Territory" in the query result. I can definitely see needing to disambiguate if multiple areatypename were used in the same query.
Granted that information is not particularly useful for soil series overlap... the fact that those new columns were included was probably an oversight from https://github.com/ncss-tech/soilDB/pull/188 which added several columns that were not present in the result for consistency.
I don't think there was a specific need for the areaacres
or obterm
in this context. I suppose we can safely remove these columns as there are plenty of places to get the information on state area and the obterm
is not relevant in this context. EDIT: removed in https://github.com/ncss-tech/soilDB/commit/b5f5692ea97af8f08826157a886ab7c52f22c374
I suggest the resulting column name for area overlaps be named according the original areaiidref name followed by the associated area table column name (e.g. typelocstareaiidref.areasymbol or typelocstareaiidref.areaname)
This seems like a fine suggestion in terms of identifying related tables and the context associated with an area symbol or name. I don't think the "iidref" suffix is necessary considering this is a made up table/column physical name combination.
More generally: changing column names would be a fairly large breaking change for basically all existing uses of the areasymbol
column if I understand correctly. As suggested would result in some pretty convoluted names (that arguably are more confusing than just understanding how the area table works, or just documenting in a function where the data come from), so implementing this would require some careful testing and assessment of impacts.
I don't really think this is a "big problem" in terms of people understanding NASIS (I have never had anyone contact me about it), but I would gladly review pull request(s) from you to implement these changes when you have time. Do you have an idea of how many/which functions you would expect to be impacted by this change?
I think the original column name/iidref linkage (e.g. typelocstarea.areasymbol) better convenes the column purpose where it's used in various tables (like the project table). I agreed the iidref part isn't necessary. Also, in many instances the areaiidref column name isn't unique, even when it is elsewhere when referring to the same areatype. If we adopt this convention it'll make including the areatypename unnecessary. To be clear, at this time, I'm not advocating a wholesale change of all the functions that use areasymbol, but would recommend it moving forward. The one logical exception I can see is where the areasymbol is linked to the legend and mapunit (which would minimzie the majority of disruptions). Since we seem to be in agreement, perhaps we could start here.
Agreed if we carve out an exception for legend/mapunit that probably would cover most use cases that would "break".
I am open to a PR to implement this where you want, and we can start with get_soilseries_from_NASIS(), I think anything that is less ambiguous is a "good" change even if we have to fix things. A PR is the right place to test these changes while we work out the kinks.
I can't help but think there are probably other instances beyond areaiidrefs where this idea could be implemented to make it less ambiguous where data come from when a query behind the scenes does something that involves joins or related tables with potentially ambiguous column names.
I think you are advocating for a wholesale change--though I recognize you would like to focus on this function for now. In the big picture I would like to see a roadmap of sorts for where we plan to implement this new recommendation. A checklist of places to implement and sketch of the rules could be a good topic for either a new issue and/or for a section in the "soilDB developers guide" (https://github.com/ncss-tech/soilDB/issues/269; which I would appreciate your help in drafting)
I think there might be some other things to consider when you have child tables referencing area that have many:1 relationship with the parent... examples relevant to get_soilseries_from_NASIS()
would be if we wanted to include information on Soil Series MLRAs Using or Soil Series States Using in addition to the "type location state". The former would need to be concatenated, I believe
Sounds good. I'd be interested to discuss the developer guide more.
Coming back to this topic, specifically item 2 from OP. I have received a request to add the state, area and county information to the Site table results returned by get_site_data_from_NASIS_db()
. This reminded me of this issue, which was only half resolved.
Unfortunately I have not progressed on the "developers guide" in the last 1.5 years or so, but still would like to set aside some time to rough it out it some day. The following could be snippets to include related to dealing with the frequent relationships to the Area table that occur throughout NASIS.
Site table is a bit of a unique case (and related to item 2) because 3 different areaiidrefs are present with custom "relationship name" for each. The relationship name would be a good derived column name replacement for "areasymbol" in most cases (excluding legend
table).
I suggest this option rather than making something up like mlraarea.areasymbol
. The main issue I have with the latter is mlraarea
is not a table, but the name suggests that it is.
So, for the site table we could have: site_state
, site_county
and site_mlra
column names derived from the relationship name for site.stateareaiidref
:area.areaiid
, site.countyareaiidref
:area.areaiid
and site.mlraareaiidref
:area.areaiid
, respectively.
Most tables related to Area use the "default" relationship, but others, such as lmapunit
, also have a defined relationship to Area. The naming scheme for these relationships is consistent and easy to understand. lmapunit
follows the same scheme as in the site data where lmapunit_mlra
denotes relationship lmapunit.mlraareaiidref
:area.areaiid
for the dominant MLRA where that legend mapunit is used.
I think we can emulate what is done for these named relationships to the Area table for all cases where we need to bring in these related areasymbol columns.
Bringing it back to Soil Series, which doesn't have a named relationship to Area, naming the "Type Location State" areasymbol
column soilseries_typelocst
would be the analog to relationships for Site and Legend Mapunit.
Initial implementation of above for:
I started to look at get_projectmapunit_from_NASIS() and a few other cases, but have not implemented it there yet. For project it would be "mlra_sso"
and "nonmlra_ssa"
relationship names:
I noticed a couple of issues with the get_soilseries_from_NASIS() function today.
Cheers