Closed dylanbeaudette closed 3 years ago
Possible solution:
peiid
to extended_data$diagHzBoolean
NA
with FALSE
@site
Grave digging a bit here while doing some fetchNASIS
work for https://github.com/ncss-tech/soilDB/pull/149.
As Dylan pointed out NA
are introduced from join of a null right-hand side (no diagnostic records for particular peiid
) to the pedon level where all peiid
are present.
I think I would shy away from filling NA
as FALSE
as proposed above because of how spotty the diagnostics table can be depending on data vintage/origin/purpose.
If folks agree with my interpretations on following points, I think this issue can be closed.
For "validation" purposes it may be valuable to know which pedon diagnostics were NULL
(converted to NA
) versus pedons where some diagnostics were populated just not that type (FALSE
).
Any calculated values are an "improvement" over nothing, but the interpretation of those calculated values can and should vary.
NULL
input v.s. definitive FALSE
seems wise.Filtering around NA
is covered by methods like subset,SoilProfileCollection-method
As an aside, @smroecker had proposed a refactor of this function in https://github.com/ncss-tech/soilDB/issues/158. That might be something to strongly consider for all "extended" data sources in near future / following merge of #149
I would agree with your comments, Andrew. I think it is good to be able to make the distinction between NULL input vs. definitive FALSE. We will need to retain that kind of information for future gap filling diagnostics and workflows.
Works for me. Another line of reasoning: all pedons should have at least 1 diagnostic horizon / feature record, the complete absence suggests a data population error.
It appears that
.diagHzLongtoWide()
assigns NA to all columns when a pedon has no diagnostic features. NA should probably beFALSE
, since we interpret NULL records as missing for all other interpretation of diagnostic feature records.Probably related to joining a limited set of
peiid
records to the full set in@site
.