ncss-tech / soilDB

soilDB: Simplified Access to National Cooperative Soil Survey Databases
http://ncss-tech.github.io/soilDB/
GNU General Public License v3.0
83 stars 19 forks source link

`fetchKSSL(returnGeochemicalData=TRUE)`: `pedlabsampnum` based queries return "FALSE" labsampnum for pedons without data #215

Closed brownag closed 3 years ago

brownag commented 3 years ago

If a particular pedlabsampnum has NULL geochemical data, and the fetchKSSL pedlabsampnum interface is used dummy records with bad labsampnum ( "FALSE") are included in the result.

library(soilDB)
one <- fetchKSSL(pedlabsampnum = "00P0788", returnGeochemicalData = TRUE)
#> 1 pedons loaded (0.07 Mb transferred)
two <- fetchKSSL(pedlabsampnum = "05N0025", returnGeochemicalData = TRUE)
#> 1 pedons loaded (0.04 Mb transferred)
one$optical$labsampnum
#> [1] "00P04938" "00P04940"
two$optical$labsampnum
#> Error in two$optical$labsampnum: $ operator is invalid for atomic vectors
two$optical
#>       [,1]
#> [1,] FALSE
three <- fetchKSSL(pedlabsampnum = c("00P0788", "05N0025"), 
                   returnGeochemicalData = TRUE)
#>   |                                                                              |                                                                      |   0%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================================================| 100%
#> 2 pedons loaded (0.11 Mb transferred)
three$optical$labsampnum
#> [1] "00P04938" "00P04940" "FALSE"
three$optical[3,]
#>   labsampnum result_source_key prep_code analyzed_size_frac glass_count_method
#> 3      FALSE                 0     FALSE              FALSE              FALSE
#>   bg_basic_glass_count di_diatoms_glass_count ga_glass_aggregates_glass_count
#> 3                FALSE                  FALSE                           FALSE
#>   gs_glass_glass_count fg_glass_coated_feldspar_glass_count
#> 3                FALSE                                FALSE
#>   gc_glass_coated_grain_glass_count hg_glass_coated_hornblende_glass_count
#> 3                             FALSE                                  FALSE
#>   og_glass_coated_opaques_glass_count qg_glass_coated_quartz_glass_count
#> 3                               FALSE                              FALSE
#>   gm_glassy_materials_glass_count ot_other_glass_count
#> 3                           FALSE                FALSE
#>   po_plant_opal_glass_count ss_sponge_spicule_glass_count
#> 3                     FALSE                         FALSE
#>   ar_weatherable_aggregates_glass_count pa_palagonite_glass_count
#> 3                                 FALSE                     FALSE
#>   pm_pumice_glass_count petro_count_method ac_actinolite_petro_count
#> 3                 FALSE              FALSE                     FALSE
#>   fb_albite_petro_count am_amphibole_petro_count ae_anatase_petro_count
#> 3                 FALSE                    FALSE                  FALSE
#>   an_andalusite_petro_count fa_andesite_petro_count ay_anhydrite_petro_count
#> 3                     FALSE                   FALSE                    FALSE
#>   fn_anorthite_petro_count fh_anorthoclase_petro_count
#> 3                    FALSE                       FALSE
#>   ah_anthophyllite_petro_count ag_antigorite_petro_count ap_apatite_petro_count
#> 3                        FALSE                     FALSE                  FALSE
#>   ao_aragonite_petro_count af_arfvedsonite_petro_count au_augite_petro_count
#> 3                    FALSE                       FALSE                 FALSE
#>   ba_barite_petro_count bg_basic_glass_petro_count by_beryl_petro_count
#> 3                 FALSE                      FALSE                FALSE
#>   bt_biotite_petro_count bc_biotite_chlorite_petro_count be_bohmite_petro_count
#> 3                      0                           FALSE                  FALSE
#>   bz_bronzite_petro_count bk_brookite_petro_count br_brucite_petro_count
#> 3                   FALSE                   FALSE                  FALSE
#>   ca_calcite_petro_count cb_carbonate_aggregates_petro_count
#> 3                  FALSE                               FALSE
#>   ct_cassiterite_petro_count cd_chert_chalcedony_jasper_agate_onyx_petro_count
#> 3                      FALSE                                             FALSE
#>   cl_chlorite_petro_count cm_chlorite_mica_petro_count
#> 3                   FALSE                        FALSE
#>   cy_chrysotile_petro_count qc_clay_coated_quartz_petro_count
#> 3                     FALSE                             FALSE
#>   ch_cliachite_bauxite_petro_count cz_clinozoisite_petro_count
#> 3                            FALSE                       FALSE
#>   cc_coal_petro_count co_collophane_petro_count cn_corundum_petro_count
#> 3               FALSE                     FALSE                   FALSE
#>   cr_cristobalite_petro_count di_diatoms_petro_count dp_diopside_petro_count
#> 3                       FALSE                  FALSE                   FALSE
#>   dl_dolomite_petro_count du_dumortierite_petro_count en_enstatite_petro_count
#> 3                   FALSE                       FALSE                    FALSE
#>   ep_epidote_petro_count fd_feldspar_petro_count fz_feldspathoids_petro_count
#> 3                  FALSE                   FALSE                        FALSE
#>   fm_ferromagnesium_petro_count fu_fluorite_petro_count
#> 3                         FALSE                   FALSE
#>   ff_foraminifera_petro_count gg_galena_petro_count gn_garnet_petro_count
#> 3                       FALSE                 FALSE                 FALSE
#>   gi_gibbsite_petro_count ga_glass_aggregates_petro_count
#> 3                   FALSE                           FALSE
#>   fg_glass_coated_feldspar_petro_count gc_glass_coated_grain_petro_count
#> 3                                FALSE                             FALSE
#>   hg_glass_coated_hornblende_petro_count og_glass_coated_opaque_petro_count
#> 3                                  FALSE                              FALSE
#>   qg_glass_coated_quartz_petro_count gs_glass_petro_count
#> 3                              FALSE                    0
#>   gm_glassy_matrials_petro_count gl_glauconite_petro_count
#> 3                          FALSE                     FALSE
#>   go_glaucophane_petro_count ge_goethite_petro_count gd_gold_petro_count
#> 3                      FALSE                   FALSE               FALSE
#>   gy_gypsum_petro_count kh_halloysite_petro_count he_hematite_petro_count
#> 3                 FALSE                     FALSE                       0
#>   hn_hornblende_petro_count hb_hydrobiotite_petro_count
#> 3                         0                       FALSE
#>   id_iddingsite_petro_count qi_iron_oxide_coated_quartz_petro_count
#> 3                     FALSE                                   FALSE
#>   fe_iron_oxides_geothite_magnetite_hematite_li_petro_count
#> 3                                                         0
#>   jo_jarosite_petro_count kk_kaolinite_petro_count ky_kyanite_petro_count
#> 3                   FALSE                    FALSE                  FALSE
#>   fl_labradorite_petro_count la_lamprobolite_petro_count
#> 3                      FALSE                       FALSE
#>   lp_lepidolite_petro_count lo_lepidomelane_petro_count
#> 3                     FALSE                       FALSE
#>   lu_leucoxene_petro_count lm_limonite_petro_count lt_lithiophorite_petro_count
#> 3                    FALSE                   FALSE                        FALSE
#>   me_magnesite_petro_count mg_magnetite_petro_count mr_marcasite_petro_count
#> 3                    FALSE                        0                    FALSE
#>   ml_melilite_petro_count mi_mica_petro_count fc_microcline_petro_count
#> 3                   FALSE               FALSE                     FALSE
#>   mz_monazite_petro_count mt_montmorillonite_petro_count
#> 3                   FALSE                          FALSE
#>   ms_muscovite_petro_coun ne_nepheline_petro_count
#> 3                   FALSE                    FALSE
#>   nx_non_crystalline_petro_count fo_oligoclase_petro_count
#> 3                          FALSE                     FALSE
#>   ov_olivine_petro_count op_opaques_petro_count fr_orthoclase_petro_count
#> 3                  FALSE                      0                     FALSE
#>   or_other_resistant_minerals_petro_count
#> 3                                   FALSE
#>   ow_other_weatherable_minerals_petro_count ot_other_petro_count
#> 3                                     FALSE                FALSE
#>   pk_perovskite_petro_count pl_phlogophit_petro_count pd_piemontite_petro_count
#> 3                     FALSE                     FALSE                     FALSE
#>   fp_plagioclase_feldspar_petro_count po_plant_opal_petro_count
#> 3                                   0                     FALSE
#>   pj_plumbojarosite_petro_count pn_pollen_petro_count
#> 3                         FALSE                 FALSE
#>   fk_potassium_feldspar_petro_count pi_pyrite_petro_count
#> 3                                 0                 FALSE
#>   pu_pyrolusite_petro_count py_pyrophyllite_petro_count pr_pyroxene_petro_count
#> 3                     FALSE                       FALSE                       0
#>   qz_quartz_petro_count ra_resistant_aggregates_petro_count
#> 3                     0                               FALSE
#>   md_resistant_mineraloids_petro_count re_resistant_minerals_petro_count
#> 3                                FALSE                             FALSE
#>   ro_rhodochrosite_petro_count rb_riebeckite_blue_amphibole_petro_count
#> 3                        FALSE                                    FALSE
#>   ru_rutile_petro_count fs_sanidine_petro_count sr_sericite_petro_count
#> 3                 FALSE                   FALSE                   FALSE
#>   si_siderite_petro_count sa_siliceous_aggregates_petro_count
#> 3                   FALSE                               FALSE
#>   sl_sillimanite_petro_count sg_sphalerite_petro_count sp_sphene_petro_count
#> 3                      FALSE                     FALSE                 FALSE
#>   sn_spinel_petro_count ss_sponge_spicule_petro_count so_staurolite_petro_count
#> 3                 FALSE                         FALSE                     FALSE
#>   st_stilbite_petro_count su_sulfur_petro_count ta_talc_petro_count
#> 3                   FALSE                 FALSE               FALSE
#>   tp_topaz_petro_count tm_tourmaline_petro_count te_tremolite_petro_count
#> 3                FALSE                         0                    FALSE
#>   vr_vermiculite_petro_count vc_vermiculite_chlorite_petro_count
#> 3                      FALSE                               FALSE
#>   vh_vermiculite_hydrobiotite_petro_count vm_vermiculite_mica_petro_count
#> 3                                   FALSE                               0
#>   vi_vivianite_petro_count wv_wavellite_petro_count
#> 3                    FALSE                    FALSE
#>   ar_weatherable_aggregates_petro_count we_weatherable_mineral_petro_count
#> 3                                 FALSE                              FALSE
#>   ze_zeolite_petro_count zr_zircon_petro_count zo_zoisite_petro_count
#> 3                  FALSE                     0                  FALSE
#>   ai_aegirine_augite_petro_count al_allophane_petro_count
#> 3                          FALSE                    FALSE
#>   ce_cobaltite_petro_count ha_halite_petro_count
#> 3                    FALSE                 FALSE
#>   il_illite_hydromuscovite_petro_count lc_analcime_petro_count
#> 3                                FALSE                   FALSE
#>   le_lepidocrocite_petro_count li_leucite_petro_count
#> 3                        FALSE                  FALSE
#>   mc_montmorillonite_chlorite_petro_count mh_maghemite_petro_count
#> 3                                   FALSE                    FALSE
#>   mm_montmorillonite_mica_petro_count
#> 3                               FALSE
#>   mv_montmorillonite_vermiculite_petro_count pa_palagonite_petro_count
#> 3                                      FALSE                     FALSE
#>   pg_palygorskite_petro_count sc_scapolite_petro_count se_sepiolite_petro_count
#> 3                       FALSE                    FALSE                    FALSE
#>   sm_smectite_petro_count td_tridymite_petro_count th_thenardite_petro_count
#> 3                   FALSE                    FALSE                     FALSE
#>   hy_hypersthene_petro_count hs_hydroxy_interlayer_smectite_petro_count
#> 3                      FALSE                                      FALSE
#>   hv_hydroxy_interlayer_vermiculite_petro_count pm_pumice_petro_count
#> 3                                         FALSE                 FALSE
#>   sz_serpentine_petro_count total_grains_counted
#> 3                     FALSE                    0
#>   resistant_minerals_total_mineral_soil glass_count_mineral_interpretation
#> 3                                     0                              FALSE

Traces back to this https://github.com/ncss-tech/soilDB/blob/6e1750b1ce9506d617e8e84a581dd946a642d80c/R/fetchKSSL.R#L386-L389 which will "fill" in data (FALSE and 0) when rbinding the 2 row data.frame from first pedon witdata with the 1x1 logical matrix that comes out of the second pedon without data

I think these records should be filtered out, or alternately some sort of NA-filled record should be provided for each labsampnum.

dylanbeaudette commented 3 years ago

I think that I have a simple solution, using data.table::rbindlist. Seems to work, please test filter_geochem. I'm not sure, but it seems to work as well.

brownag commented 3 years ago

I think the rbindlist is a partial solution, thanks.

From what I can see this only NA-fills in the profile labsampnums that have no data with NA -- which is good, but the same is not done for profiles that have only some of their horizons measured (very common for geochemical data).

I think I would prefer my latter option here that the various data.frame objects returned with returnGeochemicalData =TRUE have all of the labsampnum from the related SPC, which means cross referencing the contents of the combined SPC to each of the result tables. I dont think any of these changes should affect filter_geochem

dylanbeaudette commented 3 years ago

OK, I see. Two options here:

  1. refactor the API to return all of the affected labsampnum (LEFT JOIN) even when all of the data might be NA. That would greatly simplify post-processing in R, but introduce a lot of "newness" in the API. Also, it adds a lot of bloat to the data being pushed around the wire.
  2. leave the API alone and clean-up in R. I would like this to be kept to a minimum, such that NA-padding is done via left joins on demand.

I'm going to try option 2 first (0-length data.frame with labsampnum column), as I have a major re-write of the API on my TODO list for next month.