ncss-tech / lab-data-delivery

Re-working the LIMS + NASIS data integration process for snapshot and future web-service based delivery.
6 stars 1 forks source link

automated translation of FGDB to SQLite or related #2

Closed dylanbeaudette closed 7 years ago

dylanbeaudette commented 7 years ago

It would be nice to post both ESRI-based and open-source alternatives of the NASIS/KSSL data.

dylanbeaudette commented 7 years ago

Here is an outline of how it might work:

  1. check text files for formatting errors
  2. export tables from FGDB to text
  3. generate SQLite schema from first 1,000 lines of each file
  4. create SQLite DBs from schemas (NASIS pedons, KSSL morp, KSSL lab data)
  5. load SQLite DBs with raw data
  6. add basic indexing to SQLite DBs
  7. review
  8. post to national www page

Much of this can be done in R with:

Initial checking of the NASIS pedons, suggests that the files are nearly free of errors:

      MetadataTable.csv MetadataTableColumn.csv              phtext.csv 
                     55                     434                       4 

The above errors are likely related to newline and double-quote characters. I'll post back with the details.

Related code here.

dylanbeaudette commented 7 years ago

One small issue, in the text files field names appear to be truncated:

The field names are not truncated in the FGDB.

dylanbeaudette commented 7 years ago

Very close to a working prototype. Loading the KSSL lab data into a new SQLite database, 330MB, takes about 30 seconds (5041efc4af612baafd7df15361718498002dacf3). The resulting table definitions seems reasonable, apart from the truncated field names

CREATE TABLE Andic_Soil_Properties 
( `labsampnum` TEXT,
    `result_source_ke` INTEGER,
    `prep_code` TEXT,
    `clay_tot` REAL,
    `clay_f` REAL,
    `clay_tot_est` REAL,
    `co3_cly` REAL,
    `cosi_vcos` REAL,
    `wf_0175` INTEGER,
    `vfg2` INTEGER,
    `w15l2` REAL,
    `w15cly` REAL,
    `db_13b` REAL,
    `COLEws` REAL,
    `glass_content_to` INTEGER,
    `glass_content__1` INTEGER,
    `glass_content__2` INTEGER,
    `glass_content__3` INTEGER,
    `glass_content__4` INTEGER,
    `glass_content__5` TEXT,
    `glass_content_wt` INTEGER,
    `alfe15_glass` INTEGER,
    `alfe60_glass` INTEGER,
    `ph_h2o` REAL,
    `ph_naf` REAL,
    `org_c` REAL,
    `c_tot` REAL,
    `caco3` INTEGER,
    `p_nz` INTEGER,
    `al_fe_ox` REAL,
    `sitimes8_ox` INTEGER,
    `fetimes2_ox` INTEGER,
    `si8plusfe2_ox` INTEGER,
    `optd_ox` REAL,
    `al_kcl` REAL,
    `base_sum_cec` REAL,
    `ecec_cec` REAL,
    `bsesat_cec` INTEGER,
    `bsecat_cec` INTEGER,
    `Melanic` REAL,
    `abgll` TEXT,
    `abgpi` TEXT 
)
...
jneme910 commented 7 years ago

Should we move this into the pedon tools? Combine the two on GitHub

dylanbeaudette commented 7 years ago

The code is functional and tested. I'll close this for now and work on polishing the code. Field name truncation gets a new issue.