ropensci-archive / isdparser

:warning: ARCHIVED :warning: NOAA ISD/ISH file parser

metadata on each data field #12

Closed sckott closed 6 years ago

sckott commented 7 years ago

via #6

working on scraping this data from the PDF manual. it's very messy and will take some manual cleanup, I think. script here: https://github.com/ropensci/isdparser/blob/master/inst/metadata/scrape_docs.R

data.frame so far looks like:

# A tibble: 642 × 8
     pos field_length      min      max           units scaling_factor missing
   <chr>        <chr>    <chr>    <chr>           <chr>          <chr>   <chr>
1    1-4         <NA>     0000     9999            <NA>           <NA>    <NA>
2   5-10         <NA>     <NA>     <NA>            <NA>           <NA>    <NA>
3  11-15         <NA>    00000    99999            <NA>           <NA>    <NA>
4  16-23         <NA> 00000101 99991231            <NA>           <NA>    <NA>
5  24-27         <NA>     0000     2359            <NA>           <NA>    <NA>
6  28-28         <NA>        1     <NA>            <NA>           <NA>       9
7  29-34         <NA>   -90000   +90000 Angular Degrees           1000  +99999
8  35-41         <NA>  -179999  +180000 Angular Degrees           1000 +999999
9  42-46         <NA>     <NA>     <NA>            <NA>           <NA>   99999
10 47-51         <NA>    -0400    +8850          Meters              1   +9999
# ... with 632 more rows, and 1 more variables: dom <chr>

sckott commented 7 years ago

data transformations that can be done with the metadata:

  1. scale - w/ variable scaling_numeric, e.g. x / 1000 or x / 10 (raw values are stored multiplied by the scaling factor)
  2. replace missing with a user-defined value? - w/ variable missing_numeric, e.g. replace w/ NA
  3. replace categorical variables with the text value of the category - w/ variable dom_parsed - will be a named list with the category labels as names and the category codes as values
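
A minimal sketch of how these three transformations might look in R. Everything here is an assumption for illustration: `apply_metadata`, its arguments, and the category labels are hypothetical and are not isdparser's API. It also assumes raw values are stored multiplied by the scaling factor (latitude runs to +90000 with a factor of 1000 in the table above), so scaling divides.

```r
# Hypothetical helper, not part of isdparser: apply the three
# metadata-driven transformations to one raw character field.
apply_metadata <- function(x, scaling = NA, missing = NA, dom = NULL,
                           replace_with = NA) {
  num <- suppressWarnings(as.numeric(x))
  # Flag the sentinel that marks a missing value, e.g. 9999 for "+9999".
  is_missing <- !is.na(missing) & !is.na(num) & num == missing

  if (!is.null(dom)) {
    # dom: named list, names are category labels, values are raw codes.
    out <- names(dom)[match(x, unlist(dom))]
  } else {
    # Assumption: raw values are stored multiplied by the scaling factor.
    out <- if (is.na(scaling)) num else num / scaling
  }
  out[is_missing] <- replace_with
  out
}

# Numeric field: scale by 1000, treat 99999 as missing.
lat <- apply_metadata(c("+45123", "+99999"), scaling = 1000, missing = 99999)

# Categorical field: the labels below are made up for the example.
cat_field <- apply_metadata(c("1", "9"), missing = 9,
                            dom = list(label_one = "1", label_two = "2"))
```

Here `lat` comes out as 45.123 and NA, and `cat_field` as "label_one" and NA. Passing a different `replace_with` covers the "user-defined value" case in item 2.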

sckott commented 7 years ago

@rjbehnke let me know if you see any problems in the metadata

https://github.com/ropensci/isdparser/blob/master/inst/extdata/isd_metadata.csv

doing some checking still to make sure no problems

rjbehnke commented 7 years ago

I just checked field lengths, and they seem good.


rjbehnke commented 7 years ago

I can't find the thread where this issue should be placed, but elevation has a scale factor of 1. I believe your isd_transform function is using a scale factor of 10 for elevation.

Ruben


sckott commented 7 years ago

@rjbehnke thanks. it is indeed 1 for elevation - I'll change that.

ideally, once this metadata is verified, we'll use it, and all scaling/etc. info will come from it
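
A rough sketch of what "scaling info comes from the metadata" could look like, using three rows copied from the tibble earlier in this thread. The helper `scale_from_meta` is hypothetical and not isdparser's implementation; it assumes raw values are divided by the scaling factor, and that fields are looked up by their `pos` string.

```r
# Rows copied from the tibble in this thread (latitude, longitude,
# elevation); only the columns needed for scaling are kept.
meta <- data.frame(
  pos = c("29-34", "35-41", "47-51"),
  scaling_factor = c(1000, 1000, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: scale a raw value using the factor recorded
# for its position, instead of a hard-coded number.
scale_from_meta <- function(x, pos, meta) {
  fct <- meta$scaling_factor[match(pos, meta$pos)]
  if (is.na(fct)) stop("no scaling factor recorded for position ", pos)
  as.numeric(x) / fct
}

elev <- scale_from_meta("+0123", "47-51", meta)   # factor 1
lat  <- scale_from_meta("+45123", "29-34", meta)  # factor 1000
```

With a factor of 1, a raw elevation of "+0123" stays 123 meters; a hard-coded factor of 10 would have turned it into 12.3.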