ropensci / rnoaa

R interface to many NOAA data APIs
https://docs.ropensci.org/rnoaa

Data I receive from NCDC is in Tibble format - Unusable #365

Closed Brian-160 closed 4 years ago

Brian-160 commented 4 years ago

I am attempting to produce a usable climate database. Unfortunately, when I download the data it is very un-'tidy'. I attempted to use pivot_longer from the tidyverse but received an error saying 'incorrect dimensions'. This is what my data frame looks like in RStudio:

$data
# A tibble: 935 x 8
   date             datatype station         value fl_m  fl_q  fl_so fl_t 
   <chr>            <chr>    <chr>           <int> <chr> <chr> <chr> <chr>
 1 2020-01-01T00:0~ PRCP     GHCND:USW00024~    76 ""    ""    W     "240~
 2 2020-01-01T00:0~ SNOW     GHCND:USW00024~     0 "T"   ""    W     ""   
 3 2020-01-01T00:0~ SNWD     GHCND:USW00024~     0 "T"   ""    W     ""   

I have tried everything. I only want to keep columns 1, 2, 4, and 5:

mso_light <- mso_data[1:2, 4:5]
# Error in mso_data[1:2, 4:5] : incorrect number of dimensions

Another attempt:

keeps <- c("date", "datatype", "value", "fl_m")
mso_light <- mso_data[keeps]

(this nulled the entire data frame)

When I use the function 'colnames()' I get an error message saying there are no column names. I will also post my code for downloading the data. Am I bringing the Tibble on myself, or is this the data format NCDC uses?

library('rnoaa')
library('dplyr')
library('utils')
library('cgwtools')

data_type <- c('tmax','tmin','PRCP', 'SNOW', 'SNWD')

for (i in 2009:2019) {
  start_date <- paste(i, '-01-01', sep = "")
  end_date <- paste(i, '-12-31', sep = "")
  # Name of the per-year object, e.g. "mso_data2009"
  a <- paste('mso_data', i, sep = "")
  assign(a, ncdc(datasetid = 'GHCND', stationid = 'GHCND:USW00024153',
                 datatypeid = data_type, startdate = start_date,
                 enddate = end_date, limit = 1000))

  # Use list = a so the object named by a is stored, not the string itself.
  # The first year of the loop creates the file; later years are appended.
  if (i == 2009) {
    save(list = a, file = 'mso_data.RData')
  } else {
    resave(list = a, file = 'mso_data.RData')
  }
}

mso_data <- ncdc(datasetid = 'GHCND', stationid = 'GHCND:USW00024153',
                 datatypeid = data_type, startdate = '2020-01-01', 
                 enddate = '2020-07-07', limit = 1000)
resave(mso_data, file = 'mso_data.RData')

Thanks for any help, Brian.

sckott commented 4 years ago

thanks for your question! In the future include your session info please.

The output of ncdc() is a list, so you can't index it like [x,y]. For example, try

x <- list(1, 2, 3)
x[1,2]
#> Error in x[1, 2] : incorrect number of dimensions
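
To check this kind of thing before subsetting, it can help to inspect the object first. The following is a minimal sketch that uses a made-up stand-in list (not a real ncdc() result, and the values are invented) to show why two-dimensional indexing fails on the list itself and why indexing it by column names returns only NULL elements:

```r
# Hypothetical stand-in for a list that carries a data frame in a `data` element.
x <- list(data = data.frame(date = c("2020-01-01", "2020-01-02", "2020-01-03"),
                            value = c(76L, 0L, 0L),
                            stringsAsFactors = FALSE))

is.list(x)   # TRUE: a plain list, not a data frame
dim(x)       # NULL: a list has no rows or columns, so x[i, j] cannot work
names(x)     # "data": the only element at the top level

# Two-dimensional indexing on the list fails; capture the message instead of stopping.
msg <- tryCatch(x[1, 2], error = function(e) conditionMessage(e))

# Indexing the list by column names looks for list elements called "date" and
# "value"; none exist, so every element of the result is NULL.
missing <- x[c("date", "value")]

# The data frame inside the list does have dimensions.
dim(x$data)
```

Calling str() on the object gives the same information in one step.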

The data as a data.frame is in the $data slot, so this should work

mso_data$data[1:2, 4:5]
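
Note that [1:2, 4:5] selects rows 1 to 2 of columns 4 to 5. If the goal is to keep columns 1, 2, 4, and 5 for every row, as described in the question, leave the row index empty. A small sketch, again using a made-up stand-in for the ncdc() result (the column names come from the printout above; the values and the shortened strings are placeholders):

```r
# Hypothetical stand-in for the list returned by ncdc(); only `data` is modelled.
mso_data <- list(
  data = data.frame(
    date     = rep("2020-01-01T00:00:00", 3),
    datatype = c("PRCP", "SNOW", "SNWD"),
    station  = rep("GHCND:USW00024153", 3),
    value    = c(76L, 0L, 0L),
    fl_m     = c("", "T", "T"),
    fl_q     = c("", "", ""),
    fl_so    = c("W", "W", "W"),
    fl_t     = c("", "", ""),
    stringsAsFactors = FALSE
  )
)

# Select by name: all rows, four named columns.
keeps <- c("date", "datatype", "value", "fl_m")
mso_light <- mso_data$data[, keeps]

# Select by position: the empty first index means "all rows".
by_position <- mso_data$data[, c(1:2, 4:5)]
```

Selecting by name is usually the safer choice, since it keeps working if the column order changes.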

Brian-160 commented 4 years ago

Thank you so much, disregard my previous email. I actually tried something similar but I think I forgot the '[1:2, 4:5]'

Worked perfectly, thanks again.

Brian


sckott commented 4 years ago

glad it worked!