pfmc-assessments / canary_2023


warning about read_excel not reading in all data #59

Closed brianlangseth-NOAA closed 1 year ago

brianlangseth-NOAA commented 1 year ago

@okenk Note that using read_excel carries the risk of not reading in all data entries within a column. I've noticed it with the WA sport bds data and the OR mrfss bds data. It also occurs in the CA mrfss bds data. The issue is that cells with an entry are instead read in as NA. The WA sport data is the more nefarious case, as it affects 100s of cells and affects the calculation of trips. For OR mrfss, this only happens with 4 cells. For CA mrfss, it occurs in tens of thousands of cells, but the vast majority are in fields I don't use, though some occur in fields used for defining trips. The issue appears to occur where there are few entries in a column that starts with at least 1000 blank cells. You can check which columns have NAs, and how many, by using apply(dataset, 2, FUN = function(x) {sum(is.na(x))})
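For reference, here is a minimal sketch of that per-column NA check. The toy data frame and its column names are made up for illustration and stand in for whatever read_excel returns. colSums(is.na(...)) gives the same counts as the apply() call, and sorting puts the worst columns first.

```r
# Toy data frame standing in for a dataset read with read_excel
# (column names and values are hypothetical).
dataset <- data.frame(
  trip_id = c("A1", NA, "A3", NA),
  length_cm = c(31.5, 40.2, NA, 28.0),
  sex = c(NA, NA, NA, "F"),
  stringsAsFactors = FALSE
)

# Number of NA cells in each column
na_counts <- colSums(is.na(dataset))

# Keep only columns with at least one NA, largest counts first
na_counts <- sort(na_counts[na_counts > 0], decreasing = TRUE)
print(na_counts)
```

A column showing far more NAs than expected from the spreadsheet is a candidate for the problem described above.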

My solution is to add guess_max = Inf to the read_excel call. The default for guess_max is 1000, so a column's type is guessed from only its first 1000 rows. Using Inf results in all cells being read correctly.
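A rough sketch of how the fix could be checked on a given file, assuming the readxl package is installed. The helper name compare_na_counts and the file name in the usage comment are hypothetical, not something from this repo. It reads the file once with the default and once with guess_max = Inf, then lists the columns where the default read has more NAs.

```r
# Sketch: find columns that lose entries under the default guess_max.
# Requires readxl; `path` is a path to an .xlsx file (hypothetical here).
compare_na_counts <- function(path, sheet = NULL) {
  # Default read: column types guessed from the first 1000 rows
  default_read <- readxl::read_excel(path, sheet = sheet)
  # Full read: column types guessed from all rows
  full_read <- readxl::read_excel(path, sheet = sheet, guess_max = Inf)

  na_default <- colSums(is.na(default_read))
  na_full <- colSums(is.na(full_read))

  # Columns where the default read turned real entries into NA
  affected <- na_default > na_full
  data.frame(
    column = names(na_default)[affected],
    na_default = unname(na_default[affected]),
    na_full = unname(na_full[affected])
  )
}

# Usage (file name is a placeholder):
# compare_na_counts("wa_sport_bds.xlsx")
```

An empty result would suggest the default read is safe for that file. Any rows returned show how many cells per column were being dropped.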

An alternative solution is to convert all of the Excel files to CSVs and use read_csv.