rmendels / rerddapXtracto

xtractomatic using rerddap

Lengthy extraction - correctly setup? Progress bar possibility? #20

Closed · SimonDedman closed this issue 4 years ago

SimonDedman commented 4 years ago

Hi Roy, hope you're well. I'm wondering if you might lend your thoughts to a few issues I'm having. I'm trying to extract chlorophyll for ~39,000 lat-lon points with the code below. This takes ages: the first subset (6,922 points) took a few hours and ended at roughly connection number 12,000, i.e. 1.9x the number of points, possibly due to automatic retries? The second took many hours; I went out for dinner and returned to find my machine had hung (probably unrelated, I think one of my RAM sticks is faulty). So question 1: is my approach of feeding rxtracto three vectors of x, y, t points inefficient for any reason? It works, but I'm wondering if I'm missing something that would run faster. It feels like requesting ~6,900 single values from a server should be quick, but maybe the cost is in requesting them one by one in individual HTTP calls?

Question 2: while the verbose parameter is nice, I'm wondering if this would be more useful if logged to a file? The info flies past so quickly it's functionally impossible to read. This would also allow for:

Question 3: would it be possible to add a progress bar? Typical code for this is

total <- 20
pb <- txtProgressBar(min = 0, max = total, style = 3) # create progress bar
for (i in 1:total) {
  setTxtProgressBar(pb, i) # update progress bar
  # your code
}
close(pb)

Question 4 / possible bug: I noticed that the STOP button in RStudio does nothing if pressed while an rxtracto call is running.

Thanks in advance for your thoughts. The code I'm using:

library(rerddapXtracto) # provides rxtracto()
urlbase <- "http://coastwatch.pfeg.noaa.gov/erddap/"
parameter <- 'chlorophyll'
xlen <- 0.1 
ylen <- 0.1
df_i$ChlA <- rep(NA, nrow(df_i)) # add NA chlA
df_i <- dplyr::arrange(df_i, Date) # order df_i by date
datespre <- which(df_i$Date >= "1997-09-02" & df_i$Date < "2003-01-05" & !is.na(df_i$lat) & !is.na(df_i$lon))
datespost <- which(df_i$Date >= "2003-01-05" & !is.na(df_i$lat) & !is.na(df_i$lon))
dataset <- 'erdSW2018chla8day' # 1997-09-02T00:00:00Z, 2010-12-15T00:00:00Z
dataInfo <- rerddap::info(dataset, url = urlbase)
rerddap::cache_delete_all(force = TRUE)
chl_pre <- rxtracto(dataInfo,
                    parameter = parameter,
                    xcoord = df_i[datespre,"lon"],
                    ycoord = df_i[datespre,"lat"],
                    tcoord = df_i[datespre,"Date"],
                    xlen = xlen,
                    ylen = ylen,
                    verbose = TRUE)
df_i[datespre,"ChlA"] <- chl_pre$`mean chlorophyll`
dataset <- 'erdMH1chla8day' # 2003-01-05T00:00:00Z, 2019-04-27T00:00:00Z
dataInfo <- rerddap::info(dataset, url = urlbase)
rerddap::cache_delete_all(force = TRUE)
chl_post <- rxtracto(dataInfo,
                     parameter = parameter,
                     xcoord = df_i[datespost,"lon"],
                     ycoord = df_i[datespost,"lat"],
                     tcoord = df_i[datespost,"Date"],
                     xlen = xlen,
                     ylen = ylen,
                     verbose = TRUE)
df_i[datespost,"ChlA"] <- chl_post$`mean chlorophyll`
rmendels commented 4 years ago

@SimonDedman Thanks for the feedback. I am on extended leave so it may be a while before I can get to this, but one thing that would help is if you could save, say, the first 100 entries of xcoord, ycoord and tcoord for each variable and send them to me, so I can get into a debugger and see what is happening. I would, however, question the use of the MH1 8-day chlorophyll dataset for your purposes. It is a NASA product, and unlike our 8-day composites, which are running means, their 8-day products are non-overlapping 8-day periods. I imagine a lot of time is being wasted in the code trying to figure out which time to use, and you are likely often requesting the same thing, though the code tries to catch that.
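
For example, a hypothetical way to package such a sample, reusing the df_i and datespre objects from the code above (the file name is arbitrary):

# save the first 100 coordinate triples from the pre-2003 extraction so they
# can be attached to the issue or sent by email
debug_sample <- head(df_i[datespre, c("lon", "lat", "Date")], 100)
saveRDS(debug_sample, "rxtracto_debug_sample.rds")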

I will look at the option of saving to a file (unlikely I will do this) and a progress bar (more likely). Saving to a file causes problems with CRAN: I have to get an explicit okay from the user before writing to the user's space, otherwise I can only write to the temporary directories using the standard R calls for this.

First, can I ask if you have the very latest version of rerddapXtracto? If unsure, update from CRAN. As for the hangups, our server gets hit pretty hard and our Internet speed stinks, so there are often timeouts on requests (which is why each request is tried multiple times), and often this comes from the user's side (their system will only keep a connection open for a given amount of time) rather than ours. We have found it is not uncommon, for a reasonably large number of requests, that people have to break it up, which is also why I return whatever has been gotten so far. I would turn off verbose = TRUE; it mainly helps to see whether the URLs are being formed correctly, but it slows things down a lot, and if the program times out and has to stop it returns the error message.
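
One way to break a long request up on the user's side (a rough sketch only, reusing the objects from the code earlier in the thread; the chunk size and checkpoint file name are arbitrary):

idx_list <- split(datespre, ceiling(seq_along(datespre) / 1000)) # chunks of ~1000 points
results <- vector("list", length(idx_list))
for (i in seq_along(idx_list)) {
  idx <- idx_list[[i]]
  results[[i]] <- rxtracto(dataInfo,
                           parameter = parameter,
                           xcoord = df_i[idx, "lon"],
                           ycoord = df_i[idx, "lat"],
                           tcoord = df_i[idx, "Date"],
                           xlen = xlen,
                           ylen = ylen)
  saveRDS(results, "chl_pre_partial.rds") # checkpoint after each chunk
}
df_i[datespre, "ChlA"] <- unlist(lapply(results, `[[`, "mean chlorophyll"))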

Also, I haven't done the calculation, but since xlen and ylen are not zero, each call may be downloading more data than you realize. For a very long track, if your data are in a relatively limited area, and if you know how to read netCDF files (or I can provide some help) and are comfortable with R, you might do better doing a single download of a netCDF file that covers your lat-lon-time bounds and then doing the extracts locally (or you may have to download several such files, if the requests are too big, and combine them locally). You can do such an extract using either rerddap or rxtracto_3D().
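
For example, a rough sketch of that one-big-download approach with rxtracto_3D(); the bounds here are just the range of the track, and for a very large region the request would likely have to be split into several smaller boxes as noted above:

# xcoord/ycoord/tcoord are two-element min/max bounds here, not point vectors
chl_block <- rxtracto_3D(dataInfo,
                         parameter = 'chlorophyll',
                         xcoord = range(df_i$lon, na.rm = TRUE),
                         ycoord = range(df_i$lat, na.rm = TRUE),
                         tcoord = c("2003-01-05", "2019-04-27"))
# chl_block then holds the data locally (named after the parameter, i.e.
# chl_block$chlorophyll) along with chl_block$longitude, chl_block$latitude and
# chl_block$time, so each track point can be matched to its nearest grid cell
# without any further server calls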

rmendels commented 4 years ago

@SimonDedman And I should add that I do nothing unusual in the code, so I can't answer as to why RStudio won't stop it. I find that happens a lot with other code I have in RStudio and I need to restart R; I do not run into the same problems with command-line R or, on the Mac, the built-in interface. You might set it up to run as a "Job" in RStudio and see if that lets you stop it.

SimonDedman commented 4 years ago

Thanks for the quick replies Roy. I'll try with verbose off and xlen/ylen = 0, and see how I get on. I believe I changed xlen and ylen to 0.1 after scrutinising the code, finding "0." as the default, and presuming it needed to be a positive length value. If 0 is acceptable/recommended for point extractions, I imagine that'll help things.
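
i.e. something like this for the pre-2003 chunk (a sketch of the adjusted call, same objects as my code above, just dropping verbose and the averaging box):

chl_pre <- rxtracto(dataInfo,
                    parameter = parameter,
                    xcoord = df_i[datespre, "lon"],
                    ycoord = df_i[datespre, "lat"],
                    tcoord = df_i[datespre, "Date"],
                    xlen = 0,
                    ylen = 0)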

Regarding my choice of dataset, I guess this is another example of the trouble I've had finding the 'right' datasets on ERDDAP. Cara's been very helpful, but the impression I'm left with is that selecting the right one is easy to do wrong, especially given how many datasets are in the catalogue. By 'our 8-day composites', I presume you mean NOAA NMFS ERD SWFSC, so I filtered my search by that but still get 58 results; for fish spanning the eastern Med to the western GoM, what would you suggest? I like the look of erdMH1chla1day but previously opted for 8-day since it's likely to have better coverage. Thanks again!

rmendels commented 4 years ago

Most people use xlen and ylen for one of two purposes. The first is that they feel there is error in the location, so it gives a box that covers the possible locations. The other is to get a certain amount of spatial smoothing. But yes, a value of zero should work; if not, I have a bug.
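
For a rough sense of scale (illustrative numbers, not from the thread; erdMH1chla8day is on roughly a 0.0417 degree, ~4 km grid):

grid_res <- 0.0417 # approximate grid spacing of erdMH1chla8day, in degrees
box_size <- 0.1    # the xlen/ylen used above
cells_per_side <- ceiling(box_size / grid_res) # about 3 cells per side
cells_per_side^2                               # roughly 9 grid cells per request instead of 1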

SimonDedman commented 4 years ago

Thanks Roy. It's running now with 0 for both, so hopefully the output will be correct. Another thing that occurred to me that might be useful for rxtracto, since I imagine I'm not the only user running slow extractions, would be a job-completion notification. I use the beepr package in all my scripts nowadays, either for error alerts, options(error = function() beep(9)), or on successful completion, beep(8). Notwithstanding that overuse of this makes your office sound like an annoying toddler trying to break a xylophone, it's a potentially useful option if you're so minded!
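
Laid out as code (beepr is on CRAN; the numbers just select different sounds):

library(beepr)
options(error = function() beep(9)) # play a sound if a long run errors out
# ... long-running extraction ...
beep(8)                             # and a different sound on successful completion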

rmendels commented 4 years ago

Will consider it; I welcome feature requests. It is a matter of when I have time to go in, do it, and test it. And speaking of toddlers (but not annoying ones), the reason I am on extended leave is to help take care of my granddaughter.

SimonDedman commented 4 years ago

No rush at all; enjoy your time off! FWIW, the small subset has just run successfully in 46 mins; about to start the large one, which should take ~3.5 hours, so I should have it all sorted tonight. Cheers squire!

rmendels commented 4 years ago

@SimonDedman Re: which dataset to use in an analysis. We view our job as not only making the data available but also helping people use it properly. Contact info for the CoastWatch node can be found at https://coastwatch.pfeg.noaa.gov/feedback.html. In particular, Guestbook entries are sent to, I believe, four of us.

SimonDedman commented 4 years ago

@rmendels Thanks Roy, good to know, and Cara's been very helpful when I've popped in to see her.

rmendels commented 4 years ago

@SimonDedman Cara is one of the people on the Guestbook list.