Closed: adamhsparks closed this issue 3 years ago
I am able to knit the whole book. I pushed some changes that might have fixed it
I pulled. Tried to knit starting at 01 and got this far
Which branch are you on? I don't have any issues on the master branch
I am/was on master when this issue occurred.
hmmm
I don't have any chunk names with bwplot
in 01_TrialsForInclusionInAnalysis.Rmd
in 02_preliminary analysis.rmd
there are 4 chunks with names starting with bwplot:
- bwplot_yield
- bwplot_sev
- bwplot_yield_year
- bwplot_yield_sched
no other files contain the string bwplot
This was not 01, this was 02 where it barfed.
Now it hangs at ~74% fetching BoM data.
That may just be my flaky connection here in South Perth, though.
It should not need to fetch the data from BoM if the data already exist in the weather/ directory, which contains the pre-downloaded data.
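The intended cache-then-download pattern could be sketched like this. This is a hypothetical helper, not the repo's actual crop_rain function (shown further down in the thread); the filename scheme and the use of bomrang::get_historical for the download step are assumptions for illustration:

```r
# Hypothetical sketch of the intended caching behaviour: only contact the
# BOM server when no pre-downloaded CSV already exists in weather/.
get_cached_rain <- function(lat, lon) {
  cache_file <- file.path("weather", paste0(lat, lon, ".csv"))
  if (file.exists(cache_file)) {
    read.csv(cache_file)  # pre-downloaded data, no network needed
  } else {
    dat <- bomrang::get_historical(latlon = c(lat, lon), type = "rain")
    write.csv(dat, cache_file, row.names = FALSE)  # cache for the next knit
    dat
  }
}
```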
I am wondering if this is a problem between mac and windows.
Does the file.exists() function work on Macs?
Yes, that's a basic R function.
I ran the chunks up to this point then ran the code in this chunk step by step.
Have you made your cup of coffee yet ;)
yes, a few
Just wondering if you have a weather/ folder in the project directory
yes
I am wondering if this might have something to do with the CSV files starting with a minus sign (-).
I was able to run all the chunks, including the season_rainfall chunk.
Is the macOS run on a UNIX shell ... ?
macOS is UNIX-like.
I might be able to run the chunk, will check again but the connection is poor here. However, I thought you said it was cached so it didn't run. If those CSV files (pictured) are the proper ones, why am I waiting on it to download the data again?
Yes, have a look at the crop_rain function; it checks whether the files exist before downloading the data.
I might get it to edit out any minus sign from the lat/long. I suspected it might cause issues when I wrote it.
I have pushed the updated code so filenames are no longer saved with a minus sign. Did you want to try rerunning the code?
I left and went to get ☕️, came back, and it was still running half an hour later.
I pulled your new code just now. It's still trying to fetch the data...
Does the add_lat_longs.R function add the coordinates to columns in the PM_MB_dat data.frame?
It might be that it can't find the correct lat and longs in the data
Otherwise it might be your connection; can you run this code?

```r
get_historical(stationid = "041359", type = "rain")
```
I know there are some issues with BOM servers timing out, etc. right now from Perth. I worked on it over the weekend, but that's not the issue here.
The issue, as I see it, is that the code does not work properly. I shouldn't even be running into the bomrang/BOM issue if the data were properly cached. It seems that they aren't.
So were the columns lat and lon added to the data.frame?
Perhaps it is loading the wrong lat and longs and therefore not finding the cached files?
```r
> head(PM_MB_dat)
    trial_ref year  location host_genotype row_spacing
1 mung1011/01 2011 Hermitage        Berken        0.75
2 mung1011/01 2011 Hermitage        Berken        0.75
3 mung1011/01 2011 Hermitage       Crystal        0.75
4 mung1011/01 2011 Hermitage       Crystal        0.75
5 mung1011/02 2011  Kingaroy        Berken        0.75
6 mung1011/02 2011  Kingaroy        Berken        0.75
  replicates planting_date first_sign_disease fungicide_ai
1          6    2011-01-24         2011-03-28      control
2          3    2011-01-24         2011-03-28 tebuconazole
3          6    2011-01-24         2011-03-28      control
4          3    2011-01-24         2011-03-28 tebuconazole
5          3    2011-02-02         2011-03-22      control
6          3    2011-02-02         2011-03-22 tebuconazole
  dose_ai.ha total_fungicide harvest_date final_assessment
1       0.00               0   2011-04-25       2011-04-11
2      62.35               1   2011-04-25       2011-04-11
3       0.00               0   2011-04-25       2011-04-11
4      62.35               1   2011-04-25       2011-04-11
5       0.00               0         <NA>       2011-05-11
6      62.35               1         <NA>       2011-05-11
  PM_final_severity disease_error D_error_type
1             4.667        0.5164        stdev
2             2.000        0.0000        stdev
3             5.000        0.0000        stdev
4             2.333        0.5774        stdev
5             8.000        0.0000        stdev
6             5.333        1.1547        stdev
  grain_yield.t.ha. yield_error Y_error_type spray_management
1            1.6619     0.11732        stdev          control
2            1.7693     0.19442        stdev      Recommended
3            1.3876     0.12447        stdev          control
4            1.5439     0.12833        stdev      Recommended
5            0.7985     0.15104        stdev          control
6            1.0586     0.08244        stdev      Recommended
    lat   lon
1 -28.21 152.1
2 -28.21 152.1
3 -28.21 152.1
4 -28.21 152.1
5 -26.58 151.8
6 -26.58 151.8
```
Perhaps it is downloading again because of the number of digits in the floating point number. I'll specify that in the function.
Where does it get the values from? That shouldn't happen.
See the function add_lat_longs.R:

```r
# read in site location data, including lats and longs
experiment_sites <-
  read.csv("cache/Mungbean_experiment_sites.csv", stringsAsFactors = FALSE)
```
If your computer is downloading the data, then the following if() statement must be returning FALSE:

```r
crop_rain <- function(latitude, longitude, start, end) {
  LL <- c(latitude, longitude)
  if (file.exists(paste("weather/", str_remove(paste(LL, collapse = ""), "-"), ".csv", sep = "")))
  # ... (rest of function)
```
Does this line of code return TRUE or FALSE for you?

```r
file.exists(paste("weather/28.21500152.1003.csv"))
```
```r
> file.exists(paste("weather/28.21500152.1003.csv"))
[1] TRUE
```
I pushed some changes to format the numbers to a consistent length before being saved as file names.
```r
crop_rain <- function(latitude, longitude, start, end) {
  # locations <- unique(data.frame(lat = latitude,
  #                                lon = longitude))
  LL <- format(round(as.numeric(c(latitude, longitude)), 4), nsmall = 4)
```
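As an illustration (not taken from the repo) of why a fixed number of decimals matters here, compare the default numeric-to-character conversion with the new formatter, using an example lat/lon:

```r
# paste()/as.character() drops trailing zeros, while
# format(..., nsmall = 4) keeps a consistent four decimal places.
lat <- -28.2150; lon <- 152.1003
paste(c(lat, lon), collapse = "")                    # "-28.215152.1003"
LL <- format(round(as.numeric(c(lat, lon)), 4), nsmall = 4)
paste(LL, collapse = "")                             # "-28.2150152.1003"
```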
Do you want to try it again?
Still fetching the data, unless it just takes forever to import the cached data from the package?
It should take less than a minute.
yeah, nah. It's not importing the files. It's going to BOM or it's hanging importing files.
I am out of ideas, and not sure where to go from here given I can't reproduce it.
I should be able to pick up these files and knit them if it's truly reproducible. It's not.
I can't help anymore without more information. Perhaps we could organise a time to video chat and get to the bottom of this.
I'm spending more time on this than I should be, but one thing that I see is that you download weather for every single line of the data.frame. Why? Just download once per unique lat/lon value. Rather than fetching 161x, it should be only 15x.
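A minimal sketch of that suggestion, assuming the data.frame's coordinate columns are named lat and lon and keeping crop_rain's signature from above (start and end as already defined in the chunk):

```r
# Hypothetical refactor: fetch weather once per unique trial location,
# not once per trial row.
locations <- unique(PM_MB_dat[, c("lat", "lon")])  # ~15 unique lat/lon pairs
weather <- lapply(seq_len(nrow(locations)), function(i) {
  crop_rain(locations$lat[i], locations$lon[i], start, end)
})
```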
After that's done, I'm still trying to figure out why it doesn't load the cached data.
If it loads the cached data, it should only download 7 times, once for each weather station that is most proximal to the trial locations.
But if you are able to figure out why the if statement is returning FALSE on your machine that will solve the problem.
```r
if (file.exists(paste("weather/", str_remove(paste(LL, collapse = ""), "-"), ".csv", sep = "")))
```
Your apply statement runs over every line of the data.frame, all 161 of them. I don't see how that's only 7 times?
because the file name does not exist...
Yes, the apply function does run over every line of the data.frame. Not the most efficient, I know.
But for each line it checks whether there is a weather CSV for the weather station closest to the trial. If the CSV exists, it reads it in with read.csv; if not, it downloads it from the web and stores the data with a file name made from the lat and long of the station.
Given there are multiple sites across multiple years, it makes sense to only download the minimum number of times: once per station closest to a trial (in some cases two sites might use the same station's data).
```r
> LL <- PM_MB_dat[1, 21:22]
> paste("weather/", str_remove(paste(LL, collapse = ""), "-"), ".csv", sep = "")
[1] "weather/28.215152.1003.csv"
```
28.2150152.1003.csv does exist, but not 28.215152.1003.csv, which is what it's looking for.
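That mismatch comes from R's default numeric-to-character conversion dropping trailing zeros, which is what paste() uses internally; a minimal demonstration (lat value assumed from the example above):

```r
# The file was saved from a value formatted with the trailing zero kept,
# but paste() converts via as.character(), which drops it:
as.character(-28.2150)            # "-28.215"  -> lookup name misses the file
format(-28.2150, nsmall = 4)      # "-28.2150" -> matches the saved file name
```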
Yes, but it's much quicker and more efficient to only look for the 15 unique lat/lon values to begin with.
I don't think the apply function is the problem here
I didn't look at any other lat/lon values.
I don't think so either. I'm just pointing out that I've found issues starting with the apply() function in how this was coded. I was going back to the beginning of where the issue starts, and it starts there. That may not be what's causing this problem, but it is still inefficient and an issue that I've found.
Try:

```r
LL <- format(round(PM_MB_dat[1, 21:22], 4, nsmall = 4))
paste("weather/", str_remove(paste(LL, collapse = ""), "-"), ".csv", sep = "")
```
```r
> LL <- format(round(PM_MB_dat[1, 21:22], 4, nsmall = 4))
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'format':
  3 arguments passed to 'round' which requires 1 or 2 arguments
> paste("weather/", str_remove(paste(LL, collapse = ""), "-"), ".csv", sep = "")
[1] "weather/28.215152.1003.csv"
```