openplantpathology / Mungbean_PM

A meta-analysis of mungbean powdery mildew control fungicide efficacy trials
https://openplantpathology.github.io/Mungbean_PM/
Creative Commons Zero v1.0 Universal
1 stars 0 forks source link

02_Preliminary_analysis won't knit #43

Closed adamhsparks closed 3 years ago

adamhsparks commented 3 years ago
Duplicate chunk label 'bwplot', which has been used for the chunk:
PaulMelloy commented 3 years ago

I am able to knit the whole book. I pushed some changes that might have fixed it

adamhsparks commented 3 years ago

I pulled. Tried to knit starting at 01 and got this far

Sent from my iPhone

On 2 Mar 2021, at 16:03, Paul Melloy notifications@github.com wrote:

 I am able to knit the whole book. I pushed some changes that might have fixed it

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.

PaulMelloy commented 3 years ago

Which branch are you on? I don't have any issues on the master branch

adamhsparks commented 3 years ago

I am/was on master when this issue occurred.

PaulMelloy commented 3 years ago

hmmm I don't have any chunks names with bwplot in 01_TrialsForInclusionInAnalysis.Rmd in 02_preliminary analysis.rmd there are 4 chunks with names starting with bwplot.

bwplot_yield bwplot_sev bwplot_yield_year & bwplot_yield_sched

no other files contain the string bwplot

adamhsparks commented 3 years ago

This was not 01, this was 02 where it barfed.

Now it hangs at ~74% fetching BoM data.

adamhsparks commented 3 years ago

that may just be my flakey connection here in South Perth tho

PaulMelloy commented 3 years ago

It should not need to fetch the data from bom if the data already exists in the weather/ directory, which is contains the predownloaded data.

I am wondering if this is a problem between mac and windows. does the file.exists() function work on macs?

adamhsparks commented 3 years ago

Yes, that's a basic R function.

adamhsparks commented 3 years ago

I ran the chunks up to this point then ran the code in this chunk step by step.

Screen Shot 2021-03-03 at 9 34 40 am
PaulMelloy commented 3 years ago

Have you made your cup of coffee yet ;)

adamhsparks commented 3 years ago

yes, a few

PaulMelloy commented 3 years ago

Just wondering if you have a folder in the project directory weather/

adamhsparks commented 3 years ago

yes

adamhsparks commented 3 years ago
Screen Shot 2021-03-03 at 9 51 10 am
PaulMelloy commented 3 years ago

I am wondering if this might have something to do with the csv files starting with a - sign

PaulMelloy commented 3 years ago

I was able to run all the chunks including the season_rainfall chunk. Is the macOS run on a UNIX shell ... ?

adamhsparks commented 3 years ago

macOS is UNIX-like.

I might be able to run the chunk, will check again but the connection is poor here. However, I thought you said it was cached so it didn't run. If those CSV files (pictured) are the proper ones, why am I waiting on it to download the data again?

PaulMelloy commented 3 years ago

Yes have a look at the crop_rain function it checks to see if the files exist before downloading the data ![Uploading image.png…]()

PaulMelloy commented 3 years ago

I might get it to edit out any - from the lat long I suspected it might cause issues when I wrote it

PaulMelloy commented 3 years ago

I have pushed the updated code so filenames are no longer saved with a -minus Did you want to try rerunning the code?

adamhsparks commented 3 years ago

I left and went to get ☕️, came back, it was still running .5hr later.

I pulled your new code just now. It's still trying to fetch the data...

PaulMelloy commented 3 years ago

Does the add_lat_longs.R function add the coordinates to columns in the PM_MB_dat data.frame? It might be that it can't find the correct lat and longs in the data

Otherwise it might be your connection, can you run this code? get_historical(stationid = "041359",type = "rain")

adamhsparks commented 3 years ago

I know there's some issues with BOM servers timing out, etc. right now from Perth. I worked on it over the weekend, but that's not the issue here.

The issue as I see it, is that the code does not work properly. I shouldn't even be running into the bomrang/BOM issue if the data were properly cached. It seems that they aren't.

PaulMelloy commented 3 years ago

So were the columns lat and lon added to the data.frame? Perhaps it is loading the wrong lat and longs and therefore not finding the cached files? image

adamhsparks commented 3 years ago
> head(PM_MB_dat)
    trial_ref year  location host_genotype row_spacing
1 mung1011/01 2011 Hermitage        Berken        0.75
2 mung1011/01 2011 Hermitage        Berken        0.75
3 mung1011/01 2011 Hermitage       Crystal        0.75
4 mung1011/01 2011 Hermitage       Crystal        0.75
5 mung1011/02 2011  Kingaroy        Berken        0.75
6 mung1011/02 2011  Kingaroy        Berken        0.75
  replicates planting_date first_sign_disease fungicide_ai
1          6    2011-01-24         2011-03-28      control
2          3    2011-01-24         2011-03-28 tebuconazole
3          6    2011-01-24         2011-03-28      control
4          3    2011-01-24         2011-03-28 tebuconazole
5          3    2011-02-02         2011-03-22      control
6          3    2011-02-02         2011-03-22 tebuconazole
  dose_ai.ha total_fungicide harvest_date final_assessment
1       0.00               0   2011-04-25       2011-04-11
2      62.35               1   2011-04-25       2011-04-11
3       0.00               0   2011-04-25       2011-04-11
4      62.35               1   2011-04-25       2011-04-11
5       0.00               0         <NA>       2011-05-11
6      62.35               1         <NA>       2011-05-11
  PM_final_severity disease_error D_error_type
1             4.667        0.5164        stdev
2             2.000        0.0000        stdev
3             5.000        0.0000        stdev
4             2.333        0.5774        stdev
5             8.000        0.0000        stdev
6             5.333        1.1547        stdev
  grain_yield.t.ha. yield_error Y_error_type spray_management
1            1.6619     0.11732        stdev          control
2            1.7693     0.19442        stdev      Recommended
3            1.3876     0.12447        stdev          control
4            1.5439     0.12833        stdev      Recommended
5            0.7985     0.15104        stdev          control
6            1.0586     0.08244        stdev      Recommended
     lat   lon
1 -28.21 152.1
2 -28.21 152.1
3 -28.21 152.1
4 -28.21 152.1
5 -26.58 151.8
6 -26.58 151.8
PaulMelloy commented 3 years ago

Perhaps it is downloading again because of the number of digits in the floating point number. I'll specify that to in the function

adamhsparks commented 3 years ago

where does it get the values from? that shouldn't happen.

PaulMelloy commented 3 years ago

See function add_lat_longs.R

# read in site location data, including lats and longs
experiment_sites <-
   read.csv("cache/Mungbean_experiment_sites.csv", stringsAsFactors = FALSE)

If your computer is downloading the data it must be returning false to the following if() statement

crop_rain <- function(latitude, longitude, start, end) {

   LL <- c(latitude, longitude)

      if (file.exists(paste("weather/", str_remove(paste(LL, collapse = ""),"-"), ".csv", sep = "")))

Does this line of code return TRUE or FALSE for you? file.exists(paste("weather/28.21500152.1003.csv"))

adamhsparks commented 3 years ago
> file.exists(paste("weather/28.21500152.1003.csv"))
[1] TRUE
PaulMelloy commented 3 years ago

I pushed some changes to format the numbers to a consistent length before being saved as file names.

crop_rain <- function(latitude, longitude, start, end) {
   # locations <- unique(data.frame(lat = latitude,
   #                                lon = longitude))

   LL <- format(round(as.numeric(c(latitude, longitude)),4), nsmall = 4)

Do you want to try it again?

adamhsparks commented 3 years ago

Still fetching the data unless it just takes forever to import the cached data from the package?

PaulMelloy commented 3 years ago

It should take less than a minute.

adamhsparks commented 3 years ago

yeah, nah. It's not importing the files. It's going to BOM or it's hanging importing files.

PaulMelloy commented 3 years ago

I am out of ideas, and not sure where to go from here given I can't reproduce it.

adamhsparks commented 3 years ago

I should be able to pick up these files and knit them if it's truly reproducible. It's not.

PaulMelloy commented 3 years ago

I can't help anymore without more information. Perhaps we could organise a time to video chat and get to the bottom of this.

adamhsparks commented 3 years ago

I'm spending more time on this than I should be, but one thing that I see is that you download weather for every single line of the data.frame. Why? Just download for one lat/lon value once. Rather than fetching 161x, it should be only 15x.

After that's done, I'm still trying to figure out why it doesn't load the cached data.

PaulMelloy commented 3 years ago

If it loads the cached data it should only download 7 times, one for each weather station that is most proximal to the trial locations

PaulMelloy commented 3 years ago

But if you are able to figure out why the if statement is returning FALSE on your machine that will solve the problem.

if (file.exists(paste("weather/", str_remove(paste(LL, collapse = ""),"-"), ".csv", sep = "")))
adamhsparks commented 3 years ago

Your apply statement runs over every line of the data.frame. That's 161 lines that it runs over. I don't see how that's only 7 times?

adamhsparks commented 3 years ago

because the file name does not exist...

PaulMelloy commented 3 years ago

Yes the apply function does run over every line of the data.frame. Not the most efficient, I know. but for each line it checks if there is a weather csv for the weather station that is closest to the trial. If the csv exists it will read it in with read.csv if not it downloads it from the web stores the data with file name of the lat and long of the station. given there are multiple sites with multiple years it makes sense to only download it the minimum number of times once per station closest to the trial (in some cases two sites might use the same station data).

adamhsparks commented 3 years ago
> LL <- PM_MB_dat[1, 21:22]
> paste("weather/", str_remove(paste(LL, collapse = ""),"-"), ".csv", sep = "")
[1] "weather/28.215152.1003.csv"

28.2150152.1003.csv does exist but not 28.215152.1003.csv, which is what it's looking for

adamhsparks commented 3 years ago

Yes, but it's much quicker/efficient to only look for the 15 unique lat/lon values to begin with.

PaulMelloy commented 3 years ago

I don't think the apply function is the problem here

adamhsparks commented 3 years ago

I didn't look at any other lat/lon values.

adamhsparks commented 3 years ago

I don't think so either. I'm just pointing out that I've found issues starting with the apply() function in how this was coded. I was going back to the beginning of where the issue starts. It starts there. That may not be what's causing the issue, but it is still inefficient and an issue that I've found.

PaulMelloy commented 3 years ago

Try

LL <- format(round(PM_MB_dat[1, 21:22],4,nsmall = 4))
paste("weather/", str_remove(paste(LL, collapse = ""),"-"), ".csv", sep = "")
adamhsparks commented 3 years ago
> LL <- format(round(PM_MB_dat[1, 21:22],4,nsmall = 4))
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'format': 3 arguments passed to 'round' which requires 1 or 2 arguments
> paste("weather/", str_remove(paste(LL, collapse = ""),"-"), ".csv", sep = "")
[1] "weather/28.215152.1003.csv"