Closed: cpiponiot closed this issue 4 years ago
I think that the example in the function description file is sufficient; more important to get this done than to perfect it. :-)
Telling the user that they might need to parallelize if they have a large dataset is fine with me. I ended up splitting my data into 10 chunks and running the function separately for each of them.
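Roughly what I did, as a sketch only (the column names and the get_biomass() arguments below are placeholders, not my exact call):

```r
# Sketch of the chunking workaround: split a large census table into
# 10 pieces and run get_biomass() on each piece separately.
# "census", its columns, and the coords value are illustrative.
library(allodb)

n_chunks <- 10
chunk_id <- cut(seq_len(nrow(census)), breaks = n_chunks, labels = FALSE)

agb_list <- lapply(split(census, chunk_id), function(chunk) {
  get_biomass(
    dbh     = chunk$dbh,
    genus   = chunk$genus,
    species = chunk$species,
    coords  = c(-78.15, 38.9)  # example site coordinates
  )
})

# recombine into one AGB vector in the original row order
agb <- unlist(agb_list, use.names = FALSE)
```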
I'm not very sure, but maybe one of the problems with memory is related to that large raster layer (koppenRaster) that is part of the get_biomass function. Why do we need that if we can get Köppen zones using the R package?
I can try using the R package instead to see if it improves things, but I think the main problem is the weight matrix (500 equations * N observations).
It is much faster now; last night I couldn't run test 3 from here, and now it only took 1-2 minutes.
I think there may be something broken now?
I get this issue:
Error in equation_id %in% equations_ids : object 'equation_id' not found
Never mind, I restarted my session and now it works...
I added an example of this in the get_biomass() description file.
@ValentineHerr I think you already mentioned this, but now with the size of the equation table (550 equations), there can be memory problems when running the get_biomass function on very large datasets (approx. > 20,000 observations on my computer). I've tried to optimize the function as much as possible, but we need to keep a lot of information until the final calculation (all the weights, etc.), so the only way I see around this issue right now is to parallelize the AGB calculation. This can be done outside of the function by the user, or within the function itself. The latter will take a little more time to implement, and it isn't a top priority if we want to have something ready by September, but I wanted to check with you @gonzalezeb @teixeirak @ValentineHerr to see whether you think it could be useful at all. Otherwise we can just add an example of how to parallelize the calculation in the function description file, along the lines of the sketch below.
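Something like this could go in the description file as the user-side option (a sketch under assumptions: "census", its columns, the coords value, and the chunk/core counts are illustrative, not a tested call):

```r
# Sketch: run get_biomass() in parallel over chunks of a large census
# table so the weight matrix is only built for one chunk at a time.
library(allodb)
library(parallel)

n_chunks <- 20
chunk_id <- cut(seq_len(nrow(census)), breaks = n_chunks, labels = FALSE)
chunks   <- split(census, chunk_id)

# leave one core free; on Windows, use makeCluster() + parLapply() instead
n_cores <- max(1, detectCores() - 1)

agb_list <- mclapply(chunks, function(chunk) {
  get_biomass(
    dbh     = chunk$dbh,
    genus   = chunk$genus,
    species = chunk$species,
    coords  = c(-78.15, 38.9)  # example site coordinates
  )
}, mc.cores = n_cores)

agb <- unlist(agb_list, use.names = FALSE)
```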