Closed ebo closed 6 years ago
... so I forked the repository and figured that I would send pull requests from here. I can change this workflow if you really want, but I figure that is safer.
Seems like a good workflow to me!
I'm having a look through the notebooks now. Thanks for clearing landsat_spectral_clustering_xa.ipynb in the PR. I have removed the commit from master to avoid bloating the repo (and I have a copy of the notebook with output locally for reference).
Thanks @ebo !
I just ran through things. Here are my initial observations. Hopefully these are the same problems that you want fixed :)
arr
might become much much larger in the future.
arr
this number will continue well beyond these limits, which is concerning.
Is this correct? Are there other concerns here that I'm missing?
I have so many different little test notebooks lying around from attempts to figure this out that I cannot keep them all straight, so I will answer your question in general terms.
I have been handed images as large as 34,000 x 245,000 pixels with eight 16-bit bands that I will need to process. There is no way that I will be able to read them completely into memory and process them on any of the VMs I realistically have access to. In broad strokes these images have the same structure as Landsat images, and I was trying to come up with publicly distributable examples that we can all work through. In addition, to compare dask/xarray/rasterio and friends to previous versions of the machine learning code, I need to limit the memory footprint to 4 GB of RAM and one or two threads/cores (regardless of the actual size of the VM). So regardless of the size of the example, assume that any example other than a unit/regression test will have to scale to images 10 to 100 times the size.
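To put rough numbers on that, here is a minimal back-of-the-envelope sketch in plain Python. It is not code from any of the notebooks. It assumes the 34,000 figure is the row count and 245,000 the column count (the reverse is just as plausible), treats the 4 GB limit as 4 GiB, and uses a made-up `copies` safety factor to leave room for intermediate arrays. The helper names are hypothetical.

```python
def image_bytes(rows, cols, bands, bytes_per_sample):
    """Uncompressed size of the full raster in bytes."""
    return rows * cols * bands * bytes_per_sample


def rows_per_block(cols, bands, bytes_per_sample, budget_bytes, copies=4):
    """Largest number of full-width rows whose working set fits the budget.

    `copies` is an assumed safety factor: it pretends that up to `copies`
    arrays the size of one block are alive at the same time.
    """
    bytes_per_row = cols * bands * bytes_per_sample * copies
    return max(1, budget_bytes // bytes_per_row)


def row_blocks(rows, block_rows):
    """Yield (start, stop) row ranges that tile the image top to bottom."""
    for start in range(0, rows, block_rows):
        yield start, min(start + block_rows, rows)


# Figures from the comment above; the row/column order is an assumption.
ROWS, COLS, BANDS, BYTES_PER_SAMPLE = 34_000, 245_000, 8, 2
BUDGET = 4 * 1024**3  # 4 GiB memory cap

total = image_bytes(ROWS, COLS, BANDS, BYTES_PER_SAMPLE)
block_rows = rows_per_block(COLS, BANDS, BYTES_PER_SAMPLE, BUDGET)
blocks = list(row_blocks(ROWS, block_rows))

print(f"full image: {total / 1e9:.1f} GB uncompressed")
print(f"rows per block under the cap: {block_rows}")
print(f"number of blocks: {len(blocks)}")
```

The full image comes to about 133 GB uncompressed, so under these assumptions it has to be streamed in a hundred-odd row blocks. Each `(start, stop)` pair could be handed to whatever windowed or chunked reader the example uses. Strips of full-width rows are only one possible layout; square tiles may suit the file's internal blocking better.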
Also, I did not realize that I was the one that needed to confirm the pull request. I have several new examples merged now.
I was working under a branch of pyviz-topics/EarthML and something was not looking right, so I forked the repository and figured that I would send pull requests from here. I can change this workflow if you really want, but I figure this is safer.
I modified a couple of older examples which transposed the input data so that rasterio, xarray, dask-ml and holoviews process and display the data in a reasonable way. I wanted to post the current state of the code before leaving for the day.
I also started setting up the end-to-end example of replicating a study of lake volume change that was recently published in Nature Geoscience. This is only a start, but the intention is to replicate the initial image processing of the study.