niknap / MeanShiftR

Tree delineation from lidar using mean shift clustering
GNU General Public License v3.0

adding a README, docs, some example data, a vignette, and some notes #2

Open bw4sz opened 5 years ago

bw4sz commented 5 years ago

Hey Nikolai,

My work is in deep learning for airborne forest surveys (https://www.mdpi.com/2072-4292/11/11/1309, https://www.biorxiv.org/content/10.1101/790071v1, repo: https://github.com/weecology/DeepForest). We are building a benchmark dataset based on hand annotations for tree crown delineation (https://github.com/weecology/NeonTreeEvaluation) and an R package to help compare future methods (https://github.com/weecology/NeonTreeEvaluation_package).

Carlos Silva suggested I try out your package to add to my tree segmentation benchmark.

I forked your repo and added a few things that may be useful. Feel free to ignore them if I've interrupted your workflow.

Overall, I tried to give the workflow a go, but I didn't get very reasonable results: several hundred predicted trees overlapping. No doubt I'm not parameterizing it well. I welcome suggestions about which parameters might improve the segmentation. The code should be reproducible; see the vignette. Let me know where I went wrong.

[image]

niknap commented 5 years ago

Dear Ben, thanks for your interest in the package and for making a vignette.

For now, I would like to avoid a dependency on lidR in the master branch, as I had conflicts between lidR and my other package slidaRtools in the past.

Regarding the oversegmentation problem: I have not had much time to play with the parameters yet. In the future, I would definitely like to make the H2CW and H2CL parameters adaptive based on a presegmentation of the CHM (as done by Ferraz et al. in their paper), but I don't know yet when I will get to implement this. The default H2CW of 0.3 is suited to the dense spruce trees in the Traunstein example; for single trees in an open landscape, like in your example, I would try a value close to 1. Also, there often remain single returns forming their own clusters, so after the segmentation I would discard clusters consisting of only a few returns.

I know that @melaineAK did more parameter testing than I have. Maybe she can share her experience.

Best regards, Niko
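Niko's suggestion to discard clusters with only a few returns can be sketched in base R. This is a minimal sketch, assuming the segmentation output is a data frame of returns with a cluster ID column; the column names here (`X`, `Y`, `Z`, `ID`) and the threshold are hypothetical, so check the actual output of `parallel_MeanShift` for the real structure:

```r
# Synthetic stand-in for the segmentation output; the real result of
# parallel_MeanShift() may use different column names.
set.seed(1)
clus.df <- data.frame(
  X  = runif(20),
  Y  = runif(20),
  Z  = runif(20, 2, 30),
  ID = c(rep(1, 12), rep(2, 6), rep(3, 2))  # cluster 3 has only 2 returns
)

min.returns <- 5  # hypothetical threshold for a cluster to count as a tree

# Count returns per cluster and keep only sufficiently large clusters
counts <- table(clus.df$ID)
keep.ids <- as.numeric(names(counts)[counts >= min.returns])
clus.filtered <- clus.df[clus.df$ID %in% keep.ids, ]
```

Here clusters 1 and 2 survive while the two-return cluster 3 is dropped; the right threshold would depend on point density and minimum crown size.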

bw4sz commented 5 years ago

@melaineAK and I had a meeting a few months back. I can email her too if her notifications aren't on.

```r
system.time(
  clus.dt <- parallel_MeanShift(
    pc.list = lid.list,
    lib.path = .libPaths()[1],
    frac.cores = 0.5,
    version = "voxel",
    H2CW = 0.8,
    H2CL = 0.8,
    max.iter = 40,
    buffer.width = 10,
    minz = 2,
    ctr.ac = 2
  )
)
```

[image]

In terms of parameter names, I might refine crown "width" versus "length"; I'm not wholly sure of the difference (two sides of the bounding box?). If so, the definition of which is width and which is length would be arbitrary, depending on the perspective of image collection? I'll have to read more.

For the moment I'm not going to include this in the benchmark, but I'll mention it. Sure, I can eliminate the small boxes, but overall it's not obvious how best to optimize besides iterating through the whole parameter space, and I think I would be giving it an unfairly low score. If you get interested, our benchmark data and R package are linked in my first comment above.
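The brute-force sweep over parameter space mentioned above could be organized with a simple grid. This is a sketch only: the H2CW/H2CL values are illustrative, not recommendations, and the segmentation call inside the loop is commented out because it needs the MeanShiftR package, a loaded point-cloud list, and a scoring function against the hand annotations:

```r
# Hypothetical tuning grid for the two ratio parameters
param.grid <- expand.grid(
  H2CW = c(0.3, 0.6, 1.0),
  H2CL = c(0.3, 0.6, 1.0)
)

for (i in seq_len(nrow(param.grid))) {
  p <- param.grid[i, ]
  # clus.dt <- parallel_MeanShift(pc.list = lid.list, lib.path = .libPaths()[1],
  #                               frac.cores = 0.5, version = "voxel",
  #                               H2CW = p$H2CW, H2CL = p$H2CL, max.iter = 40,
  #                               buffer.width = 10, minz = 2, ctr.ac = 2)
  # ...score clus.dt against the benchmark annotations and record it here...
}
```

Even a coarse 3x3 grid like this would at least show whether the oversegmentation is driven mainly by H2CW, by H2CL, or by the small single-return clusters.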