Automatically determine cell-sizes to check while running bcd()

pramitghosh commented 4 years ago

Currently, the argument l of bcd() (the cell side lengths to consider) defaults to:

seq(10000, 100000, 10000): for self-similarity
matrix(rep(seq(10000, 100000, 10000), 2), ncol = 2): for self-affinity

However, these defaults can give erroneous results when they do not match the scale of the data under consideration, for instance when a feature is smaller than the smallest default cell. A function that automatically determines optimal cell sizes for the grid would be beneficial. It could be based on factors such as:

the geometric size of the feature (i.e. the length and width of its bounding box)
the length of the shortest edge (for *POLYGON geometries) or the shortest line segment (for *LINESTRING geometries)
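A minimal sketch of how these two factors might be combined, assuming the geometry is available as a two-column matrix of vertex coordinates for a single ring or line. The function name suggest_cell_sizes(), its n argument, and the choice of geometric spacing between the shortest segment and the shorter bounding-box side are all assumptions for illustration; none of this is part of sameSVD.

```r
# Hypothetical helper, not part of sameSVD: derive candidate cell side lengths
# from the bounding box and the shortest segment of a single ring or line.
suggest_cell_sizes <- function(xy, n = 10) {
  stopifnot(is.matrix(xy), ncol(xy) == 2, nrow(xy) >= 2)

  # Width and height of the bounding box
  bbox_dims <- apply(xy, 2, function(v) diff(range(v)))

  # Lengths of consecutive segments; zero-length segments are ignored
  seg_len <- sqrt(rowSums(diff(xy)^2))
  seg_len <- seg_len[seg_len > 0]
  stopifnot(length(seg_len) > 0)

  lower <- min(seg_len)    # assumed lower bound: shortest segment
  upper <- min(bbox_dims)  # assumed upper bound: shorter bounding-box side
  stopifnot(upper > 0, lower < upper)

  # Geometric spacing, so the sizes are evenly spaced on a log-log plot
  exp(seq(log(lower), log(upper), length.out = n))
}

# A closed ring with a 100 x 80 bounding box and one short edge of length 10
ring <- rbind(c(0, 0), c(100, 0), c(100, 80), c(90, 80), c(0, 80), c(0, 0))
sizes <- suggest_cell_sizes(ring, n = 5)
```

For the self-affine case, the same vector could be repeated in two columns, e.g. cbind(sizes, sizes), mirroring the current matrix default. Whether the shortest segment is a sensible lower bound for real data is an open question that the literature requested below may help answer.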

Optimal cell sizes for calculating the box-counting dimension have been studied before, and there is published literature on the topic (mostly in statistical physics). Pointers to these sources would be highly appreciated if posted below.

pramitghosh commented 4 years ago

The following article discusses how to choose cell sizes and grid offsets so that the box-counting dimension is calculated effectively.

Foroutan-pour, Kayhan, Pierre Dutilleul, and Donald L. Smith. "Advances in the implementation of the box-counting method of fractal dimension estimation." Applied mathematics and computation 105, no. 2-3 (1999): 195-210. https://doi.org/10.1016/S0096-3003(98)10096-6

Abstract:

The box-counting analysis is an appropriate method of fractal dimension estimation for images with or without self-similarity. However, this technique, including processing of the image and definition of the range of box sizes, requires a proper implementation to be effective in practice. The objectives of this study were thus (1) to determine how to prepare an image for box-counting analysis; (2) to define reasonable preferences for using the Fractal Dimension Calculator software; and (3) to develop a routine procedure for defining the most appropriate range of box sizes for any one-piece image. Four fractal images were chosen for this study: the Koch curve, Koch coastline, Koch boxes, and Cross-tree. Our results show that the skeletons provide better material for the box-counting method since only lines and/or curves are responsible for the fractal dimension value. In the procedure of box counting for fractal dimension estimation, the image must be surrounded by a four-square frame with the least possible area and the condition of linear relationship must be satisfied in a log–log plot. Fractal dimension is to be estimated over the minimum number of boxes covering the image for each box size, after superimposing a reasonable number of grid offsets. In many cases, 25% of the shorter image side may provide an appropriate value for largest box size. However, for noisy or dispersed patterns, a smaller box size than this is needed. In the log–log plot with 12 box sizes, some points corresponding to smaller box sizes deviate from the straight line from a certain point on. The box size corresponding to this breakpoint will provide an appropriate smallest box size. The exercise of determining the most appropriate range of box sizes must be performed repeatedly for every individual image.
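A rough sketch of how two of the abstract's recommendations (12 box sizes, with the largest at 25% of the shorter side) might translate to R. The function names, the geometric spacing, and the smallest argument are assumptions for illustration and are not taken from the paper or from sameSVD. The sketch does not implement the grid offsets or the breakpoint search for the smallest box size.

```r
# Hypothetical helpers illustrating the abstract's heuristics; not part of sameSVD.

# n box sizes (12 by default), the largest being 25% of the shorter side
# of the bounding frame. Geometric spacing and `smallest` are assumptions.
candidate_box_sizes <- function(width, height, smallest, n = 12) {
  largest <- 0.25 * min(width, height)
  stopifnot(smallest > 0, smallest < largest)
  exp(seq(log(largest), log(smallest), length.out = n))
}

# Box-counting dimension as the negated slope of log(count) against log(size)
bcd_slope <- function(sizes, counts) {
  fit <- lm(log(counts) ~ log(sizes))
  -unname(coef(fit)[2])
}

sizes <- candidate_box_sizes(width = 4000, height = 2000, smallest = 1)

# Synthetic counts following an exact power law with dimension 1.26,
# used only to check the slope calculation (real counts are integers)
counts <- sizes^(-1.26)
d <- bcd_slope(sizes, counts)
```

In practice the initial smallest value would only be a starting guess. Per the abstract, the points for small boxes that deviate from the straight line on the log-log plot mark the breakpoint, and that check has to be repeated for every individual feature.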

pramitghosh commented 4 years ago

Reeve, R. (1992). A warning about standard errors when estimating the fractal dimension. Computers & Geosciences, 18(1), 89-91. https://doi.org/10.1016/0098-3004(92)90061-U

might also be interesting, particularly from a geoscientific perspective.

pramitghosh / sameSVD

Automatically determine cell-sizes to check while running bcd() #8