r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry applications
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0
601 stars 131 forks

interpret_waveform scale factor is too low #549

Closed floriandeboissieu closed 2 years ago

floriandeboissieu commented 2 years ago

Hi Jean-Romain, I am back on lidar projects ;-).

In interpret_waveform, the scale factor is fixed at 1e-6, which is quite limiting on coordinate values and led me to a quantization error. Could it be passed as an argument instead (as well as the offset)? Cheers
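
For illustration, a minimal sketch (not the lidR internals) of how a fixed 1e-6 scale with a 0 offset overflows the signed 32-bit integer storage for coordinates a few kilometers from the origin:

```r
# Hypothetical sketch of LAS coordinate quantization: stored = (x - offset) / scale,
# kept as a signed 32-bit integer. With scale = 1e-6 and offset = 0, any |x|
# above about (2^31 - 1) * 1e-6 ~ 2147 m cannot be stored.
scale  <- 1e-6
offset <- 0

x_ok  <- 1000.1234567
x_far <- 5000.1234567                 # beyond the ~2147 m storable window

stored_ok  <- round((x_ok  - offset) / scale)
stored_far <- round((x_far - offset) / scale)

stored_far > 2^31 - 1                 # TRUE: does not fit in an int32
stored_ok * scale                     # round-trips at 1e-6 precision
```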

Jean-Romain commented 2 years ago

What a surprise to see you back!

I'll look at that. I think we should somehow estimate a value instead of hard-coding one. The formula is:

P$X + P$WDPLocation * P$Xt

So we must figure out the accuracy of each element.

floriandeboissieu commented 2 years ago

It could be a good solution to have an estimate from the range of values, but maybe the most straightforward approach would be to use the scale and offset of the input LAS, no?

Jean-Romain commented 2 years ago

rlas::header_create should automatically estimate the adequate accuracy. I think hard-coding the scale factor is not necessary. What accuracy is automatically detected in your dataset?

header <- as.list(las@header)                 # header of the source LAS object
data   <- las@data                            # point data with waveform attributes
fwf    <- rlas::fwf_interpreter(header, data) # one table of waveform samples per pulse
fwf    <- data.table::rbindlist(fwf)          # stack them into a single table
fwfheader <- rlas::header_create(fwf)         # let rlas estimate scales and offsets
fwfheader["X scale factor"]
Jean-Romain commented 2 years ago

Well, it will find 1e-8, which is the smallest value that rlas can return.

Jean-Romain commented 2 years ago

It is actually not an easy question. With a smaller scale the range of storable values is reduced: with 1e-7 we can only store values over a range of 429 m.

storable_coordinate_range(1e-7, 0)
#>       min       max 
#> -214.7484  214.7484 

With 1e-8 it becomes 43 m

storable_coordinate_range(1e-8, 0)
#>       min       max 
#> -21.47484  21.47484 
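
The arithmetic behind these numbers can be sketched directly (assuming the LAS convention of signed 32-bit integer storage; this is an illustration, not the actual storable_coordinate_range implementation):

```r
# LAS coordinates are stored as int32, so the storable window around the
# offset is +/- (2^31 - 1) * scale.
storable_range <- function(scale, offset) {
  c(min = offset - (2^31 - 1) * scale,
    max = offset + (2^31 - 1) * scale)
}

storable_range(1e-7, 0)   # about -214.7484 to 214.7484 m
storable_range(1e-8, 0)   # about  -21.4748 to  21.4748 m
```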
Jean-Romain commented 2 years ago

Or maybe the problem is that your dataset is too large to be stored with the default 1e-6. Please clarify what error you encountered.

floriandeboissieu commented 2 years ago

Yes, the problem I had was on a large area (> 5000 m) around 0,0,0.

But from what I understood, the quantization error would also have happened if I had been working with real coordinates in a standard French CRS, for example (EPSG:2154 has parameters x_0=700000 +y_0=6600000), and in that case it would also have been due to the offset definition (fixed to 0).

I would say that using the input LAS scaling would be a good default and intuitive solution, as the waveform is supposed to be around the returns and in most cases not far away (e.g. a few km at most, starting from the plane). For the cases in which it does not fit, allowing the user to change the scale and offset seems to me a good option for advanced users to adapt as needed, especially if the error message can be a little more explicit.

I tried rlas:::guess_scale_factor, which gave 1e-8, so I understand it may have been designed to guess the factor when reading data from a LAS file, not the other way around. Anyway, the auto-adapting solution would also be good, but I would implement it using the center (max+min)/2 as the offset and the range (max-min) to fix the smallest scale it would fit in. The drawback of this solution is that it may lead to different offsets and scales between coordinate axes, tiles or different sizes of the region of interest.
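
This auto-adapting idea could be sketched as follows (illustrative only, not rlas code; the function name is hypothetical): put the offset at the center of the data and grow the scale until the int32 window covers the half-range.

```r
# Hedged sketch of the auto-adapting heuristic: offset at the center of the
# data, smallest power-of-ten scale whose +/- (2^31 - 1) window covers the
# half-range of the observed coordinates.
guess_offset_scale <- function(x) {
  offset     <- (max(x) + min(x)) / 2
  half_range <- (max(x) - min(x)) / 2
  scale <- 1e-8
  while (half_range > (2^31 - 1) * scale)
    scale <- scale * 10
  list(offset = offset, scale = scale)
}

guess_offset_scale(c(0, 5000))   # offset 2500, scale 1e-5
```

As noted above, applied per axis or per tile this yields heterogeneous scales and offsets, which is the drawback of the approach.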

Last, the fixed scale of 1e-6 would mean a precision of 1e-6 (usually meters) is relevant in the data. From my point of view, in many cases (even in TLS) 1e-4 m is sufficient (1 ns sampling --> 0.3 m) compared to the uncertainty of sensor measurements. But I suppose it depends on the application and you may have chosen that value for a reason.

Jean-Romain commented 2 years ago

I would say that using the input las scaling would be a good default and intuitive solution

I think the original scaling is inappropriate. It is often 1 cm accuracy, and I think with a nanosecond resolution we need at least a mm resolution. Let's do the math.

From my point of view, in many cases (even in TLS) 1e-4 m is sufficient (1 ns sampling --> 0.3 m) compared to the uncertainty of sensor measurements.

In the example dataset, with one nanosecond resolution and an incidence angle of 5 degrees, we have a distance between two points of 3.63 cm. So with an accuracy of 1 mm (1e-3) it should be good, and with 1e-4 it is more than enough.

f <- system.file("extdata", "fwf.laz", package = "rlas") # example FWF file shipped with rlas
head <- rlas::read.lasheader(f)
data <- rlas::read.las(f)
fwf  <- rlas::fwf_interpreter(head, data[1, ])           # waveform samples of the first pulse

X <- fwf[[1]]$X
Y <- fwf[[1]]$Y
n <- nrow(fwf[[1]])

# horizontal distance between consecutive waveform samples
d <- sqrt((X[-n] - X[-1])^2 + (Y[-n] - Y[-1])^2)
unique(d)
#> [1] 0.03629021

A light-nanosecond is approximately 300 millimeters. With an angle of 5 degrees, the distance between two consecutive measurements is 300*sin(5*pi/180), which equates to approximately 3 cm. This is consistent with the measurement. To be able to distinguish two consecutive measurements at 1 degree we need 300*sin(1*pi/180), which equates to about 5 mm. So a resolution of 1 mm is also sufficient.
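
That arithmetic can be checked directly in R:

```r
# Distance light travels in one nanosecond, projected with the scan angle.
light_ns_mm <- 300                 # approx. one light-nanosecond, in millimeters

light_ns_mm * sin(5 * pi / 180)    # ~26 mm, i.e. roughly 3 cm at 5 degrees
light_ns_mm * sin(1 * pi / 180)    # ~5.2 mm at 1 degree
```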

As a conclusion, I'd say that 1e-4, as you suggested, is good enough. It allows for 430 km of storable coordinates.

I tried rlas:::guess_scale_factor which gave 1e-8 so I understood it may have been designed to guess the factor when reading the data from a LAS file and not the opposite way.

It statistically estimates the accuracy of the numbers in a vector. But Xt is given with many digits, so X + WDPLocation * Xt is also computed with many digits. This accuracy is irrelevant and not wanted.

floriandeboissieu commented 2 years ago

I agree with your conclusions, thanks for your fast replies as usual. With the minimum point as the offset and a scale of 1e-4, the maximum range would actually be around 214 km.

But from what I understood, the quantization error would also have happened if I had been working with real coordinates in a standard French CRS, for example (EPSG:2154 has parameters x_0=700000 +y_0=6600000), and in that case it would also have been due to the offset definition (fixed to 0).

Just a correction to my previous comment: offsets are set to the minimum coordinates in rlas::header_create, so working in real coordinates should not cause any problem.

Jean-Romain commented 2 years ago

You can look at the commit. I'm doing the math based on a limit of 1 degree off nadir and on the temporal spacing. I'm querying the closest reasonable and valid scale factor and dividing by 10 to get extra accuracy. It should return something close to 1e-4 for most temporal spacing values.

If a dataset has a 10 ps temporal spacing it uses 1e-6. For 100 ps it uses 5e-5. For 2000 ps, which is the value of my example file, it uses 1e-3, i.e. a millimeter of accuracy. For 1000 ps it uses 5e-4, i.e. half a millimeter of accuracy (2150 km of storable coordinates).
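
One way to sketch that logic (an assumption on my part, not the actual commit; the function name is illustrative): derive the sample spacing from the temporal spacing, project it at 1 degree off nadir, snap down to a valid scale (1 or 5 times a power of ten) and divide by 10. This reproduces the 100, 1000 and 2000 ps values quoted above; the commit may handle very small spacings differently.

```r
# Hedged sketch: pick a scale factor from the temporal spacing in picoseconds.
scale_from_spacing <- function(ps) {
  d <- 0.3 * (ps / 1000) * sin(pi / 180)  # meters between samples at 1 degree off nadir
  e <- floor(log10(d))
  m <- d / 10^e
  valid <- if (m >= 5) 5 else 1           # snap the mantissa down to 1 or 5
  valid * 10^e / 10                       # one extra order of magnitude of accuracy
}

scale_from_spacing(2000)   # 1e-3
scale_from_spacing(1000)   # 5e-4
scale_from_spacing(100)    # 5e-5
```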

floriandeboissieu commented 2 years ago

Perfect, many thanks!