tmontaigu / pylas

⚠️ pylas was merged into laspy 2.0: https://github.com/laspy/laspy ⚠️
BSD 3-Clause "New" or "Revised" License

Truncating return numbers #40

Closed: excalamus closed this issue 3 years ago

excalamus commented 3 years ago

I am comparing pylas[lazrs] with laszip64 for converting LAS to LAZ and seeing roughly a 30% speedup with lazrs 🎉. If I understand correctly, it should be as simple as:

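import pylas

las = pylas.read(las_input)  # las_input: path to the source .las file
las.write(laz_output)        # compression is inferred from the .laz extension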

However, I get the following warning:

WARNING:pylas.headers.rawheader:Received return numbers up to 6, truncating to 5 for header.

I'm not sure what this means. I tried grepping through all the sources and couldn't find where it comes from.
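The warning text itself hints at its origin: Python's default logging format is `LEVEL:logger-name:message`, so it was emitted by a logger named `pylas.headers.rawheader` (most likely the `pylas/headers/rawheader.py` module). A minimal sketch, assuming you want to capture such warnings per file in a batch conversion instead of only seeing them on the console; `collect_warnings` and `WarningCollector` are hypothetical names, not part of pylas.

```python
import logging

# Logger name read off the warning line "WARNING:pylas.headers.rawheader:...".
PYLAS_HEADER_LOGGER = "pylas.headers.rawheader"


class WarningCollector(logging.Handler):
    """Logging handler that stores every WARNING-or-higher message it receives."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def collect_warnings(func, *args, logger_name=PYLAS_HEADER_LOGGER):
    """Run func(*args) and return (result, warnings logged by logger_name meanwhile)."""
    logger = logging.getLogger(logger_name)
    handler = WarningCollector()
    logger.addHandler(handler)
    try:
        result = func(*args)
    finally:
        # Always detach the handler so later calls start with a clean slate.
        logger.removeHandler(handler)
    return result, handler.messages
```

In practice `func` would wrap the two lines from the first comment, e.g. `collect_warnings(lambda: pylas.read(las_input).write(laz_output))`, and any non-empty message list flags a file worth inspecting.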

Any insight? Is my data being truncated?

excalamus commented 3 years ago

From the LAS v1.2 spec, "Return Number" is defined as:

...The pulse return number for a given output pulse. A given output laser pulse can have many returns, and they must be marked in sequence of return. The first return will have a Return Number of one, the second a Return Number of two, and so on up to five returns.

I assume that means my data is incorrectly stored.

tmontaigu commented 3 years ago

The return number field is stored on 3 bits, so its values can only be in the range [0, 7], which is 8 values, but the header field number_of_points_by_return can only store 5 values in files with version < 1.4.

Actually, the first return is '1', so 0 should not be counted, but I don't think pylas 0.4.3 does that. So the data is not truncated, but the header value for number_of_points_by_return may be wrong 🤔
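The distinction between the per-point data and the header summary can be sketched in a few lines. This is a minimal illustration, not pylas's actual code: it assumes the return numbers are available as a plain list, `points_by_return` is a hypothetical helper, and it simply reports values that have no header slot (0, or anything above 5) rather than guessing what pylas 0.4.3 does with them.

```python
from collections import Counter

LEGACY_HEADER_SLOTS = 5  # number_of_points_by_return has 5 entries before LAS 1.4


def points_by_return(return_numbers, slots=LEGACY_HEADER_SLOTS):
    """Count points per (1-based) return number, as the header would store them.

    Returns (header_counts, dropped):
      header_counts[i] -- number of points with return number i + 1, for i < slots
      dropped          -- {return_number: count} for values with no header slot
    The input list itself is never modified, so no point data is lost.
    """
    counts = Counter(return_numbers)
    header_counts = [counts.get(r, 0) for r in range(1, slots + 1)]
    dropped = {r: n for r, n in sorted(counts.items()) if r < 1 or r > slots}
    return header_counts, dropped
```

For example, `points_by_return([1, 1, 2, 3, 6, 1, 2, 6, 0])` gives header counts `[3, 2, 1, 0, 0]` and reports `{0: 1, 6: 2}` as points the 5-slot header cannot describe, while all nine return numbers remain untouched.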

SuaveFool commented 3 years ago

@excalamus You may be converting the point data format in the process: point data formats 6-10 use 4 bits (i.e. values 0-15) to encode the return number, whereas formats 0-5 only have 3 bits. Though, as tmontaigu mentioned, this wouldn't be relevant if you have an older LAS version (i.e. < 1.4), since formats 6-10 were introduced in LAS 1.4.
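To make the 3-bit versus 4-bit difference concrete, here is a minimal sketch that unpacks the single byte holding both return fields. It assumes the bit layout from the LAS 1.2 and 1.4 specifications (return number in the low bits, number of returns directly above it); `decode_return_byte` is a hypothetical helper, not a pylas function.

```python
def decode_return_byte(byte, point_format):
    """Split the packed return byte into (return_number, number_of_returns).

    Formats 0-5:  bits 0-2 return number, bits 3-5 number of returns
                  (bits 6-7 hold the scan direction and edge-of-flight-line flags).
    Formats 6-10: bits 0-3 return number, bits 4-7 number of returns.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value (0-255)")
    if 0 <= point_format <= 5:
        return byte & 0b111, (byte >> 3) & 0b111  # 3 bits each: max 7
    if 6 <= point_format <= 10:
        return byte & 0b1111, (byte >> 4) & 0b1111  # 4 bits each: max 15
    raise ValueError(f"unknown point data format {point_format}")
```

The same byte decodes differently depending on the format: `decode_return_byte(0xFF, 1)` yields `(7, 7)`, while `decode_return_byte(0xFF, 6)` yields `(15, 15)`, which is why the representable range is [0, 7] for formats 0-5 and [0, 15] for formats 6-10.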

excalamus commented 3 years ago

I asked the people in collections, and it sounds like my company doesn't make use of the return number anyway, so that answers that. However, here's what I think is going on. The lidar is collected in the manufacturer's proprietary format and converted to LAS v1.2 using the manufacturer's proprietary tool. Examining the LAS file with laszip, I see both number of returns and return numbers greater than 5, so it looks to me like the proprietary tool is not following the spec (albeit preserving the data).

Everyone I work with uses laszip to convert from LAS to LAZ. In my experience the output of laszip isn't great, so I wondered whether laszip was conforming to the spec (and truncating) when compressing without telling users. I ran laszip -v -i "lidar.las" -o "lidar.laz", then laszip -i "lidar.laz" -o "lidar.txt" -oparse r to get the return numbers and... there are return numbers > 5. I conclude that the manufacturer doesn't obey the spec and that laszip faithfully compresses whatever it's given, whether or not the input conforms to the spec.
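The check done above with `laszip -oparse r` can also be done in Python. A minimal sketch, assuming the per-point return numbers and numbers of returns are already available as sequences (in pylas these are presumably exposed as `las.return_number` and `las.number_of_returns`; verify against your version). `nonconforming_points` is a hypothetical helper that flags any value outside 1..5, the range allowed by the LAS 1.2 text quoted earlier in the thread.

```python
MAX_RETURNS_LEGACY = 5  # LAS 1.2: "...and so on up to five returns"


def nonconforming_points(return_numbers, numbers_of_returns, limit=MAX_RETURNS_LEGACY):
    """Return the indices of points whose return fields fall outside 1..limit."""
    if len(return_numbers) != len(numbers_of_returns):
        raise ValueError("both sequences must have one entry per point")
    bad = []
    for i, (rn, nr) in enumerate(zip(return_numbers, numbers_of_returns)):
        # Flag the point if either field is 0 or exceeds the spec limit.
        if not (1 <= rn <= limit) or not (1 <= nr <= limit):
            bad.append(i)
    return bad
```

A non-empty result for the original LAS file, and the same result after a LAS to LAZ round trip, would point to the same conclusion as above: the values came from the converter, not from the compressor.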

tmontaigu commented 3 years ago

And pylas doesn't truncate the actual point data either; it's only the header field that gets truncated, because there is no other possible choice.

excalamus commented 3 years ago

Ah, okay. That makes sense. Thank you for working with me on this!