r-lidar / lidR

Airborne LiDAR data manipulation and visualisation for forestry application
https://CRAN.R-project.org/package=lidR
GNU General Public License v3.0

Clipping a LAScatalog with corrupted LAZ and with LAX causes R to crash #730

Closed jfbourdon closed 8 months ago

jfbourdon commented 9 months ago

Derived from https://github.com/r-lidar/rlas/issues/66

Using lidR::clip_roi() on a LAScatalog referring to a corrupted LAZ file crashes R if a corresponding LAX file is present. Before crashing, R prints the following LASlib error message several times:

ERROR: 'end-of-file during chunk with index 21' after 1071430 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'

If the LAX isn't present, the message above is only printed once and the call finishes normally without R crashing. So it seems that the cause of the crash is the lookup with the spatial index and not the corrupted LAZ itself. It should be noted that the LAX isn't corrupted; I tested with a version of the LAX made with the valid/complete LAZ and a version made with the corrupted LAZ and the crash occurred for both.

Jean-Romain commented 9 months ago

Did you try to perform a spatial query with LAStools? Does it crash? If lidR crashes, I'm pretty sure LAStools should crash too.

las2las -i corrupted.laz -inside xmin ymin xmax ymax -o test.laz

jfbourdon commented 9 months ago

LAStools (230123) seems to handle the case of the corrupt LAZ with the LAX made from the valid LAZ: there is no crash per se, and the last line of the output says that the file might be corrupt.

las2las -i C:\Temp\20_3625411f06_dc.laz -inside 362556 5411042 362973 5411325 -o out_valid_LAX.laz
ERROR: 'end-of-file during chunk with index 21' after 1080313 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'
ERROR: 'end-of-file during chunk with index 21' after 1080313 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'
ERROR: 'end-of-file during chunk with index 21' after 1080313 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'
ERROR: 'end-of-file during chunk with index 21' after 1080313 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'
ERROR: 'end-of-file during chunk with index 21' after 1080313 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'
ERROR: 'end-of-file during chunk with index 21' after 1080313 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'
ERROR: 'end-of-file during chunk with index 21' after 1080313 of 7113804 points for 'C:\Temp\20_3625411f06_dc.laz'
ERROR processing file 'C:\Temp\20_3625411f06_dc.laz'. maybe file is corrupt?

I tested the two other cases... but my OS seems to have recovered the bad sector, as my file is now fully valid. Running the command above again no longer triggers the error. Next time I encounter this kind of issue, I'll know right away what to test for and will report back.

Jean-Romain commented 9 months ago

It is very specific and hard to reproduce. I'm not sure if I will dig into it. Only if you are able to give me reproducible data. Who knows.

jfbourdon commented 9 months ago

Link to a corrupt LAZ and LAX to test.

To corrupt the LAZ, I opened the original file with a hexadecimal editor (HxD in my case) and deleted thousands of bytes at the end of the file. The LAX was made with the valid LAZ. This reproduces the case I had, where the LAZ became corrupt after the index was created.
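For anyone who wants to reproduce the corruption without a hex editor, the truncation step can be sketched in shell. This is only an illustration of the procedure described above, not the exact method used for the shared files: `truncate_copy` is a hypothetical helper name, and the sketch assumes a `head` that supports `-c` (the GNU and BSD versions do).

```shell
# Hypothetical helper (not part of LAStools or lidR): write to DST a copy of
# SRC with its last NBYTES bytes removed, leaving SRC untouched.
# Usage: truncate_copy SRC DST NBYTES
truncate_copy() {
    src=$1
    dst=$2
    drop=$3
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$src") ))
    keep=$((size - drop))
    if [ "$keep" -lt 0 ]; then
        echo "cannot drop $drop bytes from a $size-byte file" >&2
        return 1
    fi
    # Keep only the first $keep bytes of the source file.
    head -c "$keep" "$src" > "$dst"
}
```

For instance, `truncate_copy original.laz corrupt.laz 50000` (placeholder file names and byte count) would write a copy missing its last 50000 bytes. To match the case reported here, the LAX would have to be built from the intact file first.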

Using LAStools and the LAX:

las2las -i corrupt.laz -inside 368224 5410206 368775 5410868 -o out.laz

ERROR processing file 'C:\Temp\corrupt.laz'. maybe file is corrupt?

Using lidR and the LAX:

ctg <- lidR::readLAScatalog("C:/Temp/corrupt.laz")
lidR::clip_rectangle(ctg, 368224, 5410206, 368775, 5410868)  # Crash!

Reading the LAZ with lidR::readLAS() displays the messages without crashing:

lidR::readLAS("C:/Temp/corrupt.laz")
WARNING: 'chunk table and bytes are missing. LAZ file truncated during copy or transfer?' for 'C:/Temp/corrupt.laz'
ERROR: 'end-of-file during chunk with index 30' after 1545355 of 7988730 points for 'C:/Temp/corrupt.laz'

Jean-Romain commented 8 months ago

I just tested: it comes from very deep in LASzip. las2las crashed similarly on my computer. My las2las was compiled with g++ on Linux, while you are likely using las2las for Windows compiled with MSVC; in rlas it is compiled with g++ too. This may explain the different behaviors, since undefined behavior can show up differently from one build to another.

While the overall problem is easy to understand, there is nothing I can do. This is beyond what I can/want to do. I'm closing.

Jean-Romain commented 8 months ago

Please give me a reproducible error with a LAS file. That way it should fail without reaching the LASzip code, and the error stack will be less complex.

jfbourdon commented 8 months ago

LAS and LAX for testing: https://transfert.mern.gouv.qc.ca/?ShareToken=BEDD7018B4452E6BE933275B029CB0E7B65B47FB

I took the example file Megaplot.laz provided with lidR, converted it to LAS, generated a LAX, and then corrupted the LAS by deleting thousands of bytes at the end of the file. I now get what seems to be an infinite loop/hang, with the following message repeated over and over: ERROR: 'end-of-file' after 64639 of 81590 points for 'C:/Temp/Megaplot_corrupt.las'

Using lidR or las2las gives the same result. If the LAX isn't present, the process doesn't hang and simply finishes with the error message.

ctg <- lidR::readLAScatalog("C:/Temp/Megaplot_corrupt.las")
lidR::clip_rectangle(ctg, 684767, 684993, 5017773, 5017780)  # Hangs
las2las -i C:/Temp/Megaplot_corrupt.las -inside 684767 684993 5017773 5017780 -o out.laz  # Hangs
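
Since the hang only shows up with a truncated file, one possible precaution is to compare the size on disk with the size implied by the LAS header before running an indexed query. The sketch below is an illustration under stated assumptions, not something lidR or LAStools provides. `read_le`, `las_expected_size` and `las_is_truncated` are hypothetical helper names. The byte offsets (96, 105, 107) are those of the public header block in LAS 1.0 to 1.3. The check does not apply to LAZ, where the points are compressed, and a LAS 1.4 file that stores its point count only in the 64-bit field would need a different offset.

```shell
# Hypothetical helpers for uncompressed LAS 1.0-1.3 files only.

# Print the unsigned little-endian integer stored in WIDTH bytes at OFFSET.
# Usage: read_le FILE OFFSET WIDTH
read_le() {
    file=$1
    offset=$2
    width=$3
    value=0
    shift_bits=0
    # od prints each byte as a decimal number; combine them lowest byte first.
    for byte in $(od -An -v -tu1 -j "$offset" -N "$width" "$file"); do
        value=$((value + (byte << shift_bits)))
        shift_bits=$((shift_bits + 8))
    done
    echo "$value"
}

# Print the minimum size in bytes implied by the header:
# offset to point data + number of point records * point record length.
las_expected_size() {
    data_offset=$(read_le "$1" 96 4)   # offset to point data
    record_len=$(read_le "$1" 105 2)   # point data record length
    n_points=$(read_le "$1" 107 4)     # (legacy) number of point records
    echo $((data_offset + n_points * record_len))
}

# Succeed (exit status 0) when the file is shorter than its header implies.
las_is_truncated() {
    expected=$(las_expected_size "$1")
    actual=$(( $(wc -c < "$1") ))
    [ "$actual" -lt "$expected" ]
}
```

A call such as `las_is_truncated Megaplot_corrupt.las && echo "file looks truncated"` could then run before the clip. The comparison is "smaller than" rather than "not equal to" because records stored after the point data can legitimately make a valid file larger than this minimum.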

Jean-Romain commented 8 months ago

So I won't fix it.