Closed jstrunk001 closed 2 months ago
Please explain what was the original error. How a GUID could be invalid? It could be zeroed but I see no scenario where a GUID could prevent reading a file. Please explain me how your code solve the problem. I see a lot of code reshaping but can't grasp where is the actual fix especially without understanding what was the issue.
There is no way this change could be correct
#if (use_wktcs(header))
if (T)
PHB[["CRS"]] <- wkt(header)
else
PHB[["CRS"]] <- epsg(header)
There is no way this change could be correct
#if (use_wktcs(header)) if (T) PHB[["CRS"]] <- wkt(header) else PHB[["CRS"]] <- epsg(header)
You are correct, this is not part of the suggested change above! I meant to revert this back to the original code. I used this to debug when I didn't have the lidR package loaded.
Please explain what was the original error. How a GUID could be invalid? It could be zeroed but I see no scenario where a GUID could prevent reading a file. Please explain me how your code solve the problem. I see a lot of code reshaping but can't grasp where is the actual fix especially without understanding what was the issue.
Here is the error:
The issue is how the catalog function handles corrupt .laz or .las files. For example, I have laz files for the entire state of Oregon. There is a single laz file that cannot be read by rlas, "header <- rlas:::lasheaderreader(x)". This means that after 10 minutes of scanning headers, the lidR::catalog() function fails with an error. lidR::catalog() does not have an option to bypass failed laz reads.
To fix this error, I encapsulate the lidar header reading / parsing code into a separate function. I initially only wrapped the header reading code in a try statement: code: 'header <- try(rlas:::lasheaderreader(x))' , but I found that there were cases when the header read was successful, but somehow the other parts of the header parsing code failed. Therefore, I simply placed the entire chunk of header reading / parsing code into an external function. I then use mapply across all input files to read all of the individual laz/las files with your existing header reading / parsing code - but encapsulated in a function. If it is in a separate function, we can wrap the call in a try() statement.
.fn_one_header = function(x,phblab){
header <- rlas:::lasheaderreader(x)
header <- LASheader(header)
PHB <- header@PHB
names(PHB) <- phblab
#if (use_wktcs(header))
if (T)
PHB[["CRS"]] <- wkt(header)
else
PHB[["CRS"]] <- epsg(header)
# Compatibility with rlas 1.3.0
if (!is.null(PHB[["Number.of.points.by.return"]])) {
PHB[["Number.of.1st.return"]] <- PHB[["Number.of.points.by.return"]][1]
PHB[["Number.of.2nd.return"]] <- PHB[["Number.of.points.by.return"]][2]
PHB[["Number.of.3rd.return"]] <- PHB[["Number.of.points.by.return"]][3]
PHB[["Number.of.4th.return"]] <- PHB[["Number.of.points.by.return"]][4]
PHB[["Number.of.5th.return"]] <- PHB[["Number.of.points.by.return"]][5]
PHB[["Number.of.points.by.return"]] <- NULL
PHB[["Global.Encoding"]] <- NULL
}
PHB$filename = x
return(PHB)
}
I made a separate function that wraps the header reading call in a try() statement. I handle the progress bar at the same time.
.fn_progress_headers=function(progress,pb,i,N,t0,...){
#update progress bar
if (progress && (Sys.time() - t0 > getOption("lidR.progress.delay"))) {
utils::setTxtProgressBar(pb, i)
}
#run one file - fail gracefully
try(.fn_one_header(...), silent=T)
}
I see. Can you please share the problematic file?
u please share the problematic file?
Do you have a place to drop this 2 gb file?
I see. Can you please share the problematic file?
although I think it is important to build in some error handling into the catalog function to have the capacity to skip bad files
Thank you for pointing out the issue. I initially missed the actual error because it was sent as a screenshot rather than copied text.
readLAScatalog
, I’m not entirely convinced it should automatically skip invalid files. Since the file is corrupted, it should ideally be removed or repaired rather than skipped. However, I understand there are pros and cons to this approach. I'll try to reproduce the issue on my end and assess the best course of action.Problem addressed in 39d1f19
Great!!
On Mon, Sep 9, 2024, 5:10 AM Jean-Romain @.***> wrote:
Problem addressed in 39d1f19 https://github.com/r-lidar/lidR/commit/39d1f196b22fa8ee56dcbe73dbd337102df54ffd
— Reply to this email directly, view it on GitHub https://github.com/r-lidar/lidR/pull/776#issuecomment-2337953112, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGIEGC4UB2UYOBBH7KX23ZTZVWF4LAVCNFSM6AAAAABNTBTTA6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZXHE2TGMJRGI . You are receiving this because you authored the thread.Message ID: @.***>
This modification to io_readLAScatalog.R allows "catalog" to handle corrupted .laz and .laz files with bad GUID values. The bad records are simply eliminated from the catalog. This is handled by pushing file handling to a couple of separate function. I am not sure if any of this code is compatible with the current lidR coding standards, but I personally use something like this as a replacement for the catalog() function in the lidR package because I often run into corrupt laz files...
The first suggested function is .fn_progress_headers() and handles progress bar updates and wraps the process of reading headers in a try() statement. This way, if reading a header fails, the catalog() function can simply carry on.
The second suggested function is basically just a functionized extraction of code that JR already has in the main script
There are some further revisions in the body of readLAScatalog to modify how the progress bar is initiated and to handle the return of bad las or laz files including to issue a warning about the failed files.