hobuinc / usgs-lidar

AWS Entwine Point Tiles USGS LiDAR Public Dataset GitHub repo
https://registry.opendata.aws/usgs-lidar/

Potentially missing or bad data in sections of some datasets #49

Open mattbeckley opened 1 year ago

mattbeckley commented 1 year ago

There appears to be something wrong with some of the underlying laz files or sections for the dataset:

FL_HurricaneMichael_6_2020

Attempting to get classification info on the dataset with a command like:

pdal info --stats --filters.stats.dimensions=Classification --filters.stats.count=Classification ept://https://s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020 --readers.ept.resolution=50

you get:

Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/4-2-12-7.laz
Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/4-3-5-7.laz
Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/4-3-6-7.laz
Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/4-3-7-7.laz

And the command does not complete. This type of command works fine on most other 3DEP datasets. Also, when running several simple spot checks for data subsetting, I get errors similar to:

Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/4-6-10-7.laz
Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/5-13-20-15.laz
Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/7-52-82-63.laz
Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/6-26-41-31.laz
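For anyone collecting these failures, here is a minimal sketch (not part of the original report) that pulls the dataset name and tile key out of "Could not read from" messages like the ones above. The function name `failed_tiles` and the regular expression are illustrative assumptions; the only thing taken from this thread is the message format.

```python
import re
from collections import defaultdict

# Assumed message shape, copied from the errors quoted in this thread:
#   "Could not read from <host>/<bucket>/<dataset>/ept-data/<D>-<X>-<Y>-<Z>.laz"
TILE_RE = re.compile(
    r"Could not read from \S*?/([^/\s]+)/ept-data/(\d+-\d+-\d+-\d+)\.laz"
)


def failed_tiles(log_text):
    """Group unreadable tile keys by dataset name (hypothetical helper)."""
    found = defaultdict(list)
    for dataset, key in TILE_RE.findall(log_text):
        if key not in found[dataset]:  # skip repeats of the same tile
            found[dataset].append(key)
    return dict(found)


# Two of the messages from the report, fused on one line as PDAL printed them.
log = (
    "Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/"
    "usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/4-2-12-7.laz "
    "Exception in pool task: Could not read from s3-us-west-2.amazonaws.com/"
    "usgs-lidar-public/FL_HurricaneMichael_6_2020/ept-data/4-3-5-7.laz"
)
tiles = failed_tiles(log)
print(tiles)
```

The resulting keys can then be checked one by one against the bucket to see which objects are actually absent.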

Note the datasets below have similar issues:

  1. FL_Peninsular_FDEM_Alachua_2018 - Classification query fails, however almost all subsetting spot checks returned data
  2. FL_Peninsular_FDEM_Polk_2018 - Classification query fails, and several subsetting spot checks failed (no data returned).
  3. USGS_LPC_TX_RedRiver_BrazosBasin_B3_2017_LAS_2019 - Classification query fails, and MANY subsetting spot checks fail (no data returned)

It appears that for these datasets there may be some sections that are corrupt, while other sections of the dataset are fine?

keythread commented 1 year ago

FYI - I reprocessed these hoping that it might fix the problem, with limited success.

USGS_LPC_TX_RedRiver_BrazosBasin_B3_2017_LAS_2019 does seem to work now.

But FL_HurricaneMichael_6_2020, FL_Peninsular_FDEM_Alachua_2018 and FL_Peninsular_FDEM_Polk_2018 still have issues.

ghost commented 1 year ago

I'm seeing a similar issue with CA_UpperSouthAmerican_Eldorado_2019. Error is:

PDAL: readers.ept: Error reading tile: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/CA_UpperSouthAmerican_Eldorado_2019/ept-data/11-1453-1750-1027.laz

keythread commented 1 year ago

Started reprocessing of the CA_UpperSouthAmerican_Eldorado_2019 dataset. Should be done processing by 12/19 and I will then push back to the public dataset.

keythread commented 1 year ago

The reprocessing of CA_UpperSouthAmerican_Eldorado_2019 is complete and transferred to the public dataset. This appears to have fixed the original problem.

hobu commented 1 year ago

@keythread IA_Eastern_1_2019 needs to be repushed

readers.ept: Error reading tile: Could not read from s3-us-west-2.amazonaws.com/usgs-lidar-public/IA_Eastern_1_2019/ept-data/10-481-658-509.laz

hobu commented 1 year ago

@keythread check that. I've done some more investigation into these issues, and we think S3 is throttling us.

keythread commented 1 year ago

@hobu thanks for the update!

kjwaters commented 1 year ago

It seems like there are still issues. I'm seeing a failure to get http://s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_Peninsular_FDEM_Polk_2018/ept-data/11-930-1191-1010.laz. If I try to simply retrieve that file, I get an immediate response that says:

<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Key>FL_Peninsular_FDEM_Polk_2018/ept-data/11-930-1191-1010.laz</Key>
<RequestId>4YE1VHR44R28H7EJ</RequestId>
<HostId>MazdxT2wmSI83CUws/It17Zf7vf8tBo92Nu00wVmRrlUPiLgTVrRtTluXFyDT8Hm+YXk6VPwTNY=</HostId>
</Error>

That doesn't look like a throttling issue.
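To make the distinction concrete, here is a rough sketch of how the two cases could be told apart from an S3 response. It relies on S3 answering a missing object with HTTP 404 and the error code NoSuchKey, and throttled requests with HTTP 503 and the code SlowDown. The function name `classify_s3_error` and the three labels are assumptions for illustration, and the sample body is shortened.

```python
import xml.etree.ElementTree as ET


def classify_s3_error(status, body):
    """Label a failed S3 GET as 'missing', 'throttled', or 'other'.

    Hypothetical helper: `status` is the HTTP status code and `body`
    is the XML error document as text (it may be empty).
    """
    code = None
    if body:
        try:
            # S3 error documents have an <Error> root with a <Code> child.
            code = ET.fromstring(body).findtext("Code")
        except ET.ParseError:
            code = None
    if status == 404 or code == "NoSuchKey":
        return "missing"      # the object is not in the bucket
    if status == 503 or code == "SlowDown":
        return "throttled"    # S3 asked the client to slow down
    return "other"


# Shortened form of the response quoted above (RequestId and HostId omitted).
sample = (
    "<Error><Code>NoSuchKey</Code>"
    "<Message>The specified key does not exist.</Message>"
    "<Key>FL_Peninsular_FDEM_Polk_2018/ept-data/11-930-1191-1010.laz</Key>"
    "</Error>"
)
result = classify_s3_error(404, sample)
print(result)
```

A retry with backoff only helps in the throttled case; a missing key needs the dataset to be rebuilt or repushed.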

hobu commented 1 year ago

@keythread FL_Peninsular_FDEM_Polk_2018 needs a repush.

keythread commented 1 year ago

Appreciate these reports! I started the reprocess of this WorkUnit since my copy has already been deleted. I haven't identified the cause. The sync logs for this transfer show it completed without errors, but I don't see a log entry for that particular file, which means it was not in the Entwined version. The entwine process did complete successfully, but my logs also indicate it wrote fewer points than originally determined with the info command. I'll report back when the reprocess and push complete.

keythread commented 1 year ago

Reprocessing completed this week. It wrote 140,647,745,244 points out of the 140,849,384,067 reported by the Entwine 'info' command, which is still a discrepancy, but the reported missing file is present in this rebuild:

http://s3-us-west-2.amazonaws.com/dlv-research/entwine/FL_Peninsular_FDEM_Polk_2018/ept-data/11-930-1191-1010.laz

So I initiated the transfer to the public dataset this afternoon and it should be completed by this time tomorrow.
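For scale, the gap between the two point counts quoted above can be worked out directly. This is only arithmetic on the numbers reported in this comment; the variable names are illustrative.

```python
# Point counts quoted in the comment above.
reported = 140_849_384_067  # total found by the Entwine 'info' command
written = 140_647_745_244   # total written by the rebuild

missing = reported - written
share = missing / reported

print(f"{missing:,} points not written ({share:.3%} of the reported total)")
```

That is roughly 0.14% of the reported total, so the shortfall is small in relative terms but still about 200 million points.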

kjwaters commented 1 year ago

@keythread Thanks for updating! I hate to mention it, but I'm now seeing failures on another of the Florida Peninsular work extents. For example, http://s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_Peninsular_FDEM_Alachua_2018/ept-data/10-546-521-504.laz is missing, but so are a lot more. If I look at that data set in the potree viewer (https://usgs.entwine.io/data/view.html?r=%22https://s3-us-west-2.amazonaws.com/usgs-lidar-public/FL_Peninsular_FDEM_Alachua_2018%22), it looks fine when zoomed out, but when you zoom in to some areas you see that the detail is missing. The area southeast of Gainesville is one area I noticed. Some areas were fine though.
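The "fine when zoomed out, detail missing when zoomed in" pattern matches how EPT lays out data: tiles are named D-X-Y-Z (depth and octree position), and a tile's parent sits one level up with each index halved. The sketch below lists the ancestors of the missing tile mentioned above, which are the coarser tiles a viewer would still be able to draw if only deep tiles are absent. The helper names are hypothetical.

```python
def parse_key(key):
    """Split an EPT tile key 'D-X-Y-Z' into four integers."""
    d, x, y, z = (int(part) for part in key.split("-"))
    return d, x, y, z


def ancestors(key):
    """Return the chain of parent tile keys, nearest parent first."""
    d, x, y, z = parse_key(key)
    chain = []
    while d > 0:
        # Each octree step up halves the x, y, and z indices.
        d, x, y, z = d - 1, x // 2, y // 2, z // 2
        chain.append(f"{d}-{x}-{y}-{z}")
    return chain


parents = ancestors("10-546-521-504")
print(parents[0])   # nearest parent
print(parents[-1])  # root tile
```

Checking which of these ancestor keys exist in the bucket would show at what depth the data stops for a given area.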

keythread commented 1 year ago

@kjwaters Started reprocessing of FL_Peninsular_FDEM_Alachua_2018. The logs also indicate that the previous build saved fewer points than the info command found.

keythread commented 1 year ago

Reprocessed the FL_Peninsular_FDEM_Alachua_2018 WorkUnit and pushed it to the public dataset. Please review it if you can to see whether it is complete. This time the reprocessing reported a total of points written that matches the count identified by the Entwine info command, and the missing tile referenced above is now present.