locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://[ip]:9000/....tif: No such file or directory #549

Closed JenniferYingyiWu2020 closed 3 years ago

JenniferYingyiWu2020 commented 3 years ago

Hi, I have tried to execute `spark.read.raster('/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif')` followed by `rf.select(rf_crs("proj_raster").alias("value")).first()`, however the errors below appear:

```
[1 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif: No such file or directory
[2 of 1000] FAILURE(3) CPLE_OpenFailed(4) "Open failed." /vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif: No such file or directory
21/03/24 09:26:38 ERROR Executor: Exception in task 61.0 in stage 9.0 (TID 201)
java.lang.IllegalArgumentException: Error fetching data for one of: GDALRasterSource(/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif)
Caused by: geotrellis.raster.gdal.MalformedDataException: Unable to construct a RasterExtent from the Transformation given. GDAL Error Code: 4
```
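A quick way to narrow this down is to check whether GDAL itself can reach the file, independent of Spark. A minimal sketch, assuming the GDAL Python bindings (`osgeo`) are installed on the node; note that `/vsihdfs/` only works when GDAL was built with HDFS (libhdfs) support:

```python
from osgeo import gdal

gdal.UseExceptions()
path = ("/vsihdfs/hdfs://192.168.101.201:9000/"
        "Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif")
try:
    ds = gdal.Open(path)
    print("Opened OK:", ds.RasterXSize, "x", ds.RasterYSize)
except RuntimeError as err:
    # The same CPLE_OpenFailed here would mean the problem is in GDAL's
    # HDFS access, not in RasterFrames.
    print("GDAL could not open the file:", err)
```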

I should mention that before executing the above commands I built a Hadoop cluster on 192.168.101.201, 192.168.101.202 and 192.168.101.203. Among them, 192.168.101.201 is the master, and 192.168.101.202 and 192.168.101.203 are workers. Moreover, I installed GDAL and the RasterFrames environment on all three servers. The dataset has been uploaded to "hdfs://192.168.101.201:9000".

I verified that the file exists on HDFS:

```
(base) hduser_@jenniferwu-OptiPlex-7070:~$ hdfs dfs -ls hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif
-rw-r--r--   1 geotrellis supergroup    6712825 2021-03-23 14:17 hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif
```

Lastly, my Python code to set HADOOP_USER is the following:

```python
from pyrasterframes.rasterfunctions import *
from pyrasterframes.utils import create_rf_spark_session
import os

HADOOP_USER = 'geotrellis'
os.environ["HADOOP_USER_NAME"] = HADOOP_USER
spark = create_rf_spark_session(**{'HADOOP_USER_NAME': HADOOP_USER})
```

So, could you please give me some suggestions on how to resolve the error "/vsihdfs/hdfs://192.168.101.201:9000/Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif: No such file or directory"?
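One thing worth checking (an assumption, not something confirmed in this thread): GDAL's `/vsihdfs/` handler goes through libhdfs, which needs `JAVA_HOME` set and a fully expanded `CLASSPATH` containing the Hadoop jars in the environment of the driver and of every executor. A hedged sketch, with a hypothetical Java path:

```python
import os
import subprocess

# Hypothetical JAVA_HOME; adjust to the JDK actually installed on each node.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
# `hadoop classpath --glob` expands the jar wildcards that libhdfs
# cannot resolve on its own.
os.environ["CLASSPATH"] = subprocess.check_output(
    ["hadoop", "classpath", "--glob"], text=True
).strip()
```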

metasim commented 3 years ago

It's hard to do anything with this without a repeatable test case. If you are able to create an integration test or some other automated mechanism that reproduces this, reopen this issue and associate it with a PR exemplifying the problem.
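A minimal sketch of what such a repeatable test could look like (hypothetical; the HDFS URI is taken from the report above and assumes a reachable cluster with the same layout):

```python
from pyrasterframes.rasterfunctions import rf_crs
from pyrasterframes.utils import create_rf_spark_session

HDFS_TIF = ("/vsihdfs/hdfs://192.168.101.201:9000/"
            "Jennifer_hadoop/Yunyao_Data_Set/split_20200613clip/B1.tif")

def test_read_raster_over_vsihdfs():
    spark = create_rf_spark_session()
    rf = spark.read.raster(HDFS_TIF)
    # Reproduces the failing call from the report; should not raise.
    assert rf.select(rf_crs("proj_raster").alias("crs")).first() is not None
```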

JenniferYingyiWu2020 commented 3 years ago

Hi @metasim, I have resolved the above issue. Thanks again!