Open Mmoncadaisla opened 5 years ago
I was helping out too. Things we tried:
1) Windows Server 2016 EC2 instance without GDAL, with a conda environment. Able to read a local file at 'file:///C:/foo/bar.tif'.
2) Try with gdal:/ (as in the pastebin example).
3) Try the URI as file:///C:/foo/bar
etc.
It seems that it may be down to the GDAL reader with Windows local paths. The JVM reader seemed to work, as in item 1 above. It may also be the ENVI format itself.
This one is also fairly difficult to debug because my dev environment is Mac/Linux.
I also somewhat wonder if the issue may be in the GT code?
Okay here is my hypothesis and proposed fix(es).
If the user is on Windows and pointing at a local file in such a fashion that would use the GDAL reader, this code will be used to parse the URI string. The comment here claims that VSIPath doesn't like single slash file:/path so removes it. But the file:/ (single slash) seems to be exactly what is needed for correct extraction of the Windows file path from the WINDOWS_LOCAL_PATH_PATTERN regex here.
Furthermore, if we remove the file:/ from a Windows URI, the VSIPath's SCHEME_PATTERN regex is going to incorrectly interpret the drive letter as the scheme.
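A minimal Python sketch of the drive-letter problem described above. The pattern below is a generic RFC 3986 style scheme matcher used as a hypothetical stand-in; it is not copied from VSIPath's SCHEME_PATTERN, and the helper name `guess_scheme` is made up for illustration.

```python
import re

# Hypothetical stand-in for a scheme-matching regex (NOT the actual
# SCHEME_PATTERN from VSIPath): a letter followed by letters, digits,
# '+', '.' or '-', terminated by a colon.
GENERIC_SCHEME = re.compile(r'^([A-Za-z][A-Za-z0-9+.\-]*):')

def guess_scheme(uri):
    """Return what a generic scheme regex would report, or None."""
    match = GENERIC_SCHEME.match(uri)
    return match.group(1) if match else None

# With the file:/ prefix intact, the scheme is read as expected.
print(guess_scheme(r'file:/C:\Foo\Bar\file.dat'))

# With the prefix stripped, the drive letter looks like a scheme.
print(guess_scheme(r'C:\Foo\Bar\file.dat'))
```

Under this assumed pattern, a bare Windows path is indistinguishable from a URI with a one-letter scheme, which is the misinterpretation the hypothesis points at.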
My proposed fixes:
1) Remove the tweaked logic entirely from GDALRasterSource. That would allow the user to pass in file:/C:\Foo\Bar\file.dat which will result in VSIPath().vsiPath equal to C:\Foo\Bar\file.dat
2) Change to geotrellis contrib and RF:
   a) Change the WINDOWS_LOCAL_PATH_PATTERN to (?<=(?:(\/){2})).+ to remove the scheme correctly.
   b) Change the tweaked logic from the GDALRasterSource. Instead of removing file:/ entirely, replace it with the double slash, so: file:/C:\Foo\Bar\file.dat -> file://C:\Foo\Bar\file.dat
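The second proposal can be sketched in Python as follows. The regex is the one quoted above; the helper names `to_double_slash` and `extract_path` are hypothetical, and the sketch only shows how the pattern behaves on the example string. It is not the GeoTrellis or RasterFrames implementation.

```python
import re

# The proposed replacement pattern: everything after the first '//'.
PROPOSED_PATTERN = re.compile(r'(?<=(?:(\/){2})).+')

def to_double_slash(uri):
    """Rewrite a single-slash 'file:/' prefix to 'file://'.

    URIs that already start with 'file://' are left untouched.
    """
    return re.sub(r'^file:/(?!/)', 'file://', uri)

def extract_path(uri):
    """Return the part of the URI matched by the proposed pattern."""
    match = PROPOSED_PATTERN.search(uri)
    return match.group(0) if match else None

normalized = to_double_slash(r'file:/C:\Foo\Bar\file.dat')
print(normalized)                # file://C:\Foo\Bar\file.dat
print(extract_path(normalized))  # C:\Foo\Bar\file.dat
```

Note that for a triple-slash URI such as file:///C:/foo/bar.tif the same pattern would keep the leading slash (/C:/foo/bar.tif), so that form would need separate handling.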
@metasim who on the GT side do we need to engage?
@MiguelNOX I published a snapshot/dev build of the branch here, built as a wheel, to the test PyPI instance. Can you try installing it in your environment?
pip install --extra-index-url https://test.pypi.org/simple/ pyrasterframes==0.8.2.dev0
And let us know if it works with specifying a single slash path like so: spark.read.raster(r'file:/D:\path\to\raster')
@vpipkt After installing it in my environment and removing the older version (I had to remove it for the new one to run properly), I get the following error:
Thank you again for your dedication. I hope we can solve this!
Ok, I think we have now isolated the following:
1) changes to URI string interpretation discussed above;
2) improved discussion of Windows GDAL installation in the documentation.
Rationale for 2 is seen in the latest attempts by @MiguelNOX to read an ENVI file. The Python session has access to GDAL, yet attempts to read the ENVI file are handled by the JVM GeoTiff reader, meaning (probably) that the underlying call to GDALWarp.get_version_info has failed and GDAL is not available.
Trying to get an output of the same script from @MiguelNOX where we also see the output from pyrasterframes.utils.gdal_version, which would be more definitive.
You are totally right: when I try to get the output from pyrasterframes.utils.gdal_version on my local computer, I get "not available" as output.
EDIT:
I've just tried using pyrasterframes through Google Colab on this tif file from the RasterFrames doc page: B02.tif
I've succeeded at reading the file with just GDAL through the Colab notebook. I have also tested Spark on it and it works properly. However, I still have trouble reading the file through pyrasterframes, as shown below in the pastebin.
In this case, when running
from pyrasterframes.utils import gdal_version
print(gdal_version())
I get GDAL 2.2.3, released 2017/11/20 as output.
Well that is good to know. Hmm @metasim is there someone that @MiguelNOX could reach out to for help with installing the gdalwarp bindings correctly in Windows?
@MiguelNOX You could try the GeoTrellis Gitter channel. On Windows it's often the case that the GDAL shared libraries aren't in the PATH variable.
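One rough way to check the PATH suggestion is to scan each PATH directory for GDAL libraries. The sketch below uses only the Python standard library; the file name pattern gdal*.dll is an assumption about how the shared libraries are named on a given Windows install, and `find_on_path` is a hypothetical helper, not part of pyrasterframes.

```python
import fnmatch
import os

def find_on_path(pattern, path_value=None):
    """List files in the PATH directories whose names match a glob pattern."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            # Skip directories we are not allowed to read.
            continue
        for name in names:
            # Compare case-insensitively, as Windows file names are.
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                hits.append(os.path.join(directory, name))
    return hits

# Assumed naming pattern; adjust it to match your GDAL installation.
print(find_on_path("gdal*.dll"))
```

An empty list suggests the directory holding the GDAL libraries is missing from PATH, although a non-empty list does not by itself prove the bindings will load.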
For reference, a short recap on how to read ENVI files located on a Windows network share, via GDAL (2.4.2), in pyrasterframes on macOS (10.15).
Currently on macOS you really need GDAL installed via Homebrew. It does not work with GDAL installed with Anaconda (and probably not with other installation methods either).
Also, for ENVI files GDAL wants the .dat or .bil file, not the .hdr one.
In our company many users make use of Microsoft Windows and there are standard folder shares. On macOS if you mount those they end up under /Volumes/. With that done it appears to be enough for gdal to reference them as 'file:///Volumes/
Here is a little (python) example that works in my case:
import pyrasterframes
from pyrasterframes.rasterfunctions import *
from pyrasterframes.utils import create_rf_spark_session
spark = create_rf_spark_session(**{
'spark.driver.extraJavaOptions': '-Djava.library.path=/Users/.../opt/anaconda3/lib'
})
file = 'file:///Volumes/dfs-root/.../ndvi/2019/ndvi20190717_csa_10m.dat'
# file = 'file:///Volumes/dfs-root/.../Sentinel2/2019/20190717/S2B_L2A_20190717_B01.bil'
df = spark.read.raster(file)
Passing the anaconda lib path to spark might not strictly be needed anymore. @metasim and @vpipkt probably know :-)
@robknapen thanks very much for that. And @Mmoncadaisla take a look and see if this is of assistance for you.
Hello @robknapen, thank you very much for your comments. I am running out of time on a project right now, but I will check it out ASAP and leave feedback here.
However, I have also installed Ubuntu 18.04 on another PC so that I can run pyrasterframes without any trouble in case your solution doesn't work out for me.
Thank you very much, @vpipkt, as well for your dedication.
I have an error, too. I read a Landsat 8 image file, then created a catalog and displayed it as follows:
landsat=[r'D:\Graduation_thesis\data\LC08_L1TP_014032_20190720_20190731_01_T1\LC08_L1TP_014032_20190720_20190731_01T1{b}.TIF' for b in bands]
catalog = ','.join(bands) + '\n' + ','.join(landsat)
df = spark.read.raster(catalog, bands)
display(df)
When I run this code, I get a very long error. The key part is:
Caused by: java.lang.IllegalArgumentException: Illegal character in opaque part at index 2: D:\Graduation_thesis\data\LC08_L1TP_014032_20190720_20190731_01_T1\LC08_L1TP_014032_20190720_20190731_01T1{b}.TIF
Please help me. Many thanks.
@tieuthienvn1987 Should r'D:\Graduation_thesis\data\... be f'D:\Graduation_thesis\data\...?
You mean f'D:\Graduation_thesis\data\...'? I used r'D:\Graduation_thesis\data\...', but when I run the code, I get an error.
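To illustrate the r'...' versus f'...' point, here is a small Python sketch. The band suffixes and the shortened path are hypothetical placeholders, not the poster's data. It also shows pathlib turning a Windows path into a file:/// URI with forward slashes, which avoids a backslash at index 2 of the string; whether spark.read.raster then accepts it in this particular setup is not confirmed in this thread.

```python
from pathlib import PureWindowsPath

bands = ['B4', 'B5']  # hypothetical band suffixes for illustration

# A raw string keeps '{b}' literally: nothing is substituted.
raw = [r'D:\data\scene_{b}.TIF' for b in bands]
print(raw[0])        # D:\data\scene_{b}.TIF

# The 'rf' prefix keeps backslashes literal AND substitutes {b}.
formatted = [rf'D:\data\scene_{b}.TIF' for b in bands]
print(formatted[0])  # D:\data\scene_B4.TIF

# PureWindowsPath works on any OS and builds a file:/// URI.
uris = [PureWindowsPath(p).as_uri() for p in formatted]
print(uris[0])       # file:///D:/data/scene_B4.TIF
```

A plain f'...' prefix would also substitute {b}, but combining it with r avoids having to double every backslash in the Windows path.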
Hello,
I've succeeded in reading the data from the examples on the RasterFrames doc site (i.e. https://rasterframes.io/raster-read.html). However, I'm unable to read the same data when it is located on my local computer.
I am using Windows 10, Python 3.7 with the Anaconda distribution, and Jupyter Notebook.
I've tried different ways of typing the URI without success. I hope the community can help me out, since I'm totally stuck on the basics here.
Furthermore, I've succeeded in reading the data with GDAL (without pyspark).
Code here: https://pastebin.com/ijiMfhwU