locationtech / rasterframes

Geospatial Raster support for Spark DataFrames
http://rasterframes.io
Apache License 2.0

Snapshot wheel naming interferes with pip install #459

Closed · jpolchlo closed this issue 3 years ago

jpolchlo commented 4 years ago

On some platforms (observed this on EMR), the naming scheme of snapshot wheels causes an error during pip install:

[hadoop@ip-172-31-49-55 tmp]$ pip3.6 install --user /opt/rasterframes/pyrasterframes-0.8.5-SNAPSHOT-py3-none-any.whl 
ERROR: pyrasterframes-0.8.5-SNAPSHOT-py3-none-any.whl is not a supported wheel on this platform.

After some digging, it appears that the SNAPSHOT tag is being recognized as part of the platform tag (snapshot-py3). You might want to consider not using a hyphen, even though it is fairly standard practice in Java land.
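(Editorial illustration, not part of the original report.) Wheel filenames follow the PEP 427 pattern {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl, so the extra hyphenated SNAPSHOT segment lands in a slot where it is not valid. A minimal check with the packaging library, which enforces rules similar to pip's (the exact error message differs), shows the contrast with a PEP 440 dev version; this assumes a reasonably recent packaging release that provides parse_wheel_filename:

```python
# Sketch: compare a Maven-style snapshot wheel name with a PEP 440 dev
# version using the `packaging` library (pip applies similar naming rules).
from packaging.utils import InvalidWheelFilename, parse_wheel_filename

names = [
    "pyrasterframes-0.8.5-SNAPSHOT-py3-none-any.whl",  # extra hyphen shifts the tags
    "pyrasterframes-0.8.6.dev0-py3-none-any.whl",      # PEP 440 dev release, accepted
]

for name in names:
    try:
        dist, version, build, tags = parse_wheel_filename(name)
        print(f"OK       {name} -> version {version}")
    except InvalidWheelFilename as err:
        print(f"REJECTED {name}: {err}")
```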

vpipkt commented 4 years ago

@jpolchlo The .whl file is named based on the contents of version.py.

Also, I'd encourage some more digging in your target directory pyrasterframes/target/python/dist. I have sometimes seen a subtly duplicated message from SBT that reports a .whl file which is not actually pip installable. Example:

$ sbt ";clean;package"
....
[info] Python .whl file written to '/Users/jbrown/src/raster-frames/pyrasterframes/target/python/dist/pyrasterframes-0.8.6.dev0-py3-none-any.whl'
[info] Maven Python artifact written to '/Users/jbrown/src/raster-frames/pyrasterframes/target/scala-2.11/pyrasterframes-0.8.6-SNAPSHOT-py3-none-any.whl'
[success] Total time: 133 s (02:13), completed Feb 11, 2020 11:09:58 AM

In this case the first message refers to a well-formed, pip-installable .whl; note that its target directory is in the pyrasterframes package under the python language dir. The second artifact is, I think, also a .whl, but because it lives in the scala language dir it uses the Java version naming...
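(Editorial aside, not from the thread.) If the two artifacts are easy to confuse, a small script can pick out the wheel under the python build dir rather than the scala one; the repository path below is only an example and would need adjusting to your checkout:

```python
# Sketch: locate the pip-installable wheel under pyrasterframes/target/python/dist
# (the layout follows the SBT output quoted above; the repo path is illustrative).
import glob
import os

repo_root = os.path.expanduser("~/src/raster-frames")  # example path, not prescriptive
dist_dir = os.path.join(repo_root, "pyrasterframes", "target", "python", "dist")

wheels = sorted(glob.glob(os.path.join(dist_dir, "pyrasterframes-*.whl")))
if wheels:
    print("pip install", wheels[-1])
else:
    print("No wheel found in", dist_dir, "- run `sbt clean package` first")
```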

jpolchlo commented 4 years ago

OK, that's fine. But in that case be aware that pySparkCmd reports the wrong .whl:

PYTHONSTARTUP=/tmp/sbt_eb091e00/pyrf_init.py pyspark --jars /home/jpolchlopek/work/rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-assembly-0.8.5-SNAPSHOT.jar --py-files /home/jpolchlopek/work/rasterframes/pyrasterframes/target/scala-2.11/pyrasterframes-0.8.5-SNAPSHOT-py3-none-any.whl

That's where I pulled the artifact name for installation (which, puzzlingly, works just fine on my local machine, but not on EMR).

vpipkt commented 4 years ago

Interesting and that is an issue. I think we have two things to fix:

1) Update the sbt pySparkCmd to emit the name of the .whl in the pyrasterframes/target/python dir.
2) Possibly update the sbt package messages to clarify that the Python .whl is suitable for pip install, and clarify the purpose of the "Maven Python artifact". I would actually prefer to omit the logging about the Maven artifact, because I don't know what it is for.

As far as EMR goes, it may come down to the pip version being used. What does pip --version report?

jpolchlo commented 4 years ago

I agree. It's confusing to publish two wheel artifacts at all, especially when one is prone to failure. You have my :+1:!

[hadoop@ip-172-31-49-55 tmp]$ pip --version
pip 20.0.2 from /usr/local/lib/python3.6/site-packages/pip (python 3.6)

metasim commented 4 years ago

This might motivate me to finally fix the sbt/setuptools version synchronization problem. It's been a bugaboo for a while.
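(Editorial sketch of the kind of mapping such a fix might use; the helper name is hypothetical and not part of the build.) Maven-style -SNAPSHOT qualifiers have no direct PEP 440 equivalent, but a .devN suffix plays a similar role and produces a wheel name pip accepts:

```python
# Hypothetical helper: map an sbt/Maven version string to a PEP 440 version
# suitable for wheel names (e.g. "0.8.6-SNAPSHOT" -> "0.8.6.dev0").
def to_pep440(sbt_version: str, dev_number: int = 0) -> str:
    if sbt_version.endswith("-SNAPSHOT"):
        return f"{sbt_version[: -len('-SNAPSHOT')]}.dev{dev_number}"
    return sbt_version

assert to_pep440("0.8.6-SNAPSHOT") == "0.8.6.dev0"
assert to_pep440("0.8.5") == "0.8.5"
```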

metasim commented 4 years ago

@jpolchlo I've made an initial stab at addressing this here: https://github.com/locationtech/rasterframes/pull/480

However, I can't seem to get the pySparkCmd output to work with --py-files, and I have no idea why it's not working. No errors, just no pyrasterframes on sys.path.
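(Editorial suggestion, not something tried in the thread.) One way to narrow this down is to check from inside the PySpark session whether the wheel actually made it onto the Python path, and optionally to distribute it at runtime with SparkContext.addPyFile instead of --py-files; the wheel path below is illustrative only:

```python
# Sketch: run inside the pyspark shell started by pySparkCmd.
import sys
from pprint import pprint

# See whether any pyrasterframes entry was added by --py-files.
pprint([p for p in sys.path if "pyrasterframes" in p])

# Alternative: add the wheel at runtime. `spark` is the SparkSession the
# pyspark shell provides; the wheel path is illustrative, not prescriptive.
spark.sparkContext.addPyFile(
    "/home/jpolchlopek/work/rasterframes/pyrasterframes/target/python/dist/"
    "pyrasterframes-0.8.6.dev0-py3-none-any.whl"
)
import pyrasterframes  # should now import if the wheel is well formed
```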