Closed by datapythonista 3 years ago
Can you check the version of SQLAlchemy and use the original name if < 1.4, for compatibility? (Also, we likely have a fixed version of SQLAlchemy in the min versions.)
It might now also work with LargeBinary in the older versions (not sure).
sqlalchemy.Binary was renamed to sqlalchemy.LargeBinary in 0.6, 11 years ago, so any version will work with this change.
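A compatibility shim along the lines of the suggestion above could pick whichever name the installed SQLAlchemy exposes. A minimal sketch (the `resolve_large_binary` helper and the stand-in module objects are hypothetical, used here so the example runs without SQLAlchemy installed):

```python
from types import SimpleNamespace

def resolve_large_binary(sa_module):
    """Return LargeBinary if the module has it, else the pre-0.6 Binary alias."""
    return getattr(sa_module, "LargeBinary", None) or sa_module.Binary

# Stand-ins for old and new SQLAlchemy modules (hypothetical objects):
old_sa = SimpleNamespace(Binary="Binary type")
new_sa = SimpleNamespace(LargeBinary="LargeBinary type", Binary="deprecated alias")

print(resolve_large_binary(old_sa))  # falls back to Binary
print(resolve_large_binary(new_sa))  # prefers LargeBinary
```

In real code the argument would simply be the imported `sqlalchemy` module; `getattr` with a fallback covers both the versions that still have `Binary` and 1.4, which removed it.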
A good idea to have SQLAlchemy pinned somewhere anyway. In any case, it looks like more things are broken, and they are not as straightforward. Will need to have a look.
Opened an issue in SQLAlchemy. Depending on the outcome will look for a workaround: https://github.com/sqlalchemy/sqlalchemy/issues/6068
Fixed an SQLAlchemy regression in sqlalchemy.case and reported it here: https://github.com/sqlalchemy/sqlalchemy/issues/6081
Still some more problems pending in SQLAlchemy 1.4.
Hi @datapythonista, do you know if there is an ETA for this? Do we need to wait on changes from the SQLAlchemy side, or could we just pin our CI to the previous version in the meantime? Thanks!
I'll finish this early next week, I don't have time before. I tried pinning, but it wasn't as straightforward as I thought, and closed the PR.
SQLAlchemy will probably fix the regressions, but I'll make Ibis work also in the broken version.
We also have the Spark failure, which is preventing the CI from being green, and I'm clueless about what the issue is there. Maybe you can have a look at that, as that will be the blocker soon.
a = ('xro12970', <py4j.java_gateway.GatewayClient object at 0x7f93ea2b3dd0>, 'o12963', 'collectToPython')
kw = {}
s = 'java.lang.IllegalArgumentException: Unsupported class file major version 55'
stackTrace = 'org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166)\n\t at org.apache.xbean.asm6.ClassReader.<init>(ClassR...)\n\t at py4j.GatewayConnection.run(GatewayConnection.java:238)\n\t at java.base/java.lang.Thread.run(Thread.java:834)'

    def deco(*a, **kw):
        try:
            return f(*a, **kw)
        except py4j.protocol.Py4JJavaError as e:
            s = e.java_exception.toString()
            stackTrace = '\n\t at '.join(map(lambda x: x.toString(),
                                             e.java_exception.getStackTrace()))
            if s.startswith('org.apache.spark.sql.AnalysisException: '):
                raise AnalysisException(s.split(': ', 1)[1], stackTrace)
            if s.startswith('org.apache.spark.sql.catalyst.analysis'):
                raise AnalysisException(s.split(': ', 1)[1], stackTrace)
            if s.startswith('org.apache.spark.sql.catalyst.parser.ParseException: '):
                raise ParseException(s.split(': ', 1)[1], stackTrace)
            if s.startswith('org.apache.spark.sql.streaming.StreamingQueryException: '):
                raise StreamingQueryException(s.split(': ', 1)[1], stackTrace)
            if s.startswith('org.apache.spark.sql.execution.QueryExecutionException: '):
                raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
            if s.startswith('java.lang.IllegalArgumentException: '):
>               raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
E   pyspark.sql.utils.IllegalArgumentException: 'Unsupported class file major version 55'
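For reference, the "major version 55" in the traceback identifies which JDK produced the class file: from Java 1.2 onwards, the class-file major version is the Java release number plus 44 (52 is Java 8, 55 is Java 11). A small sketch to decode such messages (the helper name is made up for illustration):

```python
import re

def java_version_from_error(message):
    """Map 'Unsupported class file major version N' to a JDK release number."""
    m = re.search(r"Unsupported class file major version (\d+)", message)
    if not m:
        return None
    # Class-file major version 52 -> Java 8, 55 -> Java 11, etc.
    return int(m.group(1)) - 44

print(java_version_from_error(
    "java.lang.IllegalArgumentException: Unsupported class file major version 55"
))  # -> 11: the CI is running Java 11, too new for Spark 2.4
```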
Is any Java dependency updated here? I thought SQLAlchemy is Python-only?
I will do a bit of debugging.
https://github.com/ibis-project/ibis/pull/2694/checks?check_run_id=2150005088
Looks like the Java version is indeed 11 in the new CI, which doesn't work with the Spark 3.7 pipeline (running Spark 2.4.3 there). Although I am not sure if this change was introduced by this PR or by some other change.
Let me try setting JAVA_HOME to JAVA_HOME_8_X64 (/usr/lib/jvm/adoptopenjdk-8-hotspot-amd64) for the Spark 3.7 pipeline.
Thanks for helping with this @icexelloss. There are two different and unrelated things broken in the CI. One is the release of SQLAlchemy 1.4 introducing some regressions. The second, which was failing before, is the Java error in the PySpark build. I'll be having a look at the pending SQLAlchemy regressions on Monday. The PySpark stuff is unrelated, and I don't have any idea why it's failing, but it's not caused by the new SQLAlchemy release.
@datapythonista Thanks for the info. I think that pyspark failure is perhaps due to an orthogonal change in the base image (still no idea why JAVA_HOME points to Java 11 now). I am working on setting that back for the Spark/PySpark 3.7 pipeline.
In case it's useful, I opened the PR #2688 reverting the changes since PySpark started failing. I was trying to check if it was those changes (which seem unrelated), or most likely something external, like a change in a dependency. The problem is that SQLAlchemy started to fail, and I couldn't see if the error was happening or not.
@icexelloss - The Java change is because of https://github.com/actions/virtual-environments/issues/1816. We use ubuntu-latest
as our os test image, and that recently changed to 20.04 (the actions CI has had a warning this was coming for a long time). With that change, the default changed:
Thanks @gerrymanoim, that's very useful to know. I think we can then fix the CI by using #2695 (will need the SQLAlchemy problem fixed first). And then create an issue to make Ibis compatible with the latest version. Does it make sense?
@datapythonista I fixed the Spark CI and here is the PR to your branch: https://github.com/datapythonista/ibis/pull/3
Thanks @gerrymanoim. Super helpful.
@datapythonista I fixed the Spark CI and here is the PR to your branch: datapythonista#3
I think the best way to fix the problem would be:
- Make sure in the code that we raise an exception with a meaningful message if incompatible versions of pyspark and java are found
- In the CI, install java with conda (add java=XXX to the pyspark minimum dependencies)
- I don't think we can do much for pip, but I guess if we require java from conda-forge, the pyspark and java versions should then be compatible, and we shouldn't have this problem happening anymore

Or if we really want to use the environment variable approach, you can just set them in the build yaml; you can add an include option in the matrix to specify environment variables depending on the build of the matrix being executed.
Yeah this sounds good. I definitely don't have much experience building CI with docker so I trust your decision here.
Happy to take care of it if it sounds good. It shouldn't take me long to get it implemented. Or let me know if you give it a try and have any question/problem.
Trying now. I added java to the pyspark-min.yml and pyspark.yml. CI is running now; let's see how that goes.
Make sure in the code that we raise an exception with a meaningful message if an incompatible version of pyspark and java are found
I didn't add checks for this since we set a specific java version now, so this could be less of an issue, and also I couldn't figure out a very clean way to do this.
It's not an issue for us, but I guess users can run Ibis in their own environment, with a new Java but pyspark 2. I think they'll appreciate it if, instead of the error we are having now, we show them a nice message like "Ibis with pyspark 2 works with JDK up to version 8. Use pyspark 3 to use a higher version of the JDK".
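A guard along these lines could look roughly like this; the function name and the version rule (pyspark 2.x only supporting JDK 8 and older) are a sketch of the idea rather than actual Ibis code:

```python
def check_spark_java_compat(pyspark_major, java_major):
    """Raise a readable error for known-bad pyspark/JDK combinations.

    Hypothetical guard: pyspark 2.x only supports JDK 8 and older.
    """
    if pyspark_major < 3 and java_major > 8:
        raise RuntimeError(
            "Ibis with pyspark 2 works with JDK up to version 8. "
            "Use pyspark 3 to use a higher version of the JDK "
            f"(found JDK {java_major})."
        )

check_spark_java_compat(3, 11)  # fine: pyspark 3 supports newer JDKs
try:
    check_spark_java_compat(2, 11)
except RuntimeError as exc:
    print(exc)
```

Raising early with this message would replace the opaque `Unsupported class file major version 55` error from the traceback above.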
@datapythonista I tried to add openjdk in conda and it did work locally (JAVA_HOME is set properly after installing openjdk). However, in the CI, JAVA_HOME is unchanged after I install openjdk (e.g.: https://github.com/ibis-project/ibis/pull/2697/checks?check_run_id=2151864647).
I wonder if you have some idea what might be happening?
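One way to narrow this down in CI is to print where Java is actually coming from before and after the conda install. A small stdlib-only sketch (note that `java -version` writes to stderr, not stdout):

```python
import os
import shutil
import subprocess

# Show which Java the environment would pick up.
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
java = shutil.which("java")
print("java on PATH =", java)
if java is not None:
    # `java -version` prints its banner to stderr.
    out = subprocess.run([java, "-version"], capture_output=True, text=True)
    print(out.stderr.strip())
```

If JAVA_HOME still points at the image default while `java` on PATH is conda's, that would explain Spark ignoring the newly installed openjdk.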
I'm a bit surprised you have to install openjdk via conda instead of setting the appropriate env variable in the spark pipeline. Is conda's openjdk=8
different than JAVA_HOME_8_X64
?
@datapythonista We are kind of blocked by this now. In the interest of time, I suggest that we use one of the following changes to unblock Spark now (either one should work):
I tried the conda idea but it didn't quite work and I am not sure why, so I suggest we leave it as a follow-up:
@icexelloss I'm on this now. I fixed one of the errors this morning, and I'm working on what I think is the last one, but it's a bit trickier (I expect to have it fixed today).
Once SQLAlchemy 1.4 is working, I'd merge #2695, and I think that should leave the CI green. And then I think it can make sense to make things more robust for spark/java version compatibility, but we can do that without having anything blocked.
Does this make sense to you?
Sounds good to me! Thank you very much!
@jreback this is green now (it shows red only because the spark error is independent, and we'll fix it in #2695). Do you mind having a look please?
tl;dr:
- clickhouse_sqlalchemy doesn't work with 1.4 (it requires <1.3), so pinning the version in our CI
- setup.py required >=1.1, but tests fail in 1.1 and 1.2, so requiring 1.3

The test that I had to change is not a material change. The SQL query is the same, and in 1.3 there is no difference between the old test and the new one, but in 1.4 the query differs in the table prefix in the column names (e.g. SELECT t0.name FROM whatever AS t0 instead of SELECT name FROM whatever AS t0). I couldn't find a simple way to make tests pass without making this change.
thanks @datapythonista
Closes #2689