xmnlab closed 4 years ago
wasn’t this working a few days ago ? can u just check what changed in between
my first guess is that it could be related to the increased size of the new docker images (clickhouse, mysql, omniscidb) + maybe the increased size of the continuumio/miniconda3 image + possibly larger conda packages
for example here: https://github.com/ibis-project/ibis/pull/2119 for LinuxTest py 38 it is not running the omniscidb and pyspark backend tests (also is not creating the omniscidb server)
The limit for storage for each job seems to be 10 GB
As there is another backend coming (ms sql server) I think a better approach would be to split jobs by backend ... in this PR I am just testing and checking if it could be an alternative
another alternative would be to try to increase our limits here.
I know that github actions allows 20 jobs in parallel and azure pipelines seems to allow 10. not sure if it allows more storage usage.
another alternative would be to stop testing one backend, it would work for now .. but we will need a follow up PR to fix it in a better way
ok try your theory by commenting out a backend
try impala
PS: I used mamba here because the dependency resolution was very slow and sometimes it raised connection errors.
@xmnlab i appreciate you working on this but please do this as follows:
the images are too big. what is causing that?
@jreback ok .. I can remove mamba
I think so, I built the images locally with no change and that was the result:
REPOSITORY   TAG   IMAGE ID       CREATED          SIZE
ibis         3.6   98def924c1b9   10 minutes ago   4.23GB
ibis-docs    3.6   ef878fa200c0   11 days ago      5.77GB
After the changes (mainly related to conda):
REPOSITORY   TAG   IMAGE ID       CREATED          SIZE
ibis         3.6   50bfa77f1ce1   20 seconds ago   3.27GB
ibis-docs    3.6   4c34bd0b540b   49 seconds ago   4.55GB
assuming we just have 10GB available (as I found in the documentation) ... there is not much space left for the other backends.
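For reference, a rough sketch of the arithmetic behind that comment. The sizes are copied from the two listings above; the 10 GB figure is the per-job limit quoted in this thread, and counting both images in full against one job is an assumption (docker shares layers between images, so real usage may be lower).

```python
# Sizes in GB, copied from the `docker images` listings in this thread.
JOB_STORAGE_LIMIT_GB = 10.0  # assumed per-job limit, as quoted above

before = {"ibis": 4.23, "ibis-docs": 5.77}
after = {"ibis": 3.27, "ibis-docs": 4.55}


def headroom(images, limit=JOB_STORAGE_LIMIT_GB):
    """Space left if every image is counted in full (an upper bound on usage)."""
    return round(limit - sum(images.values()), 2)


def saved(old, new):
    """Per-image size reduction in GB between two listings."""
    return {name: round(old[name] - new[name], 2) for name in old}


print("headroom before:", headroom(before), "GB")
print("headroom after: ", headroom(after), "GB")
print("saved per image:", saved(before, after))
```

Under these assumptions the conda changes free roughly 2 GB, which still has to cover the backend images (clickhouse, mysql, omniscidb, ...).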
some changes here were based on this: https://jcristharif.com/conda-docker-tips.html
it seems that just the /opt/conda directory has 1.7G
here
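A minimal sketch of how one could check which parts of a directory such as /opt/conda take the most space, using only the Python standard library. The helper names are hypothetical, and hard-linked files (which conda uses between its package cache and environments) are counted once per path here, so the totals may come out higher than what `du` reports.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under `root`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def largest_subdirs(root, top=5):
    """Return the `top` largest immediate subdirectories of `root` as (name, bytes)."""
    sizes = [
        (entry.name, dir_size_bytes(entry.path))
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Calling `largest_subdirs("/opt/conda")` inside the image would list the biggest subdirectories first, which helps to see whether the package cache or the installed environment dominates.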
@jreback the CI here is working using the following changes:
ideally what we should do is add some more builds so the images are more evenly split across backends.
if we create an image for each backend we will probably have a requirement.yml for each backend, or we should define that inside the command line in the CI recipe, is that your plan?
- pinning pymapd to 0.21 breaks compatibility with pandas 0.25.3
- documentation build will break with pandas < 1.0 (https://github.com/ibis-project/ibis/pull/2061/files#diff-e8a3e7b83fe0d46165d10497ab7a5b7eR190)
ideally what we should do is add some more builds so the images are more evenly split across backends.
if we create an image for each backend we will probably have a requirement.yml for each backend, or we should define that inside the command line in the CI recipe, is that your plan?
so this is actually tricky because you really want to run multiple configurations for each back end. and you don't want to make this complex. so i would just split things roughly in 2 (hard code which backend is run in each image). and still run python 3.6,3.7, and 3.8
note if you for example pin pymapd > 0.21, then only do this in 3.8; remove that back end from other images. It is paramount that we test the oldest versions, and not just the newest.
pymapd doesn't work for python3.8, also pyspark doesn't work yet for python3.8
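To make the "split roughly in 2" idea concrete, here is a small sketch that enumerates the resulting jobs. The group names and the assignment of backends to groups are hypothetical (only backends named in this thread are listed); the Python 3.8 exclusions follow the comment above about pymapd and pyspark.

```python
from itertools import product

PYTHON_VERSIONS = ["3.6", "3.7", "3.8"]

# Hypothetical grouping: only backends named in this thread, split in two.
BACKEND_GROUPS = {
    "group-a": ["clickhouse", "mysql", "impala"],
    "group-b": ["omniscidb", "pyspark"],
}

# From the thread: pymapd (omniscidb) and pyspark do not work on Python 3.8 yet.
UNSUPPORTED = {"3.8": {"omniscidb", "pyspark"}}


def build_jobs():
    """Return one job per (python version, backend group) that has runnable backends."""
    jobs = []
    for version, (group, backends) in product(PYTHON_VERSIONS, BACKEND_GROUPS.items()):
        runnable = [b for b in backends if b not in UNSUPPORTED.get(version, set())]
        if runnable:  # skip combinations where nothing can run
            jobs.append({"python": version, "group": group, "backends": runnable})
    return jobs


for job in build_jobs():
    print(job)
```

With this grouping the second group simply drops out for 3.8, while 3.6 and 3.7 still exercise every backend, so the oldest versions stay covered.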
note: if there is any pandas version installed before the ibis installation, it seems it will never install the latest version of ibis: https://github.com/ibis-project/ibis/issues/2144
why would there be anything installed? this should be done in clean environments
why exactly did the CI break in the first place? what was the tipping point - let’s just revert that rather than trying to reinvent the world
this is probably not the best place to discuss it, but maybe it would be easier to handle each backend in a dedicated repo that depends on an ibis-core (or something similar)
correct we just want to get back to the stable state before any other changes are made
too many changes too fast is cause for disaster
pls update the PR as i indicated
this has too many changes and is not testing things as before
pls update the PR as i indicated
you mean
so this is actually tricky because you really want to run multiple configurations for each back end. and you don't want to make this complex. so i would just split things roughly in 2 (hard code which backend is run in each image). and still run python 3.6,3.7, and 3.8
this?
just remove pymapd from all builds and make a separate build for it
Ok I will do it! Thanks
just removing omniscidb backend and pymapd didn't resolve the problem.
my 2 cents:
conda is working fine and has for years, this is not a problem
parquet issue seems to be related to this: https://github.com/conda-forge/pyarrow-feedstock/issues/104
this problem looks similar https://github.com/Azure/batch-scoring-for-dl-models/issues/17
closing in favor of #2195