ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

WIP: fix CI #2182

Closed xmnlab closed 4 years ago

xmnlab commented 4 years ago

  1. I am trying to get the current errors (without any change)

jreback commented 4 years ago

wasn’t this working a few days ago? can you just check what changed in between

xmnlab commented 4 years ago

my first guess is that it could be related to the increased size of the new docker images (clickhouse, mysql, omniscidb), plus maybe the increased size of the continuumio/miniconda3 image and possibly the increased size of conda packages

for example here: https://github.com/ibis-project/ibis/pull/2119 for LinuxTest py 38, it is not running the omniscidb and pyspark backend tests (and it is also not creating the omniscidb server)

The storage limit for each job seems to be 10 GB

As there is another backend coming (ms sql server), I think a better approach would be to split jobs by backend ... in this PR I am just testing and checking whether that could be an alternative

another alternative would be to try to increase our limits here.

I know that github actions allows 20 jobs in parallel and azure pipelines seems to allow 10. not sure if it allows more storage usage.

another alternative would be to stop testing one backend; it would work for now .. but we will need a follow up PR to fix it in a better way

jreback commented 4 years ago

ok try your theory by commenting out a backend

try impala

xmnlab commented 4 years ago

PS: I used mamba here because the dependency resolution was very slow and sometimes it was raising a connection error.

jreback commented 4 years ago

@xmnlab i appreciate you working on this but please do this as follows:

the images are too big. what is causing that?

xmnlab commented 4 years ago

@jreback ok .. I can remove mamba

I think so. I built the images locally with no changes and this was the result:

ibis                                       3.6                 98def924c1b9        10 minutes ago      4.23GB
ibis-docs                                  3.6                 ef878fa200c0        11 days ago         5.77GB

After the changes (mainly related to conda):

ibis                                       3.6                 50bfa77f1ce1        20 seconds ago      3.27GB
ibis-docs                                  3.6                 4c34bd0b540b        49 seconds ago      4.55GB

assuming we have just 10GB available (as far as I could find in the documentation) ... that doesn't leave much space for the other backends.
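
To make the arithmetic behind this concern concrete, here is a minimal sketch that adds up the image sizes listed above and compares them with the 10 GB figure. It is only a rough upper bound: it naively sums the sizes, while docker stores layers shared between images once, so real disk usage may be lower. The names `BUDGET_MB` and `headroom_mb` are made up for this illustration.

```python
# Sizes copied from the `docker images` listings above, converted to MB
# (1 GB = 1000 MB) so the arithmetic stays in exact integers.
BUDGET_MB = 10_000  # assumption: the ~10 GB per-job limit mentioned above

images_before = {"ibis": 4230, "ibis-docs": 5770}
images_after = {"ibis": 3270, "ibis-docs": 4550}


def headroom_mb(images, budget=BUDGET_MB):
    """Return the budget minus the naive sum of image sizes.

    Shared layers are counted once per image here, so this is a
    pessimistic estimate of the space left for other backends.
    """
    return budget - sum(images.values())


for label, images in (("before", images_before), ("after", images_after)):
    left = headroom_mb(images)
    print(f"{label}: {left / 1000:.2f} GB left for the other backends")
```

Under these assumptions the two images alone add up to the whole 10 GB before the changes, and leave roughly 2 GB afterwards.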

some changes here were based on this: https://jcristharif.com/conda-docker-tips.html

xmnlab commented 4 years ago

it seems that just the /opt/conda directory has 1.7G here
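
A figure like this can be broken down further to see which part of the conda directory dominates. The sketch below is one possible way to do that in Python and is not what was run in this PR; the helper names `dir_size_bytes` and `largest_subdirs` are invented for the example. It sums apparent file sizes, so hard-linked files are counted once per path and the total can differ from what `du` reports.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def largest_subdirs(root, top=5):
    """Return (name, size_in_bytes) for the largest immediate subdirectories."""
    sizes = [
        (entry.name, dir_size_bytes(entry.path))
        for entry in os.scandir(root)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    root = "/opt/conda"  # path from the comment above; adjust as needed
    if os.path.isdir(root):
        for name, size in largest_subdirs(root):
            print(f"{size / 1e9:6.2f} GB  {name}")
```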

xmnlab commented 4 years ago

@jreback the CI here is working using the following changes:

xmnlab commented 4 years ago

ideally what we should do is add some more builds so the images are more evenly split across backends.

if we create an image for each backend, we will probably have a requirement.yml for each backend, or we should define that on the command line in the CI recipe. is that your plan?

jreback commented 4 years ago

ideally what we should do is add some more builds so the images are more evenly split across backends.

if we create an image for each backend, we will probably have a requirement.yml for each backend, or we should define that on the command line in the CI recipe. is that your plan?

so this is actually tricky because you really want to run multiple configurations for each back end. and you don't want to make this complex. so i would just split things roughly in 2 (hard code which backend is run in each image). and still run python 3.6, 3.7, and 3.8

note if you for example pin pymapd > 0.21, then only do this in 3.8; remove that back end from other images. It is paramount that we test the oldest versions, and not just the newest.
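
As a rough illustration of the "split roughly in 2" idea, the sketch below enumerates one job per Python version and hard-coded backend group. The backend names come from this thread, but which backend lands in which group, the group names, and the `build_matrix` helper are all assumptions made for the example, not the actual CI configuration.

```python
from itertools import product

PYTHON_VERSIONS = ["3.6", "3.7", "3.8"]

# Hypothetical grouping: the assignment of backends to groups is an
# assumption, chosen only to show two hard-coded halves.
BACKEND_GROUPS = {
    "group_a": ["clickhouse", "mysql", "impala"],
    "group_b": ["omniscidb", "pyspark"],
}


def build_matrix(versions=PYTHON_VERSIONS, groups=BACKEND_GROUPS):
    """Return one job description per (Python version, backend group) pair."""
    return [
        {"python": version, "group": name, "backends": list(groups[name])}
        for version, name in product(versions, groups)
    ]


for job in build_matrix():
    print(job["python"], job["group"], ",".join(job["backends"]))
```

With two groups and three Python versions this gives six jobs, each building an image that carries only its own group's backends.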

xmnlab commented 4 years ago

pymapd doesn't work for python3.8, and pyspark doesn't work yet for python3.8 either

xmnlab commented 4 years ago

note: if any pandas version is installed before the ibis installation, it seems it will never install the latest version of ibis: https://github.com/ibis-project/ibis/issues/2144

jreback commented 4 years ago

why would there be anything installed? this should be done in clean environments

jreback commented 4 years ago

why exactly did the CI break in the first place? what was the tipping point - let’s just revert that rather than trying to reinvent the world

xmnlab commented 4 years ago

probably this is not the best place to discuss that, but maybe it would be easier to handle each backend inside a dedicated repo that depends on an ibis-core (or something similar)

jreback commented 4 years ago

correct we just want to get back to the stable state before any other changes are made

too many changes too fast is cause for disaster

jreback commented 4 years ago

pls update the PR as i indicated

this has too many changes and is not testing things as before

xmnlab commented 4 years ago

pls update the PR as i indicated

you mean

so this is actually tricky because you really want to run multiple configurations for each back end. and you don't want to make this complex. so i would just split things roughly in 2 (hard code which backend is run in each image). and still run python 3.6,3.7, and 3.8

this?

jreback commented 4 years ago

just remove pymapd from all builds and make a separate build for it

xmnlab commented 4 years ago

Ok I will do it! Thanks

xmnlab commented 4 years ago

just removing the omniscidb backend and pymapd didn't resolve the problem.

my 2 cents:

  1. the conda environment is getting bigger each day. IIRC, just the conda folder had 1.7GB (with no changes).
  2. the conda environment can be very slow depending on the dependencies. as dependencies are updated each day (more new package versions) .. dependency resolution could become a challenge

jreback commented 4 years ago

conda is working fine and has for years; this is not a problem

xmnlab commented 4 years ago

parquet issue seems to be related to this: https://github.com/conda-forge/pyarrow-feedstock/issues/104

xmnlab commented 4 years ago

this problem looks similar https://github.com/Azure/batch-scoring-for-dl-models/issues/17

jreback commented 4 years ago

closing in favor of #2195