wesm opened 7 years ago
I'm working on supporting 2.7 + 3.5/3.6 simultaneously in https://github.com/weld-project/weld/pull/132, though Unicode support muddies things a bit.
While you're at it, it would be nice to plot a course to `conda install weld` and get all the Python things with a single `import weld` statement. This probably means a package structure like

    weld/
      grizzly/ ...
I'm also exploring what it will take to bundle the shared libs into the packages so binary wheels can be built and distributed.
You can look at what we did in Apache Arrow with manylinux1: https://github.com/wesm/arrow/blob/master/python/manylinux1/build_arrow.sh
and https://github.com/wesm/arrow/blob/master/python/setup.py#L210
so all the shared libs (built with CMake) get bundled in the wheel. Probably possible to do something similar with OS X. We have an extra layer of complexity in that we also want to expose a C API for Arrow via the binary wheel (similar to the NumPy C API), but we're still working on that.
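As a rough illustration of the bundling idea (a sketch only; the library name `libweld.so` and the path layout here are assumptions for illustration, not details from the Weld build):

```python
# Sketch: loading a shared library shipped inside a wheel by resolving it
# relative to the package directory (the idea behind bundling CMake-built
# .so files into the wheel). "libweld.so" is a hypothetical name.
import os

def bundled_lib_path(package_dir, libname="libweld.so"):
    # A real package would pass os.path.dirname(__file__) here and hand
    # the result to ctypes.CDLL(...) to load the bundled library.
    return os.path.join(package_dir, libname)

print(bundled_lib_path("/site-packages/weld"))  # /site-packages/weld/libweld.so
```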
conda is the easiest way, since you can package `libweld` (the shared libraries) and `weld-python` (the Python package and C extensions) as separate components.
There seems to be some GitHub snafu right now, so all the Apache git mirrors on GitHub are down at the moment.
@cirla how is work on this going? I'm new to Weld, but Python 3.6 + packaging support lines up with my interests; is there a part of this work (and/or #132) that is sufficiently self-contained that I could try tackling it?
Thanks!
@snakescott I haven't been actively working on this recently, but there are two separate things to tackle for #132:

1. Travis: change `language:` in `.travis.yml` to `python` so that it spawns a separate job for each Python version specified under the `python:` list. We'd then need to install Rust during the `install:` phase (i.e. `curl -sSf https://build.travis-ci.org/files/rustup-init.sh | sh -s -- --default-toolchain=$TRAVIS_RUST_VERSION -y`), but this would preclude us from testing against multiple versions of Rust.
2. Unicode: one option is to represent strings as vectors of `WeldInt`/`i32`s (fixed-width codepoints). The downsides to this are the overhead of converting the string representation, the extra memory used by a wide fixed-width encoding (especially if the majority of the characters are ASCII), differences between the CPython 2 and 3 APIs for converting Unicode, and ambiguity when decoding (is it an array of ints or an array of UCS4 codepoints?). The alternative is to keep the `WeldChar`/`i8` implementations, but these operations would behave very poorly when the string contains multi-byte characters, modifiers, etc.

@cirla
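To make the tradeoff concrete, here is a plain-Python sketch of the two representations (illustrative only; this is not Weld or Grizzly code):

```python
# Two ways to represent the string "na\u00efve" ("naive" with a diaeresis):
# wide fixed-width codepoints (the WeldInt/i32 option) vs raw UTF-8 bytes
# (the WeldChar/i8 option). Plain Python, for illustration only.

def as_i32_codepoints(s):
    # One 32-bit slot per codepoint: unambiguous per character,
    # but 4x the memory for ASCII-heavy text.
    return [ord(c) for c in s]

def as_i8_bytes(s):
    # Raw UTF-8 bytes: compact, but multi-byte characters mean
    # len(bytes) != number of characters, and slicing can split a char.
    return list(s.encode("utf-8"))

s = "na\u00efve"
print(as_i32_codepoints(s))  # [110, 97, 239, 118, 101] -- 5 codepoints
print(as_i8_bytes(s))        # [110, 97, 195, 175, 118, 101] -- 6 bytes
```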
I will take a look at the Travis config and see if I can puzzle anything out -- maybe hit up Travis experts in Slack if I get stuck.
Unicode seems trickier (and more interesting!). A few thoughts/questions:
While `str` does map to `WeldVec(WeldChar())`, Grizzly represents string arrays as a vector of pointers (#136). So I'm not completely sure which Weld core operations (arithmetic, `len`, etc.) apply to Python strings in practice, versus applying to vectors of pointers to strings. What do you think? Are there simple Grizzly snippets that expose perf differences when running ASCII vs. Unicode?

For Travis, it seems like it should be possible to use the `python` block and `addons`, moving the `script`/`install` logic into `test_llvm_version.sh` and using a virtualenv to handle the choice of Python environments. If this sounds sufficiently promising, I can work on a PR. I expect it to be possible to merge something like this prior to full Python 3.6 support (just don't add a `test_llvm_version.sh` invocation for 3.6 until it is ready).
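A rough Python model of the two layouts in question (illustrative only; this is not Grizzly's actual memory representation):

```python
# Layout A: a single string as a flat vector of chars
# (the str -> WeldVec(WeldChar()) mapping).
flat = list("hi")  # ['h', 'i']

# Layout B: a string *array* as a vector of pointers, one per string (#136).
# Here each "pointer" is just a Python reference to a char vector.
string_array = [list("hi"), list("weld")]

# Core ops mean different things at each level: len() on layout A counts
# the characters of one string; len() on layout B counts strings.
print(len(flat))             # 2
print(len(string_array))     # 2
print(len(string_array[1]))  # 4
```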
That sounds reasonable; just make sure that the C++ code/shared library is built against the right version of Python for each one.
Sorry about spamming the ticket with my Travis adventures; I'll leave the issue reference off until it's ready for a PR next time!
Ah, sorry, I now have a better handle on the UTF side of things; the example I was missing was `slice`, which is unfortunately missing from the language doc. From what I can see in the codebase, numpy uses UCS4 internally, so maybe that's appropriate for Grizzly? Supporting both ASCII (no unnecessary memory tax) and UCS4 (for numpy Unicode compat) might be a good place to start.
If we're comfortable with the memory and conversion overhead, I can look into the CPython unicode API differences as well as the decoding ambiguity.
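The memory tax is easy to quantify in plain Python (`utf-32-le` standing in for a UCS4 buffer; the numbers below are for this illustrative snippet, not Grizzly):

```python
# Footprint of an ASCII buffer vs a UCS4-style (UTF-32) buffer for the
# same mostly-ASCII text: the 4x overhead mentioned above.
text = "hello" * 1000  # 5000 ASCII characters

ascii_buf = text.encode("ascii")     # 1 byte per character
ucs4_buf = text.encode("utf-32-le")  # 4 bytes per character (UCS4-like)

print(len(ascii_buf))  # 5000
print(len(ucs4_buf))   # 20000
```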
It will be important to run on Python 3, preferably supporting both 2.7 and 3.5/3.6 with a single codebase (the `six` module helps with this).
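For reference, the main kind of 2/3 difference `six` papers over here is the text/bytes split; a hand-rolled version of that shim (illustrative only; `six` ships `six.text_type`, `six.binary_type`, and `six.PY2` ready-made) looks like:

```python
# Minimal 2/3 compatibility shim for text vs bytes, the main thing six
# helps with in this context. Works unchanged on Python 2.7 and 3.x.
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (only defined on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes

def ensure_text(value, encoding="utf-8"):
    """Return `value` as the text type on either Python version."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value

print(ensure_text(b"weld"))  # weld
```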