Open johnygomez opened 6 years ago
Update https://stackoverflow.com/questions/61494957 when this is implemented
Any updates when this might be implemented?
I guess the core maintainers are currently focused on building up the time series functionality in datatable; however, since it is open source, contributions are very much welcome.
I doubt I have the skills and deep level understanding to contribute such a feature. The fact that this feature is still missing implies to me that it takes some time and sophistication to develop it, hence the maintainers weren't able to include it so far. Regardless of that, what are the necessary educational resources to begin to understand how datatable works under the hood?
@Peter-Pasta I am still finding my way around the source code. The core maintainers can explain better
We have a tutorial on creating a new datatable function: https://datatable.readthedocs.io/en/latest/develop/create-fexpr.html
Now, since in is an operator and not a regular function, the process will be slightly more complicated: you'd need to fill the tp_as_sequence slot and implement the sq_contains method.
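For anyone unfamiliar with how the operator dispatches, here is a minimal pure-Python sketch. The sq_contains slot is the C-level counterpart of the __contains__ method, and Python calls it on the right-hand operand. The classes Column and AlwaysTruthy are hypothetical stand-ins for illustration, not datatable classes.

```python
class Column:
    """Hypothetical column-like container (not part of datatable)."""

    def __init__(self, values):
        self.values = list(values)

    def __contains__(self, item):
        # Python-level counterpart of the C-level sq_contains slot:
        # `item in column` ends up calling column.__contains__(item).
        return item in self.values


class AlwaysTruthy:
    """Hypothetical class whose __contains__ returns a non-bool object."""

    def __contains__(self, item):
        return "not a bool"


col = Column([1, 2, 3])
found = 2 in col                # dispatches to Column.__contains__
missing = 5 in col
coerced = 1 in AlwaysTruthy()   # the `in` operator converts the result to bool
```

Note that the `in` operator always yields a plain bool, whatever __contains__ returns, so an operator-based design cannot hand back an arbitrary object such as a lazy expression.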
As for the "core" of the function, there are two examples that are quite similar: the replace() function, which compares each value with a list (or map) of values, and the join() function, which compares each value with a sorted column via binary search.
Overall, on a difficulty scale from 1 (easy) to 5 (hard), I would rate this task as 2 or 3.
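The two strategies mentioned above can be sketched in plain Python. This is only an illustration of the idea, not how replace() or join() are actually implemented in datatable's C++ core; the function names isin_by_set and isin_by_binary_search are made up for this example.

```python
from bisect import bisect_left


def isin_by_set(column, values):
    """Compare each value against a hashed collection of candidates."""
    lookup = set(values)
    return [x in lookup for x in column]


def isin_by_binary_search(column, sorted_values):
    """Compare each value against an already sorted list via binary search."""
    result = []
    for x in column:
        i = bisect_left(sorted_values, x)
        # x is present only if the insertion point holds an equal element
        result.append(i < len(sorted_values) and sorted_values[i] == x)
    return result


column = [3, 1, 7, 4]
mask_set = isin_by_set(column, [1, 4, 9])
mask_bisect = isin_by_binary_search(column, [1, 4, 9])
```

Both calls produce the same boolean mask; the hashed lookup avoids the sort, while the binary search works when the candidate values are already sorted.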
I think it might be easier to write a function instead of an operator for in, maybe dt.in. I would like to give it a shot.
I also need guidance, @st-pasha @oleksiyskononenko: when building datatable in editable mode, I don't have an easy-install.pth in my site-packages folder, only an easy-install.py file. As such, I can't run this command: echo "`pwd`/src" >> ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth
@oleksiyskononenko @st-pasha Any ideas on how I can fix the issue above?
@samukweku Sorry, I was on vacation last week and didn't see your message.
So the main challenge with "editable mode" installations in python is that there is no official PEP standard for this, which makes it hard to provide reliable instructions here. You can try one of the following approaches:
easy-install.pth file using the command above. It should work as-is, or if you have an older version of shell, try echo "`pwd`/src" >> `ls ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth`
.virtualenv command.
@st-pasha, I'm still having issues with the installation. I successfully got it installed as editable. However, the datatable version is 0.11.1. I uninstalled it (pip uninstall datatable), thinking that would take care of the problem (as suggested here); however, I get the error message below when I try to run make test:
make test (make_mistakes)
python -m pytest -ra --maxfail=10 -Werror tests
ImportError while loading conftest '/home/sam/github/datatable/tests/conftest.py'.
tests/__init__.py:14: in <module>
from datatable.lib import core
E ModuleNotFoundError: No module named 'datatable'
make: *** [Makefile:59: test] Error 4
Could you kindly suggest how I can fix this?
On my computer I have the following configuration: the repository is checked out into
$ pwd
/Users/pasha/github/datatable
The content of the "easy-install.pth" is
$ ls ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth
/Users/pasha/py36/lib/python3.6/site-packages/easy-install.pth
$ cat `ls ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth`
/Users/pasha/github/datatable/src
And I can verify that this works by checking
$ python
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 26 2018, 19:50:54)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import datatable
>>> datatable.__file__
'/Users/pasha/github/datatable/src/datatable/__init__.py'
The import command may fail like this if the core wasn't compiled yet with either make debug or make build:
>>> import datatable
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/pasha/github/datatable/src/datatable/__init__.py", line 23, in <module>
from .frame import Frame
File "/Users/pasha/github/datatable/src/datatable/frame.py", line 23, in <module>
from datatable.lib._datatable import Frame
File "/Users/pasha/github/datatable/src/datatable/lib/__init__.py", line 31, in <module>
from . import _datatable as core
ImportError: cannot import name '_datatable'
However, if the import says that datatable was not found, then it would indicate that the installation in editable mode failed somehow.
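The check above can also be scripted. The following is a small sketch using only the standard library; the helper name locate_module is made up for this example. It reports where Python would import a package from without actually importing it, so it works even when the compiled core is missing.

```python
import importlib.util


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# None here would mean the editable install is not on sys.path.
# For a working setup the path should point into the repository's src/ folder.
datatable_location = locate_module("datatable")
print("datatable resolves to:", datatable_location)
```

If this prints None, the problem is with the easy-install.pth entry rather than with the compiled core.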
@st-pasha thanks; found the error on my end and fixed it; the echo part wasn't copying the right thing to my easy-install.pth file. All good now.
Another question: if changes are made to the C++ code, make build is required. How do I test code changes in the Python section? Say, for instance, I want f.string_column.len() to return 2. It's a silly example, but I hope you get my point. This does not involve any C++, so how do I do that?
If you make changes to C++, you need to run make build (or make debug) and then restart the python console (or reload the kernel in jupyter). If you make changes to python only, then you just need to restart the python console.
I'd like to filter rows according to functions like
which use pythonic syntax (syntactic sugar). Currently I need to rewrite this to primitive formula, testing all elements in the list separately.
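Until such a feature exists, the "primitive formula" can at least be generated rather than written by hand. This is a minimal sketch of that workaround; the helper name isin_expr is made up for this example and is not a datatable function. It only assumes that the column object supports == and |.

```python
from functools import reduce
import operator


def isin_expr(column, values):
    """Build (column == v1) | (column == v2) | ... for all given values."""
    values = list(values)
    if not values:
        raise ValueError("values must contain at least one element")
    # OR together one equality test per candidate value
    return reduce(operator.or_, (column == v for v in values))


# Works on anything supporting == and |; plain integers are used here
# so that the example runs without datatable installed.
hit = isin_expr(3, [1, 2, 3])
miss = isin_expr(5, [1, 2, 3])
```

With datatable, the same helper would presumably be used as DT[isin_expr(f.A, [1, 2, 3]), :], since f-expressions support both == and |. It still tests every element separately, so it is a convenience, not a performance fix.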