Closed datapythonista closed 2 years ago
@jreback is there anything that you'd like to see in Ibis before releasing 2.0?
I wanted to have a better backend API, but I think that will postpone the release too much, it's a lot of work to do it well. So, from my side we're almost ready.
The only things that could make sense to have for the next release are:
ibis.impala.connect(...)
for now. In my opinion we should use:
ibis.connect('impala://user@impala/ibis_testing')
Thoughts?
The list of blockers for the next release should be updated: https://github.com/ibis-project/ibis/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Next+release%22
@datapythonista can you update here, do we have anything remaining?
reason want to push this, is so that we can then update / release on other backends, e.g. bigquery
I've been working on updating ibis-omniscidb after the last changes, and loading the backends lazily. Once I'm done with that (it's not far from passing all tests), we can do the same for bigquery, so everything is in sync.
There are few nice to have that we could take care before the release. But from my side, the only thing I think we should have is merging the classes Backend
and Client
for the backends, and defining an ABC with the main methods backends need to implement. Even if the ABC is not perfect before the release, at least have an initial spec on what backends provide.
What do you think?
all sgtm
For the release, things got a bit slow, as standardizing the full signatures of each method is tricky. I'd like to speed up things and release earlier than fully standardizing all methods. Things I would consider for the release of Ibis 2:
set_database
from the few backends that implement it, as suggested by @cpcloud current_database
and list_databases
Client
classes of backends, and move remaining methods of each backend not in the base class, as they areddl
methods, with something like con.ddl.create_table()
or similar. The reason is to have a smaller namespace for the backend class, and also make it clearer for users and devs when a backend implements ddl operations (also have an ABC for what DDL operations are expected to be implemented if a backend supports ddl operations)@jreback @cpcloud thoughts?
I'll create an issue later, but I'd consider groupping ddl methods, with something like con.ddl.create_table() or similar. The reason is to have a smaller namespace for the backend class, and also make it clearer for users and devs when a backend implements ddl operations (also have an ABC for what DDL operations are expected to be implemented if a backend supports ddl operations)
definitely later.
Finally, I'd move Impala and Clickhouse out of the main repo before Ibis 2.0. If we move forward moving those backends to SQLAlchemy, having them in other repos should make it easy to update them without having to release Ibis.
i dont' think this needs to be done for 2.0 (not a blocker) but also not a big deal.
let's just finish what's in project and release.
I'll create an issue later, but I'd consider groupping ddl methods, with something like con.ddl.create_table() or similar. The reason is to have a smaller namespace for the backend class, and also make it clearer for users and devs when a backend implements ddl operations (also have an ABC for what DDL operations are expected to be implemented if a backend supports ddl operations)
This needs design. Discussions of specifics like abstract base classes are premature. This is definitely not something that should not ship in 2.0.
Finally, I'd move Impala and Clickhouse out of the main repo before Ibis 2.0
This seems like it would significantly delay the release. -1.
Remove Client classes of backends, and move remaining methods of each backend not in the base class, as they are
Can you clarify the second part of this sentence? This part:
move remaining methods of each backend not in the base class, as they are
This seems like it would significantly delay the release. -1.
I don't think it'd delay the release much. The backends are quite well decoupled now. We moved bigquery and omnisci in few hours recently.
Can you clarify the second part of this sentence? This part:
move remaining methods of each backend not in the base class, as they are
Probably providing more context than needed here, but just to make sure we're in the same page: Until now, there was a not well defined API that backends should implement. I guess when developing new backends, a reference one was being used, and most methods in the Client
class of the reference backend were implemented in the new. In ibis/backends/tests
there are some tests that would test some methods/attributes, but IMO it wasn't clear for backend developers what had to be implemented, and not clear for users what can always be expected from any backend.
For the Ibis 2.0 release, this is changing. Now there is a base class for backends, with abstract methods, and backend methods are documented globally. So, now there is a well defined API that users can expect from a backend, and a clear expectation of what backends provide to final users. Also, the full signature of this API has been standardized as much as possible (some differences are needed, but many of the original differences were arbitrary).
Since making these changes involve updating something like 14 backends, those changes have been implemented step by step, mostly one method at a time. We've got many functions that are now part of this documented and better defined (with an abstract class) API. But many functions still exist in the Client
of the different backends, that are not. In some cases they are backend specific functions. In some they are implemented by a subset of backends...
There are two different classes involved. Backend
, which is returned by for example ibis.sqlite
. And Client
which is returned by ibis.sqlite.connect(...)
. In the past ibis.sqlite
was a module ibis/backends/sqlite/api.py
if I'm not wrong, that was eventually converted to a class, to make subclassing possible. That module and Client
had significant duplication. I don't remember the exact methods, but things like compile
and others were defined in both.
The final goal would be to provide something more reliable than the getattr
used for ibis.sqlite
, which caused several problems in the past. Something like ibis.connect(backend='sqlite')
for example. This way, there would be no more Backend
and Client
, but just one class returned by connect
. We discussed, and there was a preference to name this Backend
(I had a small preference for Connection
since it's returned from connect
, but any is fine).
So, with this idea of starting with two classes, with the goal of merging them at some point, while standardizing the API, we've been moving methods from Client
to Backend
. Everything that has been standardized and documented for all backends. The sentence you are clarification for, means that continuing the work of standardizing every single method is taking a long time, and delaying the release. So it proposes for the remaining methods and for 2.0, to just move them from Client
to Backend
. To remove Client
before the release. And after 2.0, we can continue the work on improving the API.
That makes sense, thanks for the context. Is there an issue to discuss the top level API (ibis.connect)?
There are few related issues (some grooming could be useful now that a lot of work has been done and things are less abstract):
what's open here? we ought to try to release soon
what's open here? we ought to try to release soon
From my side, just #2905 (having some problems with impala tests), and then moving methods from Client
to Backend
for each backend (without having an abstract method, unifying signatures, or any other change) is enough. Everything else can be postponed to a future release.
@datapythonista ok how is this loking? anything remaining?
I'm working on cleaning up all the unused Client
classes. I think that would be nice to have, but other than that, I think we can release.
What's your idea on backends that should be included in the pypi and conda-forge packages for the ibis-framework
package? Any?
i think we should include everything as of now. in a later version can change that.
I think in the long term it's better to have them separately, since IMO it's easier for users if they just install the backend they want, and get all dependencies, than having to install the dependencies themselves as soft dependencies. Besides being able to release backends more oftern than Ibis itself.
But fine to release now as before, with all the backends in the repo in the package. And create new packages at a later point. We'll have to release Ibis to be able to move backends out, otherwise there may be problems, if two backends with the same name exist.
I almost have #3035 green, from my side we should be able to release once that PR is merged.
We've got the omnisci backend fully working with Ibis master/2.0.
I started a PR with the main changes for BigQuery (https://github.com/ibis-project/ibis-bigquery/pull/93), @tswast is working on finishing it.
From my side, we can release already. Is there any blocker I'm missing, or can I start the release?
We've got the omnisci backend fully working with Ibis master/2.0.
I started a PR with the main changes for BigQuery (ibis-project/ibis-bigquery#93), @tswast is working on finishing it.
From my side, we can release already. Is there any blocker I'm missing, or can I start the release?
nope, let's release! and thanks @datapythonista for doing this.
@ibis-project/ibis-owners starting the release now, please don't merge PRs until the release is over. Thanks!
Had some problems (apparently twine
doesn't work in a conda environment), but got the release to pip already: https://pypi.org/project/ibis-framework/2.0.0/
Will continue with the release to conda-forge.
whoo hoo!
Feedstock PR for Ibis 2.0.0 is ready, @jreback do you want to have a look: https://github.com/conda-forge/ibis-framework-feedstock/pull/55
Ibis 2.0.0 has been successfully released to both pypi and conda-forge. Seems like conda is unable to solve an environment even with only the ibis-framework
package.
I created #3041 for it.
Next release, 2.0.
Issues that need to be closed before the release:
2379 Entrypoints
2448 Backends as conda subpackages
2356 Move omnisci backend to another repo
Will be setting the milestone of these issues and any other after #2525 is discussed.
xref: https://github.com/ibis-project/ibis/issues/2321#issuecomment-726724955