ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0
4.43k stars 544 forks source link

RLS: 2.0 #2526

Closed datapythonista closed 2 years ago

datapythonista commented 3 years ago

Next release, 2.0.

Issues that need to be closed before the release:

Will be setting the milestone of these issues and any other after #2525 is discussed.

xref: https://github.com/ibis-project/ibis/issues/2321#issuecomment-726724955

datapythonista commented 3 years ago

@jreback is there anything that you'd like to see in Ibis before releasing 2.0?

I wanted to have a better backend API, but I think that will postpone the release too much, it's a lot of work to do it well. So, from my side we're almost ready.

The only things that could make sense to have for the next release are:

Thoughts?

datapythonista commented 3 years ago

The list of blockers for the next release should be updated: https://github.com/ibis-project/ibis/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Next+release%22

jreback commented 2 years ago

@datapythonista can you update here, do we have anything remaining?

reason want to push this, is so that we can then update / release on other backends, e.g. bigquery

datapythonista commented 2 years ago

I've been working on updating ibis-omniscidb after the last changes, and loading the backends lazily. Once I'm done with that (it's not far from passing all tests), we can do the same for bigquery, so everything is in sync.

There are few nice to have that we could take care before the release. But from my side, the only thing I think we should have is merging the classes Backend and Client for the backends, and defining an ABC with the main methods backends need to implement. Even if the ABC is not perfect before the release, at least have an initial spec on what backends provide.

What do you think?

jreback commented 2 years ago

all sgtm

datapythonista commented 2 years ago

For the release, things got a bit slow, as standardizing the full signatures of each method is tricky. I'd like to speed up things and release earlier than fully standardizing all methods. Things I would consider for the release of Ibis 2:

@jreback @cpcloud thoughts?

jreback commented 2 years ago

I'll create an issue later, but I'd consider groupping ddl methods, with something like con.ddl.create_table() or similar. The reason is to have a smaller namespace for the backend class, and also make it clearer for users and devs when a backend implements ddl operations (also have an ABC for what DDL operations are expected to be implemented if a backend supports ddl operations)

definitely later.

Finally, I'd move Impala and Clickhouse out of the main repo before Ibis 2.0. If we move forward moving those backends to SQLAlchemy, having them in other repos should make it easy to update them without having to release Ibis.

i dont' think this needs to be done for 2.0 (not a blocker) but also not a big deal.

let's just finish what's in project and release.

cpcloud commented 2 years ago

I'll create an issue later, but I'd consider groupping ddl methods, with something like con.ddl.create_table() or similar. The reason is to have a smaller namespace for the backend class, and also make it clearer for users and devs when a backend implements ddl operations (also have an ABC for what DDL operations are expected to be implemented if a backend supports ddl operations)

This needs design. Discussions of specifics like abstract base classes are premature. This is definitely not something that should not ship in 2.0.

cpcloud commented 2 years ago

Finally, I'd move Impala and Clickhouse out of the main repo before Ibis 2.0

This seems like it would significantly delay the release. -1.

cpcloud commented 2 years ago

Remove Client classes of backends, and move remaining methods of each backend not in the base class, as they are

Can you clarify the second part of this sentence? This part:

move remaining methods of each backend not in the base class, as they are

datapythonista commented 2 years ago

This seems like it would significantly delay the release. -1.

I don't think it'd delay the release much. The backends are quite well decoupled now. We moved bigquery and omnisci in few hours recently.

Can you clarify the second part of this sentence? This part:

move remaining methods of each backend not in the base class, as they are

Probably providing more context than needed here, but just to make sure we're in the same page: Until now, there was a not well defined API that backends should implement. I guess when developing new backends, a reference one was being used, and most methods in the Client class of the reference backend were implemented in the new. In ibis/backends/tests there are some tests that would test some methods/attributes, but IMO it wasn't clear for backend developers what had to be implemented, and not clear for users what can always be expected from any backend.

For the Ibis 2.0 release, this is changing. Now there is a base class for backends, with abstract methods, and backend methods are documented globally. So, now there is a well defined API that users can expect from a backend, and a clear expectation of what backends provide to final users. Also, the full signature of this API has been standardized as much as possible (some differences are needed, but many of the original differences were arbitrary).

Since making these changes involve updating something like 14 backends, those changes have been implemented step by step, mostly one method at a time. We've got many functions that are now part of this documented and better defined (with an abstract class) API. But many functions still exist in the Client of the different backends, that are not. In some cases they are backend specific functions. In some they are implemented by a subset of backends...

There are two different classes involved. Backend, which is returned by for example ibis.sqlite. And Client which is returned by ibis.sqlite.connect(...). In the past ibis.sqlite was a module ibis/backends/sqlite/api.py if I'm not wrong, that was eventually converted to a class, to make subclassing possible. That module and Client had significant duplication. I don't remember the exact methods, but things like compile and others were defined in both.

The final goal would be to provide something more reliable than the getattr used for ibis.sqlite, which caused several problems in the past. Something like ibis.connect(backend='sqlite') for example. This way, there would be no more Backend and Client, but just one class returned by connect. We discussed, and there was a preference to name this Backend (I had a small preference for Connection since it's returned from connect, but any is fine).

So, with this idea of starting with two classes, with the goal of merging them at some point, while standardizing the API, we've been moving methods from Client to Backend. Everything that has been standardized and documented for all backends. The sentence you are clarification for, means that continuing the work of standardizing every single method is taking a long time, and delaying the release. So it proposes for the remaining methods and for 2.0, to just move them from Client to Backend. To remove Client before the release. And after 2.0, we can continue the work on improving the API.

cpcloud commented 2 years ago

That makes sense, thanks for the context. Is there an issue to discuss the top level API (ibis.connect)?

datapythonista commented 2 years ago

There are few related issues (some grooming could be useful now that a lot of work has been done and things are less abstract):

jreback commented 2 years ago

what's open here? we ought to try to release soon

datapythonista commented 2 years ago

what's open here? we ought to try to release soon

From my side, just #2905 (having some problems with impala tests), and then moving methods from Client to Backend for each backend (without having an abstract method, unifying signatures, or any other change) is enough. Everything else can be postponed to a future release.

jreback commented 2 years ago

@datapythonista ok how is this loking? anything remaining?

datapythonista commented 2 years ago

I'm working on cleaning up all the unused Client classes. I think that would be nice to have, but other than that, I think we can release.

What's your idea on backends that should be included in the pypi and conda-forge packages for the ibis-framework package? Any?

jreback commented 2 years ago

i think we should include everything as of now. in a later version can change that.

datapythonista commented 2 years ago

I think in the long term it's better to have them separately, since IMO it's easier for users if they just install the backend they want, and get all dependencies, than having to install the dependencies themselves as soft dependencies. Besides being able to release backends more oftern than Ibis itself.

But fine to release now as before, with all the backends in the repo in the package. And create new packages at a later point. We'll have to release Ibis to be able to move backends out, otherwise there may be problems, if two backends with the same name exist.

I almost have #3035 green, from my side we should be able to release once that PR is merged.

datapythonista commented 2 years ago

We've got the omnisci backend fully working with Ibis master/2.0.

I started a PR with the main changes for BigQuery (https://github.com/ibis-project/ibis-bigquery/pull/93), @tswast is working on finishing it.

From my side, we can release already. Is there any blocker I'm missing, or can I start the release?

jreback commented 2 years ago

We've got the omnisci backend fully working with Ibis master/2.0.

I started a PR with the main changes for BigQuery (ibis-project/ibis-bigquery#93), @tswast is working on finishing it.

From my side, we can release already. Is there any blocker I'm missing, or can I start the release?

nope, let's release! and thanks @datapythonista for doing this.

datapythonista commented 2 years ago

@ibis-project/ibis-owners starting the release now, please don't merge PRs until the release is over. Thanks!

datapythonista commented 2 years ago

Had some problems (apparently twine doesn't work in a conda environment), but got the release to pip already: https://pypi.org/project/ibis-framework/2.0.0/

Will continue with the release to conda-forge.

jreback commented 2 years ago

whoo hoo!

datapythonista commented 2 years ago

Feedstock PR for Ibis 2.0.0 is ready, @jreback do you want to have a look: https://github.com/conda-forge/ibis-framework-feedstock/pull/55

datapythonista commented 2 years ago

Ibis 2.0.0 has been successfully released to both pypi and conda-forge. Seems like conda is unable to solve an environment even with only the ibis-framework package.

I created #3041 for it.