I understand there was an effort to move ibis-bigquery out of ibis.
I think this has made development of both projects more difficult, mainly due to a lack of awareness around breaking changes and having to keep multiple repos checked out and running against HEAD. I'm struggling to see what the benefits of maintaining a separate repo are.
After developing in this model for about 1.5 years, I think there's less benefit here than we thought.
Here are the points in favor of the original move:
> We have 14 backends in this repo, at least 3 or 4 in separate repos, and new ones being developed. Having all them here is no longer an option
I don't quite understand this point. I think keeping things in the main repo as long as possible is a good idea because changes all happen in sync, tests fail along with everything else. Code moves along with other code.
We should make it easier to build an external backend, but artificially forcing a backend to be external creates unnecessary compatibility work.
> With the new entrypoints system that we just merged (#2379), the API for the user will be exactly the same
I think the entrypoints system has been great for the project and we should consider that successful 🎉!
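For readers unfamiliar with the mechanism, here is a minimal sketch of how entry-point-based backend discovery works in general. It uses only the standard library; the group name `ibis.backends` and the helper `discover_backends` are assumptions for illustration, not a copy of what Ibis does internally.

```python
from importlib.metadata import entry_points


def discover_backends(group="ibis.backends"):
    """Return a mapping of entry point name -> entry point for `group`.

    The default group name is an assumption made for illustration.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


if __name__ == "__main__":
    # Prints whichever backends are installed in the current environment.
    for name in sorted(discover_backends()):
        print(name)
```

Because discovery goes through package metadata, it behaves the same whether a backend's code lives in the main repo or in a separate one, which is why the user-facing API does not depend on where the backend is developed.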
> Users will have to install a separate conda package. But I think this is better than having to install the backend dependencies individually, or installing all backend dependencies with Ibis
>
> For the previous point, we're already planning to have different conda-forge packages for some backends (#2448). So, where the backend is developed won't have an impact for users
The previous two points are effectively "that ship has sailed": there are now separate packages for each backend for conda-forge, and we use extras functionality for pip installs.
> External backends can have other maintainers, that can know better the backend code than Ibis devs. That's the case of @tswast
In practice, and also by design, the parts of the backends that differ are few and far between. For example, the BigQuery backend uses setup queries for compiling UDFs, and no other backend uses this part of the SQL compilers. I don't think the additional maintenance burden is huge; most of the required knowledge is not Ibis-related and is readily available in documentation.
> For the previous point, the backend will be as well maintained whether it's here, or in a separate repo. Probably better maintained, since Ibis maintainers won't be a bottleneck
I think the maintenance quality is 99% showing up, AKA having a person or people available to do the work, which is unrelated to whether the backend is inside the repo or not.
> Separate backends can be released more often if needed. If BigQuery adds an additional feature, it can be supported faster, by just releasing the backend, and not having to wait for an Ibis release
After ibis 4, we're going to release much more often, likely monthly or more frequently.
> The Ibis CI is a bit heavy with 17 builds. Which besides taking time, makes navigating the builds a bit annoying
I think our CI is in a decent place right now and definitely has room for one more backend's worth of tests. We recently added polars and snowflake and anecdotally I haven't seen much difference in iteration speed.
> In the case of BigQuery, it has special requirements regarding the CI, since access information can't be used on PR builds. Moving it elsewhere will simplify the CI here
Maybe? We can basically replicate what is being done here for BigQuery in the main repo.
> The only drawback that I would consider is being able to make changes to both Ibis core and the backends at the same time.
This is in fact the number one reason I want to move this backend back into the main ibis repo 😄
The original issue is https://github.com/ibis-project/ibis/issues/2665.