Open bingbong-sempai opened 1 year ago
my personal opinion is not strictly opposed. though I don't like having a lot of aliases for methods, perhaps this is a good case (and thus wouldn't need to be a breaking change)
yeah i agree that too many aliases is a bad thing. it's just that pairing read_
and write_
(same with from_
and to_
) feels like the "correct" thing to do.
I agree that there's a nice consistency in using read_
and write_
for file-based things and from_
and to_
for memory-based things.
It's also going to break a lot of users. I'm not sure that's worth it. It also potentially leaves users thinking that we don't support writing to various filetypes because they're expecting it under to_
and not tab-completing out to write_
.
There's also the issue that depending on the backend, we're sometimes uploading a file to, say, Snowflake, which isn't a cheap operation, vs. creating a view based only on parquet metadata, with DuckDB, which IS cheap.
Polars also has a scan_parquet
method to distinguish this kind of behavior, but if we follow that convention we'd end up with a different API for a given backend, because one would only support read_
and the other would only support scan_
.
Overall, I'm inclined to leave things as they are.
I am a -1 on adding an alias.
that's unfortunate, i was hoping ibis would be able to make API changes that pandas couldn't
an option can be to alias read_
to from_
instead?
from_
can cover both read and scan operations.
that way io could pair from_
and to_
.
Hey @bingbong-sempai -- I think we're going to stick with our current convention. Most users are probably coming from pandas
and there are other libraries that follow the convention. There's also a benefit in seeing all of the targets for commands like read_
in one block when using tab-completion.
Hopefully you find other parts of Ibis to enjoy and this API choice isn't a blocker for you.
Going to close this out -- thanks!
just to add on a bit -- write
makes sense for files (CSV, parquet) but less sense for other things Ibis can output (PyArrow tables, PyArrow batches, pandas, torch, etc.)
Just to add, both Polars and DuckDB use write_parquet
and not to_parquet
.
not sure how we feel about it in general as the Ibis project, but I'm personally not opposed (similar to my earlier comment) to adding aliases for common spellings of existing APIs -- e.g. a write_parquet
method that perhaps we don't even put on the website (or do and note it's just an alias of to_parquet
)
I'll reopen this so it shows in our issue triage and we can discuss
My preference is to use write_<parquet|csv|...>
which triggers computation. I would like to have lazy counterparts eventually which could use to_<format>
prefix but no strong opinions there. I think methods triggering actions should be verbs.
I would like to have lazy counterparts eventually which could use
to_<format>
prefix but no strong opinions there.
Note that we already have to_pandas
/to_pyarrow
/to_polars
which run eagerly and return an in memory result. I think those names are good, but mixing to_*
methods that don't run eagerly in with them would be confusing IMO.
I'm not opposed to a convention of:
write_*
: eager methods that compute and write the output somewhereto_*
: eager methods that compute and return some in-memory objectThat said, I don't find the current convention of putting both of these under to_*
bad. A user can always do expr.to_<TAB>
to see all the things they can convert an expression into.
The only thing I'm against is having any aliases where both expr.to_parquet
and expr.write_parquet
exist and do the same thing. As a user I would find that confusing and would spend a bunch of time trying to figure out if/how the expr.to_*
and expr.write_*
methods differed. (to be clear, short lived aliases for deprecation/migration are fine).
Is your feature request related to a problem?
No response
Describe the solution you'd like
I know pairing read and to functions follows Pandas' style but I think read and write makes more sense (it's also what polars uses). For example, read_parquet and write_parquet for reading/writing to disk, from_pandas and to_pandas for reading/writing to memory.
What version of ibis are you running?
6.1.0
What backend(s) are you using, if any?
No response
Code of Conduct