Closed matthiasgomolka closed 8 months ago
Incorrect assumption! https://github.com/Rdatatable/data.table/blob/master/NEWS.md Data table is under development.
Well, the last real feature update was more than two years ago. That's quite a while, isn't it?
A package not being actively developed is not a good reason to stop using it. You could argue that it's a good reason to keep using it, because it's less likely to make breaking changes. It would be a good idea to stop using it if it weren't actively maintained. But the last release was a few months ago (Feb-2023).
Another very important consideration is that switching from returning data.tables to tibbles is almost certainly going to break your user's code. That's a very significant breaking change that shouldn't be taken lightly.
Thanks for your thoughtful considerations. I'm aware that this change might break users code. That's why I would like to make the change now, since I'm updating the package to the new API version, where breaking changes will be inevitable.
But I'll rethink the issue.
My main point was that "actively developed" and "actively maintained" are different. For example, most of my packages are feature-complete, so they're not actively developed. They don't get new features often (or at all). That doesn't mean people should avoid them. They're still actively maintained. I still fix bugs and intend to keep them on CRAN.
Good point about updating the API version breaking stuff too. After I made my comment, I noticed you don't have reverse dependencies on CRAN, and your lifecycle is 'experimental'. Those things make breaking changes less of an issue for you right now.
I just had another thought: it's good practice to bump the major version when there are breaking changes.
Anyway, these are just my thoughts. I'm not going to criticize whatever decision you make. It's your package, after all. :wink:
EDIT: I reached out to @MichaelChirico and he pointed me to https://github.com/Rdatatable/data.table/pull/5629. So there's some work being done to get data.table development moving again.
All valid points! Maybe one more pro argument for data.table (that you all know probably): It's faster. As far as I know its the fastest dataframe package anywhere. Even more when you look at the reliability vs speed tradeoff! (e.g. some python packages like polars or pydatatable are fast too, but just fail at too many tasks). Dt however does what one would expect. For my projects in which I use your great package, I would transfer the tibble back to data.table if you would change the package to tibble. The reason being that the time costs of not using dt just accumulates later in the workflow of a larger project. I can imagine this is the case for many projects in the finance domain, where time is of essence. As I understand, dt is very popular there for that reason. So while users certainly can deal with the change, its just going to feel a bit annoying for some. Probably. I hope, this helps.
@matthiasgomolka Another option you could move towards is to add a new function argument, say, return_as
supporting a list of supported return formats. We did this in a few Rblpapi
functions years ago because even among the three of us (at the time) looking after the package there wasn't "one" preference. If memory serves it wasn't an entirely new idea then either but I no longer recall where we may have gotten it from. I do the same now in package tiledb
, and in both also support local configuration option so that for me the return defaults to data.table
but the overall default may another value. Just a thought to make the switch to tibble
less disruptiive.
Since data.table
got a new update recently and due to your thoughtful remarks (thanks a lot!) I've decided to stick to data.table
. Closing this now.
Since
data.table
does not seem to be under active development any more, we should make the switch to returningtibbles
instead ofdata.tables
.