matthiasgomolka / simfinapi

Makes 'SimFin' data (https://simfin.com/) easily accessible in R.
19 stars 4 forks source link

Switch to tibbles #48

Closed matthiasgomolka closed 8 months ago

matthiasgomolka commented 1 year ago

Since data.table does not seem to be under active development any more, we should make the switch to returning tibbles instead of data.tables.

GitHubGeniusOverlord commented 1 year ago

Incorrect assumption! https://github.com/Rdatatable/data.table/blob/master/NEWS.md Data table is under development.

matthiasgomolka commented 1 year ago

Well, the last real feature update was more than two years ago. That's quite a while, isn't it?

joshuaulrich commented 1 year ago

A package not being actively developed is not a good reason to stop using it. You could argue that it's a good reason to keep using it, because it's less likely to make breaking changes. It would be a good idea to stop using it if it weren't actively maintained. But the last release was a few months ago (Feb-2023).

Another very important consideration is that switching from returning data.tables to tibbles is almost certainly going to break your user's code. That's a very significant breaking change that shouldn't be taken lightly.

matthiasgomolka commented 1 year ago

Thanks for your thoughtful considerations. I'm aware that this change might break users code. That's why I would like to make the change now, since I'm updating the package to the new API version, where breaking changes will be inevitable.

But I'll rethink the issue.

joshuaulrich commented 1 year ago

My main point was that "actively developed" and "actively maintained" are different. For example, most of my packages are feature-complete, so they're not actively developed. They don't get new features often (or at all). That doesn't mean people should avoid them. They're still actively maintained. I still fix bugs and intend to keep them on CRAN.


Good point about updating the API version breaking stuff too. After I made my comment, I noticed you don't have reverse dependencies on CRAN, and your lifecycle is 'experimental'. Those things make breaking changes less of an issue for you right now.

I just had another thought: it's good practice to bump the major version when there are breaking changes.

Anyway, these are just my thoughts. I'm not going to criticize whatever decision you make. It's your package, after all. :wink:


EDIT: I reached out to @MichaelChirico and he pointed me to https://github.com/Rdatatable/data.table/pull/5629. So there's some work being done to get data.table development moving again.

GitHubGeniusOverlord commented 1 year ago

All valid points! Maybe one more pro argument for data.table (that you all know probably): It's faster. As far as I know its the fastest dataframe package anywhere. Even more when you look at the reliability vs speed tradeoff! (e.g. some python packages like polars or pydatatable are fast too, but just fail at too many tasks). Dt however does what one would expect. For my projects in which I use your great package, I would transfer the tibble back to data.table if you would change the package to tibble. The reason being that the time costs of not using dt just accumulates later in the workflow of a larger project. I can imagine this is the case for many projects in the finance domain, where time is of essence. As I understand, dt is very popular there for that reason. So while users certainly can deal with the change, its just going to feel a bit annoying for some. Probably. I hope, this helps.

eddelbuettel commented 1 year ago

@matthiasgomolka Another option you could move towards is to add a new function argument, say, return_as supporting a list of supported return formats. We did this in a few Rblpapi functions years ago because even among the three of us (at the time) looking after the package there wasn't "one" preference. If memory serves it wasn't an entirely new idea then either but I no longer recall where we may have gotten it from. I do the same now in package tiledb, and in both also support local configuration option so that for me the return defaults to data.table but the overall default may another value. Just a thought to make the switch to tibble less disruptiive.

matthiasgomolka commented 8 months ago

Since data.table got a new update recently and due to your thoughtful remarks (thanks a lot!) I've decided to stick to data.table. Closing this now.