insightsengineering / teal.data

Data model for teal applications
https://insightsengineering.github.io/teal.data/
Other
8 stars 7 forks source link

[Feature Request]: Don't use @datanames anymore #333

Open gogonzo opened 3 weeks ago

gogonzo commented 3 weeks ago

Feature description

@datanames slot has been introduced only to distinguish temporary/proxy objects from "real" datanames. It was useful in teal where app didn't have to create a filter-panel. Now (after this) teal ignores teal.data::datanames and simply runs ls(x@env) to determine "datanames".

We can keep datanames function but names (implemented in qenv) is more relevant in my opinion.

names.qenv <- function(x) ls(x@env)
names(teal_data(a = 1, .b = 2))
#> [1] "a"

Plan:

Related issues:

gogonzo commented 1 week ago

@pawelru I'd like to continue discussion here:

I have an idea that @datanames shouldn't exist

I'm all for it!

  • no dataname slot - makes it simpler and reduce the clutter code made by end-user on the preprocessing part
  • topological sort functionality should live in the teal-data class

One question tough. Can we avoid extending the base class in teal.code and apply extension only in teal.data? I would like to avoid edits in teal.code. It's good as is.

If we decide to use names to read the object names from @env then we don't even need to make generic method because this is base S3 method - in this case I don't see why we should avoid teal.code. It would look like this:

names.qenv <- function(x) ls(x)
names.teal_data <- function(x) {
  unsorted <- names.qenv(x)
  .topological_sort(unsorted, join_keys(x))
}

names unlike datanames is more general name and fits to both classes.


We can consider also other methods names:

Yes I am aware. It very connected to my old issue https://github.com/insightsengineering/teal/issues/969. I guess we won't solve it here. But I think we should be good on this. I think that we are on the same page that get_datanames() functionality (however its implemented) should exclude datanames starting with dot. That check function you referenced should be called after that filter.

gogonzo commented 1 week ago

By looking at the NEWS entries of this incoming release it feels like we should address this issue sooner. It doesn't make sense to enhance datanames and then deprecate them. Improvements will be still relevant but datanames() function might be deprecated soon in favour of something else (ls_env or names).

Or maybe we should keep datanames() getter for backwards compatibility and not bother with more precise name like ls_env?

# teal.data 0.6.0.9011

### Enhancements

- `datanames()`
    - if `join_keys` are provided, the `datanames()` are now sorted in topological way (`Kahn` algorithm),
    which means the parent dataset always precedes the child dataset.
    - are extended by the parent dataset name, if one of the child dataset exist in `datanames()` and
    the connection between child-parent is set through `join_keys` and `parent` exist in `teal_data` environment.
    - do not allow to set a dataset name that do not exist in `teal_data` environment.
    - `teal_data` no longer set default `datanames()` based on `join_keys` names - it uses only data names.
gogonzo commented 1 week ago

After short call with @pawelru we initially agreed to the names with logic described in this comment