[Feature Request]: Don't use @datanames anymore

gogonzo commented 3 weeks ago

Feature description

@datanames slot has been introduced only to distinguish temporary/proxy objects from "real" datanames. It was useful in teal where app didn't have to create a filter-panel. Now (after this) teal ignores teal.data::datanames and simply runs ls(x@env) to determine "datanames".

We can keep datanames function but names (implemented in qenv) is more relevant in my opinion.

names.qenv <- function(x) ls(x@env)
names(teal_data(a = 1, .b = 2))
#> [1] "a"

Plan:

introduce names method for qenv and qenv.error
deprecate datanames<- in teal.data
deprecate datanames() in teal.data
replace teal.data::datanames in the modules in favour of names()
- tmc, tmg, teal.gallery, tlg-catalog

Related issues:

gogonzo commented 1 week ago

@pawelru I'd like to continue discussion here:

I have an idea that @datanames shouldn't exist

I'm all for it!

no dataname slot - makes it simpler and reduce the clutter code made by end-user on the preprocessing part

topological sort functionality should live in the teal-data class

One question tough. Can we avoid extending the base class in teal.code and apply extension only in teal.data? I would like to avoid edits in teal.code. It's good as is.

If we decide to use names to read the object names from @env then we don't even need to make generic method because this is base S3 method - in this case I don't see why we should avoid teal.code. It would look like this:

names.qenv <- function(x) ls(x)

names.teal_data <- function(x) {
  unsorted <- names.qenv(x)
  .topological_sort(unsorted, join_keys(x))
}

names unlike datanames is more general name and fits to both classes.

We can consider also other methods names:

ls_env
env_ls
get_ls
list_objects
...
get_env_names

Yes I am aware. It very connected to my old issue https://github.com/insightsengineering/teal/issues/969. I guess we won't solve it here. But I think we should be good on this. I think that we are on the same page that get_datanames() functionality (however its implemented) should exclude datanames starting with dot. That check function you referenced should be called after that filter.

ls by default excludes . so I consider this more natural this this is why ls_env is closer by its function. We can't implement ls.qenv as ls has no S3 generic.
names however is much easier to implement and most of the R class have this method. Problem is that names.environment doesn't exclude . elements.

gogonzo commented 1 week ago

By looking at the NEWS entries of this incoming release it feels like we should address this issue sooner. It doesn't make sense to enhance datanames and then deprecate them. Improvements will be still relevant but datanames() function might be deprecated soon in favour of something else (ls_env or names).

Or maybe we should keep datanames() getter for backwards compatibility and not bother with more precise name like ls_env?

# teal.data 0.6.0.9011

### Enhancements

- `datanames()`
    - if `join_keys` are provided, the `datanames()` are now sorted in topological way (`Kahn` algorithm),
    which means the parent dataset always precedes the child dataset.
    - are extended by the parent dataset name, if one of the child dataset exist in `datanames()` and
    the connection between child-parent is set through `join_keys` and `parent` exist in `teal_data` environment.
    - do not allow to set a dataset name that do not exist in `teal_data` environment.
    - `teal_data` no longer set default `datanames()` based on `join_keys` names - it uses only data names.

gogonzo commented 1 week ago

After short call with @pawelru we initially agreed to the names with logic described in this comment

insightsengineering / teal.data

[Feature Request]: Don't use @datanames anymore #333

Feature description