Closed by raylutz 4 months ago.
Another option would be to provide cols and dtypes as property attributes and use similar code to support indexing, including slicing, lists of column names, etc. A near-term solution is to provide methods that offer the same flexibility without resolving the [ ] syntax.
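A minimal sketch of the property-based alternative, assuming a toy class: MiniDaf, its private attributes, and the select_cols() method are hypothetical names for illustration, not daffodil's actual API. The method shows how selection by index, slice, name, or list of names could be supported without touching the [ ] syntax.

```python
# Hypothetical sketch: MiniDaf is a toy stand-in, not daffodil's Daf class.
class MiniDaf:
    """Toy table that exposes column metadata through properties."""

    def __init__(self):
        self._cols = []    # ordered column names
        self._dtypes = {}  # colname -> type

    @property
    def cols(self):
        # Return a copy so callers cannot mutate internal state by accident.
        return list(self._cols)

    @cols.setter
    def cols(self, new_cols):
        self._cols = list(new_cols)

    @property
    def dtypes(self):
        return dict(self._dtypes)

    @dtypes.setter
    def dtypes(self, new_dtypes):
        # A dtypes dict defines both the column order and each column's type.
        self._dtypes = dict(new_dtypes)
        self._cols = list(self._dtypes)

    def select_cols(self, key):
        """Method-based selection: an int, a slice, a name, or a list of names."""
        if isinstance(key, slice):
            return self._cols[key]
        if isinstance(key, int):
            return [self._cols[key]]
        names = [key] if isinstance(key, str) else list(key)
        missing = [c for c in names if c not in self._cols]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        return names


d = MiniDaf()
d.dtypes = {'id': str, 'count': int, 'total': int}
numeric = d.select_cols(slice(1, None))
```

Keeping the selection logic in a method means the same code could later back a [ ] implementation once the indexing syntax is settled.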
This issue has been set aside for now.
It would be very convenient to access the column names and dtypes (and other similar metadata linked to the columns) using the indexing syntax. This can be done by using special names that are treated like special rows of data; the special names are chosen so that they cannot collide with the row keys.
Assigning column names (in addition to setting them when the daf instance is created):
Then reading colnames:
Then for dtypes:
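A minimal sketch of the special-name approach for assigning and reading column names and dtypes. It assumes the special names are the strings '$cols' and '$dtypes' and that real row keys never start with '$'; the MiniDaf class and its internals are hypothetical, not daffodil's implementation.

```python
# Hypothetical sketch: MiniDaf is a toy stand-in, not daffodil's Daf class.
# Assumption: special names are strings starting with '$', and real row keys
# never start with '$', so the two cannot collide.
class MiniDaf:
    def __init__(self):
        self.cols = []    # ordered column names
        self.dtypes = {}  # colname -> type
        self.rows = {}    # row key -> list of values (one per column)

    def __getitem__(self, key):
        if key == '$cols':
            return list(self.cols)
        if key == '$dtypes':
            return dict(self.dtypes)
        return self.rows[key]

    def __setitem__(self, key, value):
        if key == '$cols':
            # Assign column names; keep dtypes only for columns that remain.
            self.cols = list(value)
            self.dtypes = {c: self.dtypes[c] for c in self.cols if c in self.dtypes}
        elif key == '$dtypes':
            # A dtypes dict defines both the column order and each column's type.
            self.dtypes = dict(value)
            self.cols = list(self.dtypes)
        elif isinstance(key, str) and key.startswith('$'):
            raise KeyError(f"{key!r} is reserved for metadata")
        else:
            self.rows[key] = list(value)


my_daf = MiniDaf()
my_daf['$cols'] = ['name', 'count']              # assign column names
cols = my_daf['$cols']                           # read them back as a list
my_daf['$dtypes'] = {'name': str, 'count': int}  # set dtypes per column
dtypes = my_daf['$dtypes']
my_daf['row1'] = ['alice', 3]                    # ordinary row key
```

Rejecting unknown '$' keys on assignment keeps the reserved namespace free for other column-linked metadata added later.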
For example, suppose three columns hold non-numeric data and the rest are integers:
Given a dtypes_dict, initialize the columns and dtypes:
Get the current colnames as a list:
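A minimal sketch of these two steps in plain Python, using hypothetical column names (the issue does not name them): build a dtypes_dict with three string columns and integer columns for the rest, then take the column names from its keys. It relies on dicts preserving insertion order (Python 3.7+).

```python
# Hypothetical column names for illustration only.
text_cols = ['name', 'city', 'state']  # three non-numeric columns
int_cols = ['q1', 'q2', 'q3', 'q4']    # the rest are integers

# Build dtypes_dict: colname -> type, in column order.
dtypes_dict = {col: str for col in text_cols}
dtypes_dict.update({col: int for col in int_cols})

# The current column names as a list, in the same order.
cols = list(dtypes_dict)
```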
We need to know which columns have int dtypes.
We can build a reverse-lookup structure:
dtypes_to_cols = utils.invert_da_to_dola(my_daf[$dtypes])
This function is similar to a "value_counts()" operation but goes one step further and provides the keys where each value is found.
(I've needed this frequently in data analysis of metadata-type data, especially when that data needs to be correlated or when at least a few examples of where each value occurs should be provided.)
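A minimal sketch of what such an inversion could look like, assuming "da" means a dict of values keyed by column name and "dola" means a dict of lists. This is an illustrative reimplementation, not the code of daffodil's utils.invert_da_to_dola. Taking len() of each list recovers the value_counts() result, while the lists themselves supply the keys behind each count.

```python
from collections import defaultdict


def invert_da_to_dola(da):
    """Invert {key: value} into {value: [keys with that value]}.

    Keys are listed in their original order. Values must be hashable.
    """
    dola = defaultdict(list)
    for key, value in da.items():
        dola[value].append(key)
    return dict(dola)


dtypes = {'name': str, 'q1': int, 'city': str, 'q2': int}
dtypes_to_cols = invert_da_to_dola(dtypes)

# value_counts()-style summary derived from the same structure.
counts = {value: len(keys) for value, keys in dtypes_to_cols.items()}
```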
For example, let's say we need the column names of a specific dtype:
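Two ways to answer that query, sketched in plain Python with a hypothetical dtypes dict: a direct scan over the dtypes, and a lookup in a reverse-lookup dict built once. Both return the names in column order.

```python
dtypes = {'name': str, 'city': str, 'state': str, 'q1': int, 'q2': int}

# Direct scan: one pass over the dtypes dict per query.
int_cols = [col for col, dt in dtypes.items() if dt == int]

# Same answer via a reverse lookup (dtype -> list of colnames) built once.
dtypes_to_cols = {}
for col, dt in dtypes.items():
    dtypes_to_cols.setdefault(dt, []).append(col)
int_cols_via_lookup = dtypes_to_cols.get(int, [])
```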
For a given daf, it might be worth creating the reverse-lookup dola structure once and caching it, or simply searching each time. For now, we can just do the lookup each time, since the dtypes data is small.
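If caching were chosen later, its main cost is invalidation: the cached reverse lookup must be rebuilt whenever the dtypes change. A minimal sketch, assuming a hypothetical DtypesIndex helper (not part of daffodil) that builds the lookup lazily and drops it on every dtypes update:

```python
class DtypesIndex:
    """Hypothetical helper: caches the dtype -> colnames lookup and
    rebuilds it only after the dtypes have changed."""

    def __init__(self, dtypes):
        self._dtypes = dict(dtypes)
        self._cache = None  # built lazily on first lookup

    def set_dtypes(self, dtypes):
        self._dtypes = dict(dtypes)
        self._cache = None  # invalidate: the dtypes changed

    def cols_of(self, dtype):
        if self._cache is None:
            cache = {}
            for col, dt in self._dtypes.items():
                cache.setdefault(dt, []).append(col)
            self._cache = cache
        # Return a copy so callers cannot corrupt the cache.
        return list(self._cache.get(dtype, []))


idx = DtypesIndex({'a': str, 'b': int})
before = idx.cols_of(int)
idx.set_dtypes({'a': int, 'b': int})
after = idx.cols_of(int)
```

With small dtypes dicts the scan is cheap, so the extra bookkeeping only pays off if lookups are frequent and dtypes rarely change.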