sl-solution / InMemoryDatasets.jl

Multithreaded package for working with tabular data in Julia

what is the type of ds.A? #61

Open sprmnt21 opened 2 years ago

sprmnt21 commented 2 years ago
ds = Dataset(A = ["a", "b", "a", "b"], B = [1, 2, 3, 4])

julia> ds.A==ds[:,:A]
true

julia> typeof(ds.A)
DatasetColumn{Dataset, Vector{Union{Missing, String}}}

julia> typeof(ds[:,:A])
Vector{Union{Missing, String}} (alias for Array{Union{Missing, String}, 1})

julia> ds.A
4-element Vector{Union{Missing, String}}:
 "a"
 "b"
 "a"
 "b"

I tried to concatenate what I thought were two vectors with [ds.A; ds.A], but the result is a 2-element vector of DatasetColumn objects rather than an 8-element vector of strings:

julia> [ds.A; ds.A]
2-element Vector{DatasetColumn{Dataset, Vector{Union{Missing, String}}}}:
 DatasetColumn{Dataset, Vector{Union{Missing, String}}}(1, 4×3 Dataset
 Row │ A         B         C        
     │ identity  identity  identity
     │ String?   String?   String?
─────┼──────────────────────────────
   1 │ a         no        low
   2 │ b         yes       low
   3 │ a         no        hi
   4 │ b         no        hi, Union{Missing, String}["a", "b", "a", "b"])
 DatasetColumn{Dataset, Vector{Union{Missing, String}}}(1, 4×3 Dataset
 Row │ A         B         C        
     │ identity  identity  identity
     │ String?   String?   String?
─────┼──────────────────────────────
   1 │ a         no        low
   2 │ b         yes       low
   3 │ a         no        hi
   4 │ b         no        hi, Union{Missing, String}["a", "b", "a", "b"])

sl-solution commented 2 years ago

It is a DatasetColumn, a customised structure that wraps a column of a data set. It exists because we want to track any changes to a data set column. Changing a value in a column can change the following attributes of a data set:

Thus, an abstract vector cannot be used for this purpose, and a customised type is used instead. Generally, we recommend using ds[:, :A] to extract columns and/or the provided APIs to manipulate columns.
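
As a minimal sketch of that recommendation, applied to the concatenation from the original question: extracting the column with ds[:, :A] yields a plain Vector (as the typeof output earlier in the thread shows), so ordinary vertical concatenation should behave as expected. The variable names `a` and `doubled` are only illustrative.

```julia
using InMemoryDatasets

ds = Dataset(A = ["a", "b", "a", "b"], B = [1, 2, 3, 4])

# ds[:, :A] returns a plain Vector{Union{Missing, String}},
# not the DatasetColumn wrapper that ds.A returns.
a = ds[:, :A]

# Concatenating two plain vectors gives one longer vector.
doubled = [a; a]

println(typeof(doubled))
println(length(doubled))
```

Because `doubled` is an ordinary vector, it is detached from `ds`; the change tracking that DatasetColumn provides does not apply to it.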

However, if you think a method must be defined for DatasetColumn, you are welcome to open a PR for it. The right location to add such methods is src/abstractdataset/dscol.jl.

Just a side note: for repeating rows you can use repeat! or repeat, and use append! to append data sets.
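
A rough sketch of the append! suggestion, assuming it takes the target data set first and the data set to append second, as Base.append! does for other collections. The second data set `ds2` is hypothetical and only there for illustration; repeat and repeat! are not shown because their arguments are not spelled out in this thread.

```julia
using InMemoryDatasets

ds  = Dataset(A = ["a", "b", "a", "b"], B = [1, 2, 3, 4])
ds2 = Dataset(A = ["c"], B = [5])   # hypothetical data set with the same columns

# Assumed call shape: append the rows of ds2 to ds in place.
append!(ds, ds2)

println(size(ds, 1))   # number of rows after appending
```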