Open Lincoln-Hannah opened 1 year ago
Thanks for the input! First off, DataFrames are a mystery to me, so you'll have to give examples without them.
In principle, I can see that allow_additional_args could be useful. I wonder whether it should be a specific constructor as opposed to a property of the type itself. Something like:
julia> @with_kw struct Mystruct
           a
           b
       end
julia> Mystruct(Parameters.ignore_additional_kwargs, a=1, b=2, c=3)
? A bit uglier but maybe clearer?
Not sure about the other one. Seems too specific?
That's a nice syntax. Adding an ignore_missing parameter to your example:
@with_kw struct MyStruct
    a
    b = 2a
end
X = (a=1, b=missing, c=1)
MyStruct(Parameters.ignore_additional_kwargs, Parameters.ignore_missing; X...)
gives MyStruct(a=1, b=2)
I encourage you to look at DataFrames with Chain and DataFramesMeta. It's much more elegant than SQL or Pandas - you can select, filter, aggregate and pivot within a single Chain block, and Bogumił and the others are always helpful.
Maybe there's a better way to work, but my usual process is:
Read data from SQL Databases, CSV or XML files into DataFrames
Manipulate with Chain and DataFramesMeta macros.
Convert to array of structs
Do calculations with broadcasted functions
Use DataFrames to aggregate and summarize output.
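Steps 3 to 5 above can be sketched roughly as follows. This is only an illustration under some assumptions: a vector of named tuples stands in for DataFrame rows, Base.@kwdef stands in for @with_kw so the snippet needs no packages, and the Trade struct and notional function are hypothetical names invented for the example.

```julia
# Hypothetical struct; Base.@kwdef (instead of Parameters.@with_kw) keeps
# the sketch free of package dependencies.
Base.@kwdef struct Trade
    price::Float64
    qty::Int
end

# Stand-in for DataFrame rows: each row is a NamedTuple whose names match
# the struct fields exactly.
rows = [(price = 10.0, qty = 3), (price = 2.5, qty = 4)]

# Step 3: convert to an array of structs by splatting each row as keywords.
trades = [Trade(; row...) for row in rows]

# Step 4: do the calculation with a broadcasted function.
notional(t::Trade) = t.price * t.qty
notionals = notional.(trades)

# Step 5: aggregate (done with DataFrames in the workflow described above;
# a plain sum is used here).
total = sum(notionals)
```

This only works when every row has exactly the struct's fields; an extra column makes the keyword constructor throw, which is the situation the request is about.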
While structs and functions are the best for calculations, Database tables are a standard for storage. It would be nice to transition between the two in one short line of code. This is the motivation for this request and the DataFramesMeta request linked to above.
The ignore_missing parameter is for when a database column is missing or NULL for some rows and you want the struct to supply a default (e.g. b=2a) instead. For me this is common. Maybe it's just me :)
lol, I don't know SQL or Pandas either ;-) But, yes, I probably should learn them...
SQL yes, cos it's easy and so much data is stored in SQL databases. Don't bother with Pandas unless you have to. IMO Julia's DataFrames ecosystem is the best in any language.
The first one can just be done with a utility function:
using Parameters
@with_kw struct AA
    a = 0
    b = 0
    c = 0
end
function construct_it(T, tup::D) where D
    tf = fieldnames(D)
    fn = fieldnames(T)
    # keep only the fields of tup which are also in T and are not missing
    out = (;)
    for f in fn
        if f in tf && tup[f] !== missing
            out = (out..., f => tup[f])
        end
    end
    T(; out...)
end
construct_it(AA, (a=3, b=7, u=8))
(ok, the function needs a better name). The missing handling could also be incorporated into that function.
I think that would be the best approach.
Edit: added missing-check/feature
I like that :)
will you put this in?
The objective of this request is to make it very easy to use AsTable(:) to convert a DataFrame to an array of structures. ( AsTable(:) passes a DataFrame row as a named tuple. ) Example
allow_additional_args = true means column Z is ignored (rather than causing an error). overwrite_missing means that when column Y = missing, the default of 2X is used, as it would be if field Y were not supplied, e.g. mystruct(X=1).
More generally, when the struct has 10 or 20 fields and the DataFrame has 50 (or when the struct is created from a larger struct with a superset of fields), it's nice not to have to restate the field names.
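The behaviour described for columns X, Y and Z can be approximated today with a small helper. The sketch below is not part of Parameters.jl: from_row and the XY struct are hypothetical names, Base.@kwdef is used in place of @with_kw so the snippet has no dependencies, and named tuples stand in for the rows that AsTable(:) would pass.

```julia
# Hypothetical struct with a default that depends on an earlier field.
Base.@kwdef struct XY
    X
    Y = 2X
end

# Hypothetical helper: keep only the keys of `row` that are fields of T and
# whose value is not `missing`, then call the keyword constructor.
function from_row(T, row::NamedTuple)
    keep = filter(k -> k in fieldnames(T) && row[k] !== missing, keys(row))
    return T(; NamedTuple{keep}(row)...)
end

# Column Z is not a field of XY; Y is missing in the second row.
rows = [(X = 1, Y = 5, Z = "a"), (X = 2, Y = missing, Z = "b")]

structs = [from_row(XY, r) for r in rows]
```

In the second row Y is dropped before construction, so the default 2X applies, and Z is ignored in both rows rather than raising an error.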