mauro3 / Parameters.jl

Types with default field values, keyword constructors and (un-)pack macros

Request - flags to allow additional arguments to be ignored and values supplied as "missing" to be overridden. #153

Open Lincoln-Hannah opened 1 year ago

Lincoln-Hannah commented 1 year ago

The objective of this request is to make it very easy to use AsTable(:) to convert a DataFrame to an array of structs. (AsTable(:) passes a DataFrame row as a named tuple.) Example:

# Proposed syntax
@with_kw struct mystruct (allow_additional_args = true, overwrite_missing = true)
    X
    Y = 2X
end

@chain begin
    DataFrame(X=[1,2], Y=[1,missing], Z=[1,2])
    @rtransform :mystruct = mystruct(AsTable(:)...)
end

#   produces
#   mystruct(1, 1)
#   mystruct(2, 4)

allow_additional_args = true means that column Z is ignored rather than causing an error.

overwrite_missing = true means that when column Y is missing, the default of 2X is used, just as it would be if field Y were not supplied, e.g. mystruct(X=1).

More generally, when the struct has 10 or 20 fields and the DataFrame has 50 (or when the struct is created from a larger struct with a superset of fields), it's nice not to have to restate the field names.

mauro3 commented 1 year ago

Thanks for the input! First off, DataFrames are a mystery to me, so you'll have to give examples without them.

In principle, I can see that allow_additional_args could be useful. I wonder whether it should be a specific constructor as opposed to a property of the type itself. Something like:

julia> @with_kw struct Mystruct
       a
       b
       end

julia> Mystruct(Parameters.ignore_additional_kwargs, a=1, b=2, c=3)

? A bit uglier but maybe clearer?

Not sure about the other one. Seems too specific?
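
For illustration, the marker-argument constructor suggested above could be sketched roughly as follows. Nothing here exists in Parameters.jl: the IgnoreAdditionalKwargs type and the ignore_additional_kwargs constant are hypothetical names, and Base.@kwdef stands in for @with_kw only so that the snippet runs without any packages.

```julia
# Hypothetical marker type and instance; not part of Parameters.jl.
struct IgnoreAdditionalKwargs end
const ignore_additional_kwargs = IgnoreAdditionalKwargs()

# Base.@kwdef is used in place of @with_kw (an assumption, to stay dependency-free).
Base.@kwdef struct Mystruct
    a
    b
end

# Extra constructor method selected by the marker argument: keep only the
# keywords that name a field, then call the ordinary keyword constructor.
function Mystruct(::IgnoreAdditionalKwargs; kwargs...)
    known = [k => v for (k, v) in kwargs if k in fieldnames(Mystruct)]
    return Mystruct(; known...)
end

s = Mystruct(ignore_additional_kwargs; a=1, b=2, c=3)   # c is silently dropped
```

Because the behaviour hangs off a separate method, the plain call Mystruct(a=1, b=2, c=3) still errors on the unknown keyword c, which keeps the strict default.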

Lincoln-Hannah commented 1 year ago

That's a nice syntax. Adding an ignore_missing parameter to your example:

@with_kw struct MyStruct
    a
    b=2a
end

X = (a=1,b=missing, c=1)

MyStruct(   Parameters.ignore_additional_kwargs,   Parameters.ignore_missing ;  X...  )

gives MyStruct(a=1, b=2)

I encourage you to look at DataFrames with Chain and DataFramesMeta. It's much more elegant than SQL or Pandas - you can select, filter, aggregate and pivot within a single Chain block, and Bogumił and the others are always helpful.

Maybe there's a better way to work, but my usual process is:

  Read data from SQL Databases, CSV or XML files into DataFrames
  Manipulate with Chain and DataFramesMeta macros. 
  Convert to array of structs
  Do calculations with broadcasted functions
  Use DataFrames to aggregate and summarize output.

While structs and functions are best for calculations, database tables are the standard for storage. It would be nice to transition between the two in one short line of code. This is the motivation for this request and the DataFramesMeta request linked to above.

The ignore_missing parameter is for when a database column is missing or NULL for some rows and you want to supply a default (e.g. b=2a) in the struct. For me this is common. Maybe it's just me :)
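
A minimal sketch of that ignore_missing behaviour, assuming it is done outside the macro: entries whose value is missing are removed from the named tuple before it is splatted, so the keyword constructor falls back to the default. The drop_missing helper is a hypothetical name, and Base.@kwdef again stands in for @with_kw so the snippet needs no packages.

```julia
# Base.@kwdef replaces @with_kw here (an assumption); both allow a default
# that refers to an earlier field.
Base.@kwdef struct MyStruct
    a
    b = 2a
end

# Hypothetical helper: return a named tuple without the entries that are `missing`.
function drop_missing(nt::NamedTuple)
    keep = filter(k -> !ismissing(nt[k]), keys(nt))   # tuple of surviving names
    return NamedTuple{keep}(nt)
end

row = (a = 1, b = missing)
s = MyStruct(; drop_missing(row)...)   # b is absent, so the default 2a applies
```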

mauro3 commented 1 year ago

lol, I don't know SQL or Pandas either ;-) But, yes, I probably should learn them...

Lincoln-Hannah commented 1 year ago

SQL yes, cos it's easy and so much data is stored in SQL databases. Don't bother with Pandas unless you have to. IMO Julia's DataFrames ecosystem is the best in any language.

mauro3 commented 1 year ago

The first one can just be done with a utility function:

using Parameters
@with_kw struct AA
    a = 0
    b = 0
    c = 0
end

function construct_it(T, tup::D) where D
    tf = fieldnames(D)
    fn = fieldnames(T)
    # remove all fields from tuple tup which are not in T
    out = (;)
    for f in fn
        if f in tf && tup[f]!==missing
            out = (out..., f=>tup[f])
        end
    end
    T(;out...)
end

construct_it(AA, (a=3, b=7, u=8))

(ok, the function needs a better name). The missing handling could also be incorporated into that function.
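
As a rough illustration of how such a utility would cover the example from the opening post, here is a variant applied to several rows at once. The rows are plain named tuples rather than DataFrame rows, the names MyRecord and from_row are hypothetical, and Base.@kwdef is used instead of @with_kw so the snippet is self-contained.

```julia
# Stand-in for the struct from the opening post (Base.@kwdef instead of @with_kw).
Base.@kwdef struct MyRecord
    X
    Y = 2X
end

# Hypothetical helper in the spirit of construct_it: keep only entries that
# name a field of T and are not missing, then call the keyword constructor.
function from_row(T, row::NamedTuple)
    keep = filter(k -> k in fieldnames(T) && !ismissing(row[k]), keys(row))
    return T(; NamedTuple{keep}(row)...)
end

# Rows equivalent to DataFrame(X=[1,2], Y=[1,missing], Z=[1,2]).
rows = [(X = 1, Y = 1, Z = 1), (X = 2, Y = missing, Z = 2)]
structs = map(r -> from_row(MyRecord, r), rows)
```

Column Z is ignored for both rows, and the second row picks up Y = 2X because its Y is missing.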

I think that would be the best approach.

Edit: added missing-check/feature

Lincoln-Hannah commented 1 year ago

I like that :)

Lincoln-Hannah commented 1 year ago

Will you put this in?