Open cboettig opened 4 years ago
Thanks for raising this. I'm really not sure and wanna chew on it.
Re. set_
: When I first saw you use set_*
years? ago, I thought it would take my EML and return modified EML. I think that's semantically nearest to set
. This changes the behavior of other set_
functions though. It allows code like eml %>% set_title("Foo") %>% set_abstract("Bar") -> eml
. Just my 2c there.
Also: Another way to make set_packageId()
work would be my_eml$packageId <- set_packageId()
, which is a bit odd but matches the current implicit interface on the set_*
functions.
I think @jeanetteclark and @dmullen17 could offer some good insight here too.
I agree that having set_packageId
return a full eml
object is a little awkward. However, since the very beginning of my use of the EML package I have wanted a top level helper to create a minimally valid EML document without fussing around with list
(this was an even worse problem in the S4 version of EML!). Because of that, I like the set_eml
option outlined above the best.
Thanks @amoeba ! I believe the other set_*
methods are identical to the methods in the original S4 version of package (pre emld
), where they were essentially object constructors for every complexType object (since we had class definitions for each of those). Since that meant things like attributeList
were valid objects with there own methods, so I think in S4 methods world that made some sense (though strictly following convention the constructor should have been just the object name, attributeList()
, etc, and obviously we never wrote constructors for all the 100+ complexTypes). In swapping in emld
in the backend we tried to change as little as possible int the original package API (which was probably a bit silly since almost any real-world use code needed to use some S4 accessors and thus broke anyway, as you all know all too well!) anyway, old history.
I agree with you :100: though that the pipe structure you outline seems much more natural (and echoes a more functional programming style). I am very tempted to switch all the other methods to that. Downsides are that this would obviously be a breaking change, so I'm not sure it's worth it.
Also, most complexTypes don't have a set_
method, and are intended to be just built as lists, so it might confuse users to realize that a lot of practical EML couldn't be constructed entirely in a pipe chain at this time.
So I guess I'm still leaning to set_eml()
proposal for now, but still undecided and would love more input.
The backwards compatibility point is a big one and I think its worth avoiding such a breakage change as I describe. I think set_eml()
is a fair compromise at this point.
@amoeba @jeanetteclark Would you be interested in a quick PR defining a set_eml()
method along these lines? I think it should also be added to the example in README, and then write_eml()
should drop the behavior of automatically adding a packageId
(but possibly gain a warning about creating one with set_eml()
if no packageId
exists).
Thanks!
Hey @cboettig I can try to give this a shot next week
Following up on #293, proposal to resolve results in two separate tasks: removing the creation of setting id on
write_eml()
, and adding a separate function toset_packageId()
, and separately, possibly to modify validation error message specifically for thepackageId()
case to provide a nice clear error message on this (perhaps not needed but could ease the transition since this is a breaking change?)I'm a bit unsure on the preferred syntax for
set_packageId()
. In the set_packageId branch I've added a mockup with the syntax like this:All fields have defaults, so you can create a new eml object with package id using:
But I'm surprised just how many interface questions this raises. Most
set_*
methods take only the fields to be added and construct a list fragment, they don't take an argument for the full existingeml
list and return the fulleml
object, but maybe that's okay here since these are top-level attributes?Or it might be better to call this
set_eml()
, creating a top-level constructor and not take an existingeml
list object as an argument, e.g. something that looks like:The function would probably need an assertion to make sure that the
system
is set manually if thepackageId
is set manually too.Thoughts @amoeba @mbjones others?