Adds simplify option for table returned by get_metadata()

hlapp commented 5 years ago

This won't make much of a difference (except for consolidating the property/rel columns) if there are no nested metadata, but is quite useful for tables that include nested metadata.

If set to TRUE, it might be useful to still consolidate the type-specific LiteralMeta and ResourceMeta ID columns into a single one. This is currently not done. However, IDs for meta elements are rarely present (so usually these columns are all NAs), and the only case something needs to be joined to a metadata element (as opposed to the other way) is when there are nested meta elements.

Includes tests.

cboettig commented 5 years ago

This makes a lot of sense. I know it would break backwards compatibility, but arguably this is more intuitive to most users and should be the default behavior? What do you think @hlapp ?

hlapp commented 5 years ago

So, @cboettig you'd be OK with changing default behavior to one that's different from previous behavior?

I'll also note that the result differs between when nested <meta/> elements were encountered and when they weren't.

However, I think it doesn't take much to get us there regardless of whether there were nested meta's, so let me know if changing default behavior such that columns that were previously there are now by default removed is fine (even if those columns largely had all NA values).

cboettig commented 5 years ago

@hlapp yup, I think we should change the default behavior to go for this simpler structure. I think that will be cleaner for most users.

If I were doing this from scratch, I think it would be better if it just returned a table using the standard triple format (i.e subject, predicate, object, and probably separate columns for datatype and language since no one ones to split object on @ or ^^ manually, matching the "tabular" format you see in https://json-ld.org/playground/). In this way, nested meta nodes just show up with the appropriate blank nodes. A user can always get this 'manually' with get_rdf() (e.g. like in the test), though that requires some optional packages and knowledge to write a trivial sparql query...

hlapp commented 5 years ago

The failure at Circle CI is the following:

Quitting from lines 276-278 (manuscript.Rmd) 
Error in eval(substitute(expr), envir, enclos) : invalid argument type
Calls: <Anonymous> ... <Anonymous> -> mutate_ -> mutate_.tbl_df -> mutate_impl -> .Call

I have to say I can't reproduce that at all, so I'm stumped as to what's going on there.

hlapp commented 5 years ago

@cboettig perhaps this means that the container needs updating? I'm clueless as to why the manuscript building encounters an error on a snippet of code that is routinely tested and passes, and when I can knit the manuscript with no problems.

BTW it might well be worth considering to move the manuscript into a separate repo and tie it to a specific version of RNeXML anyway (which I think we cannot do while keeping it in the same repo?). Even if this error looks entirely spurious to me, the time will come that we want to make a change that's not compatible with the code as written in the manuscript, and I'd have reservations about then updating the manuscript unless we'd have some kind of very obvious way to version it. By "very obvious" I mean that people can hardly miss the fact that the version is not the same as the one used to produce the originally published manuscript.

hlapp commented 5 years ago

Looks like the Circle CI failure are actually due to running out of CPU time allocation for non-premium accounts. See #210 and #211.

hlapp commented 5 years ago

Rebased #212 into this branch so we get codecov results as well.

cboettig commented 5 years ago

Ah, nice work tracking that down about the monthly time limit, I've been meaning to see what was going on in the manuscript build since I didn't expect this to break that.

We could move the manuscript to a separate repo, though there might confuse some who expect that it is here. Since it is built on a separate CI system I'm not sure what is gained by doing that. We can modify the Circle-CI settings to test a specific version of RNeXML rather than the current version. It is already using a locked version of R (3.3.1), see circle config. If we replaced the devtools::install() with install.packages("RNeXML", dep=TRUE), it would install the version of RNeXML (and dependencies) that was current when R 3.3.1 was last current. Of course there shouldn't be much need to have that on CI at all then since it would be running the same code base each time, but it would at least be proof-of-principle on the reproducibility; at least within the specified dockerized setup. Anyway, I digress... looks like this is good to merge!

ropensci / RNeXML

Adds simplify option for table returned by get_metadata() #199