ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

Retaining Metadata in a Modified XML File #141

Closed laurajackson closed 5 years ago

laurajackson commented 8 years ago

I am currently using the RNeXML package to import the following data matrix in XML format, modify it to remove some of the character matrix data, and then write the same XML file back out with all the original metadata from the input file intact. I am able to get the modified matrix as a .csv file, but am unable to convert this new matrix into an XML file that still contains the original data from the input file. I understand that converting this file to a data.frame loses all the metadata associated with the XML file. Is there a function available in the current package that allows me to keep this metadata?

cboettig commented 8 years ago

Hi @laurajackson ,

Right, you won't be able to use get_characters() for this, since transforming the NeXML structure (which can be nested to arbitrary depth) into a single table (e.g. just rows and columns) will always involve some loss of information. To avoid this, you must stick with a nested data structure such as the nexml object class provided by RNeXML.

It's an unfortunate reality that such nested structures (lists, S4 objects, XML) are harder to work with than tabular data / data.frames; that's the price of the richer & more flexible format. It is still entirely possible to do what you want, but it won't just involve removing a "column", since you have to work with non-tabular data.

I don't think it makes sense to provide a dedicated function to "drop some data". I don't really understand this use case -- what harm is it to carry around the extra data? NeXML was never meant to provide the smallest possible file sizes. Still, you can modify the child elements of nexml@characters however you see fit. Exactly how you drop the character will depend on which characters block it appears in, whether it is a continuous or discrete character, etc. You will also have to decide whether you intend to drop just the character itself or also other metadata that may exist about the character.
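To make the idea concrete, here is a rough sketch that works on the XML directly instead of going through RNeXML's S4 objects. It uses Python's standard library only, and it makes several assumptions: the NeXML fragment is a simplified, made-up example (real files carry more attributes and elements), and drop_character is a hypothetical helper, not a function in RNeXML. It removes one character definition and the matrix cells that point at it, and touches nothing else, so metadata elsewhere in the file survives.

```python
import xml.etree.ElementTree as ET

NEXML_NS = "http://www.nexml.org/2009"
NS = {"nex": NEXML_NS}

# Simplified, hypothetical NeXML fragment (an assumption for illustration).
SAMPLE = """\
<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <meta property="dc:title" content="Example matrix"/>
  <characters id="cs1">
    <format>
      <char id="c1" label="body mass"/>
      <char id="c2" label="wing length">
        <meta property="dc:description" content="measured in mm"/>
      </char>
    </format>
    <matrix>
      <row id="r1" otu="t1">
        <cell char="c1" state="1.2"/>
        <cell char="c2" state="30.5"/>
      </row>
      <row id="r2" otu="t2">
        <cell char="c1" state="0.9"/>
        <cell char="c2" state="28.1"/>
      </row>
    </matrix>
  </characters>
</nexml>
"""


def drop_character(root, char_id):
    """Hypothetical helper: remove one character and its cells in place."""
    for block in root.findall("nex:characters", NS):
        fmt = block.find("nex:format", NS)
        if fmt is not None:
            for char in fmt.findall("nex:char", NS):
                if char.get("id") == char_id:
                    # Removing <char> also drops any metadata nested inside it.
                    fmt.remove(char)
        for row in block.findall("nex:matrix/nex:row", NS):
            for cell in row.findall("nex:cell", NS):
                if cell.get("char") == char_id:
                    row.remove(cell)
    return root


root = drop_character(ET.fromstring(SAMPLE), "c2")

# Collect what is left so the effect of the edit can be inspected.
remaining_chars = [c.get("id") for c in root.iter("{%s}char" % NEXML_NS)]
remaining_cells = [c.get("char") for c in root.iter("{%s}cell" % NEXML_NS)]
top_level_meta = [m.get("content") for m in root.findall("nex:meta", NS)]
```

Note the decision mentioned above shows up here too: the description attached to the dropped character goes away with it, while the file-level title is left alone. A real file may also need its state definitions cleaned up for discrete characters, which this sketch does not attempt.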

cboettig commented 8 years ago

@laurajackson Did you figure out what you need here? As discussed above, retaining metadata shouldn't be a problem if you use R's S4 structure, but a character matrix is a fundamentally less metadata-rich data structure, and any function that coerces XML into that format will be lossy. Does that make sense or am I missing something here?

cboettig commented 5 years ago

closing issue as stale.