wstrinz / R2RDF

Converting R Objects to RDF Triples

Handling R 'NA' values #2

Open arrigonialberto86 opened 11 years ago

arrigonialberto86 commented 11 years ago

This code looks fine to me, and I could not spot any 'stylistic' issues with that :) Just a thought regarding NA values treatment. As a bioinformatician, I often notice that missing values ('NA') are a great source of 'messed up calculations', so thank you for commenting on that in the code.
I believe that explicitly annotating 'NA' values would probably be a waste of memory, so it might be possible to omit NA values when generating the N3 output. I guess a problem would only arise when trying to regenerate the initial dataset, but in that case you could insert the 'NA' values back, so that the output would still be consistent with the dimensionality of the initial R data frame (or matrix, vector, etc.). Moreover, I agree with you that a missing 'resource' should be considered a problem, so raising an exception would probably be appropriate in this case.
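To make the round trip concrete, here is a minimal sketch in Python. It is not R2RDF code: the function names `to_sparse_triples` and `from_sparse_triples` are hypothetical, `None` stands in for R's NA, and each triple is simplified to a (row index, column index, value) tuple rather than real N3. The point is only that recording the original dimensions is enough to put the NAs back.

```python
from typing import List, Optional, Tuple

Cell = Optional[float]            # None stands in for R's NA (assumption)
Triple = Tuple[int, int, float]   # simplified (row, column, value) statement


def to_sparse_triples(rows: List[List[Cell]]) -> Tuple[Tuple[int, int], List[Triple]]:
    """Emit one triple per non-NA cell, plus the original shape."""
    n_cols = len(rows[0]) if rows else 0
    triples = [
        (i, j, value)
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if value is not None      # NA cells are simply not written out
    ]
    return (len(rows), n_cols), triples


def from_sparse_triples(shape: Tuple[int, int], triples: List[Triple]) -> List[List[Cell]]:
    """Rebuild the table; every cell without a triple becomes NA again."""
    n_rows, n_cols = shape
    table: List[List[Cell]] = [[None] * n_cols for _ in range(n_rows)]
    for i, j, value in triples:
        table[i][j] = value
    return table


original = [[1.0, None, 3.0], [None, 5.0, None]]
shape, triples = to_sparse_triples(original)
restored = from_sparse_triples(shape, triples)
```

The trade-off is that an omitted cell can no longer be distinguished from a cell that was never measured, unless the shape (or an equivalent list of row and column names) travels with the data.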

pjotrp commented 11 years ago

With QTL mapping, NA values are missing genotypes or phenotypes (usually due to technical problems). Based on these NAs, we either drop data from the analysis or inject 'best guess' values. Having the NA in there makes it easy to find those missing data points or revert to them later, for example when creating figures. So named NA data serves a purpose, and removing it makes it harder to recover information.

I agree that for generic RDF mining NA serves little purpose, but what we are doing here is transferring data from R to a D3 viewer. Maybe we'll get rid of NAs later.

wstrinz commented 11 years ago

Thanks for reviewing it, Alberto! Like Pjotr said, we talked about the best way to handle missing values in general, and decided to keep them in some form because of the task we're currently working on. But I agree they are a big source of wasted memory, so it'd be good to make that behavior optional.

I will also add a missing resource error before I close this issue. I had kind of forgotten about that, since we haven't been working with any data where this would be a problem lately, but it really breaks the RDF data model in some cases, so I'll make sure an exception is raised.
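As a rough illustration of the two behaviors discussed in this thread, the Python sketch below makes NA output optional and raises on a missing resource. All names here (`MissingResourceError`, `row_triples`, the `keep_na` flag, and the `"NA"` placeholder object) are assumptions made up for the example, not the actual R2RDF interface.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, object]


class MissingResourceError(ValueError):
    """Raised when a subject or predicate is missing (hypothetical name)."""


def row_triples(subject: Optional[str],
                row: Dict[Optional[str], object],
                keep_na: bool = True) -> List[Triple]:
    """Turn one row into (subject, predicate, object) triples.

    None stands in for R's NA. A missing value is optional data, but a
    missing subject or predicate would leave an incomplete triple, so it
    is treated as an error.
    """
    if not subject:
        raise MissingResourceError("row has no subject resource")
    triples: List[Triple] = []
    for predicate, value in row.items():
        if not predicate:
            raise MissingResourceError(f"missing predicate in row {subject!r}")
        if value is None:
            if keep_na:
                triples.append((subject, predicate, "NA"))  # explicit placeholder
            continue                                        # otherwise drop the cell
        triples.append((subject, predicate, value))
    return triples


with_na = row_triples("ind1", {"genotype": "AA", "phenotype": None})
without_na = row_triples("ind1", {"genotype": "AA", "phenotype": None}, keep_na=False)
```

Defaulting `keep_na` to `True` mirrors the current decision to keep NAs for the viewer, while callers that only need generic RDF could switch it off.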