Open blankenberg opened 7 years ago
@joey711 @jnpaulson Any thoughts, comments, or a potential fix available? Is this project still alive?
Definitely still alive. @joey711 any thoughts?
Yes, we should fix this. While I'm pretty sure the R-write-read round trip would work fine, we do want the result to be considered valid by external utilities like those in the python library.
This should be pretty easy to fix by coercing the slot for each of those offending entries to be character vectors rather than character vectors encapsulated by an inner list. Something like the following should work for length-1 cases:
myBiomList$format <- as.character(myBiomList$format[[1]][1])
Of course we want to abstract this to a function that enforces this type coercion for us, so something like
biom_char_not_list = function(x){
  # Unwrap the inner list and coerce the contents to a plain character vector
  as.character(x[[1]])
}
and then the first example becomes
myBiomList$format <- biom_char_not_list(myBiomList$format)[1]
We can leave the length expectation inside or outside the function. Depends on whether we need to do this for any entries that can/should have length longer than 1.
Matrix and table entries tend to have type specs that we've probably already addressed, but we should double-check while we're at it.
The above solution should be implemented in a validation function that is called by write_biom, so that other functions can re-use it and type enforcement stays consistent.
I'd rather not find out way after the fact that minor format specs are causing output to fail read-validation in other utilities. I also don't want to re-implement all of the tests that are already written in the python library, so we should probably include in this package a testing script that uses the python read commands to test this package's write output.
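As a rough starting point for such a testing script, here is a minimal sketch that uses only the Python standard library rather than the biom-format reader itself. The function name non_string_fields and the sample fragment are hypothetical, and the field list is simply the set of names mentioned in this issue, so it may not cover everything the spec requires to be a string.

```python
import json

# Top-level fields named in this issue as needing to be plain strings
# (assumption: this list is illustrative, not the full BIOM 1.0 spec).
STRING_FIELDS = ("format", "format_url", "type", "generated_by", "date", "matrix_type")


def non_string_fields(doc):
    """Return the paths of entries that should be strings but are not."""
    problems = [key for key in STRING_FIELDS
                if key in doc and not isinstance(doc[key], str)]
    # Row and column ids are expected to be strings as well.
    for axis in ("rows", "columns"):
        for i, entry in enumerate(doc.get(axis, [])):
            if not isinstance(entry.get("id"), str):
                problems.append(f"{axis}[{i}].id")
    return problems


# Hand-written fragment imitating the list-wrapped output described in this issue.
sample = json.loads(
    '{"format": ["Biological Observation Matrix 1.0.0"], '
    '"type": "OTU table", '
    '"rows": [{"id": ["OTU_1"], "metadata": null}], '
    '"columns": [{"id": "Sample_1", "metadata": null}]}'
)
print(non_string_fields(sample))
```

A check like this could run against a file produced by write_biom before handing it to the real Python reader, which would still be the authoritative test.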
I'll link this issue on a new testing "feature" request
"format", "format_url", "type", "generated_by", "date", "id"s within rows/columns, "matrix_type", etc. are all wrapped in lists ([]), whereas the specification states these should be strings: http://biom-format.org/documentation/format_versions/biom-1.0.html
Files created by the write_biom method cannot be parsed by the official biom-format command-line tools, resulting in the error: TypeError: <filename> does not appear to be a BIOM file!
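For files that have already been written, a possible stopgap is to unbox the length-1 lists outside of R. The sketch below is an assumption-laden workaround, not the fix for write_biom: unbox and unbox_biom are hypothetical helper names, only the fields named in this issue are handled, and it has not been checked that the result satisfies the official biom-format tools.

```python
import json

# Fields named in this issue; the "etc" in the report means there may be others.
STRING_FIELDS = ("format", "format_url", "type", "generated_by", "date", "matrix_type")


def unbox(value):
    """Replace a one-element list holding a string with the string itself."""
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], str):
        return value[0]
    return value


def unbox_biom(doc):
    """Return a shallow copy of a parsed BIOM document with wrapped strings unboxed."""
    fixed = dict(doc)
    for key in STRING_FIELDS:
        if key in fixed:
            fixed[key] = unbox(fixed[key])
    # Row and column ids get the same treatment; other keys are left untouched.
    for axis in ("rows", "columns"):
        if axis in fixed:
            fixed[axis] = [
                {**entry, "id": unbox(entry["id"])} if "id" in entry else entry
                for entry in fixed[axis]
            ]
    return fixed


# Hand-written fragment imitating the list-wrapped output described above.
wrapped = json.loads(
    '{"format": ["Biological Observation Matrix 1.0.0"], "shape": [1, 1], '
    '"rows": [{"id": ["OTU_1"], "metadata": null}], '
    '"columns": [{"id": ["Sample_1"], "metadata": null}], '
    '"data": [[0, 0, 1]]}'
)
print(json.dumps(unbox_biom(wrapped), indent=2))
```

Genuinely list-valued entries such as "shape" and "data" are left alone because only the named string fields and the row/column ids are visited.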