Open julesjacobsen opened 3 years ago
@julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?
@julesjacobsen see new class DefaultPhenopacketIngestor. We could add some functions to this class such as public fromYamlFile(...) and DefaultPhenopacketIngestor(Message message). Thoughts?
@ielis is this issue closable? I think this is supported for some operations
In principle yes. Each command that reads or writes a phenopacket accepts/produces phenopacket, family, or cohort in any of these formats.
The commands have the -f | --format option for the input data. The convert command has the --output-format option for choosing the, well, output format.
We do not have a command solely for the format conversion (something similar to cat sample.bam | samtools view -S > file.sam). Implementing the command is a no-brainer, since we already have all the nuts and bolts. I just need some use case.
> @julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?
Just revisiting this - is JSON the primary format for phenopackets? Is this written somewhere else?
I am trying to do some dataset sharing (à la EGA) and was considering placing a phenopacket alongside each individual's genomic artifacts. But I was assuming that, to be a primary-format phenopacket, it needed to be a protobuf file with some sort of known file suffix like pxf.
e.g.
ABC.bam
ABC.vcf
ABC.pxf
To that end, I was going to store some v2 JSON or YAML phenopackets for ease of editing, and then convert them to protobuf using the CLI tool. So this is my +1 for the general feature of converting between formats with just the CLI tool, which is currently not possible because convert requires the input to be in v1 format.
But if JSON is the primary way we think phenopackets are to be exchanged in the wild - then I can skip using protobuf entirely.
Are there any suggested file naming conventions to let people know a file is a phenopacket (in JSON)?
I should add that I am starting by hand-crafting some examples to demonstrate how this would all work, hence the hand editing of JSON or YAML.
Obviously, for a real system I would be translating from some clinical source like an EHR or REDCap, so I guess I would do that using the Java library and output whatever format I wanted.
I think the broader thought is still there - if I have unlimited choice here - what is the primary "phenopacket" file format and how should I name them to make this clear?
Hi Andrew, there could be a lossless conversion between protobuf (binary), JSON, YAML, XML, SQL ... so there really isn't a primary format. My guess is that almost everybody would prefer JSON because of the tooling for JSON.
In which case, having a tool that seamlessly converts between the formats might be useful. If I get a batch of phenopackets in protobuf but would prefer them in JSON, I can just run the CLI tool to convert, rather than dusting off my Java and writing a small snippet using the library to do the same.
Currently the converter only handles JSON. Might be an idea to offer conversion of other formats too.
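Until a naming convention is agreed on, a converter or a data-sharing pipeline could key the format off the file suffix. Below is a minimal sketch of that idea using only the JDK. Everything in it is an assumption for illustration: the class and method names are hypothetical and not part of phenopacket-tools, and the .pxf suffix for binary protobuf is simply the one from the ABC.pxf example above, not an official convention.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of phenopacket-tools.
final class PhenopacketFileNames {

    /** The three serializations discussed in this thread. */
    enum Format { PROTOBUF, JSON, YAML }

    private PhenopacketFileNames() {
    }

    /**
     * Guess the format from the file suffix (case-insensitive).
     * ".pxf" for protobuf is an assumed suffix, see the note above.
     */
    static Optional<Format> fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pxf")) {
            return Optional.of(Format.PROTOBUF);
        } else if (lower.endsWith(".json")) {
            return Optional.of(Format.JSON);
        } else if (lower.endsWith(".yaml") || lower.endsWith(".yml")) {
            return Optional.of(Format.YAML);
        }
        return Optional.empty(); // unknown suffix, caller must decide
    }

    /** Suffix to use when writing a file in the given format. */
    static String suffixOf(Format format) {
        switch (format) {
            case PROTOBUF:
                return ".pxf";
            case JSON:
                return ".json";
            case YAML:
                return ".yaml";
            default:
                throw new IllegalArgumentException("Unknown format: " + format);
        }
    }

    /** Replace the last suffix of fileName with the one for the target format. */
    static String rename(String fileName, Format target) {
        int dot = fileName.lastIndexOf('.');
        // dot > 0 keeps names such as ".hidden" or "README" intact as the stem
        String stem = dot > 0 ? fileName.substring(0, dot) : fileName;
        return stem + suffixOf(target);
    }

    public static void main(String[] args) {
        // Plan the output names for a small batch converted to JSON.
        for (String name : new String[] {"ABC.pxf", "DEF.yaml"}) {
            System.out.println(name + " (" + fromFileName(name).orElse(null) + ") -> "
                    + rename(name, Format.JSON));
        }
    }
}
```

The sketch only plans file names. The actual re-serialization would still go through the library or the CLI, and a suffix is only a hint: files such as ABC.bam are deliberately reported as unknown rather than guessed.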