Enable JSON <-> YAML, JSON <-> binary conversion?

julesjacobsen commented 3 years ago

Currently the converter only handles JSON. Might be an idea to offer conversion of other formats too.

pnrobinson commented 2 years ago

@julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?

pnrobinson commented 2 years ago

@julesjacobsen see new class DefaultPhenopacketIngestor. We could add some functions to this class such as public fromYamlFile(...) and DefaultPhenopacketIngestor(Message message). Thoughts?

pnrobinson commented 2 years ago

@ielis is this issue closable? I think this is supported for some operations

ielis commented 1 year ago

In principle, yes. Each command that reads or writes a phenopacket accepts or produces a phenopacket, family, or cohort in any of these formats. The commands have the -f | --format option for the input data. The convert command has the --output-format option for choosing the, well, output format.

We do not have a command solely for the format conversion (something similar to cat sample.bam | samtools view -S > file.sam). Implementing the command is a no-brainer, since we already have all the nuts and bolts. I just need some use case.

andrewpatto commented 1 year ago

> @julesjacobsen not sure this is absolutely needed? If we stick with JSON that will mean we encourage people to use JSON as the primary format?

Just revisiting this - is JSON the primary format for phenopackets? Is this written somewhere else?

I am trying to do some dataset sharing (à la EGA) and was considering placing a phenopacket alongside each individual's genomic artifacts. But I was assuming that, to be a primary-format phenopacket, it needed to be a protobuf file with some sort of known file suffix like pxf.

e.g.

ABC.bam
ABC.vcf
ABC.pxf

And so to that end, I was going to store some v2 JSON or YAML phenopackets for ease of editing, and then convert them over to protobuf using the CLI tool. So this is my +1 for the general feature of being able to convert between formats with just the CLI tool, which is currently not possible: convert requires the input to be in v1 format.

But if JSON is the primary way we think phenopackets are to be exchanged in the wild - then I can skip using protobuf entirely.

Is there some suggested file naming conventions to let people know it is a phenopacket (in JSON)?

andrewpatto commented 1 year ago

I should add that I am starting via hand crafting some examples for a demonstration of how this would all work - hence the hand editing of JSON or YAML.

Obviously, for a real system I would be translating from some clinical source like an EHR or REDCap or something, and so I guess I would do that using the Java library and easily output whatever format I wanted.

I think the broader thought is still there - if I have unlimited choice here - what is the primary "phenopacket" file format and how should I name them to make this clear?

pnrobinson commented 1 year ago

Hi Andrew, there could be a lossless conversion between protobuf (binary), JSON, YAML, XML, SQL ... so there really isn't a primary format. My guess is that almost everybody would prefer JSON because of the tooling for JSON.
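To illustrate why moving between the text formats is mechanical, here is a minimal sketch in Python using only the standard library. It is not phenopacket-tools and not the Java library: the `to_yaml` helper and the sample phenopacket fragment are hypothetical, made up for this example. It renders any JSON document as block-style YAML, quoting keys and scalars the JSON way, which YAML also accepts. A real conversion would more likely go through a proper YAML library.

```python
import json


def to_yaml(node, indent=0):
    """Render JSON-compatible data as a list of block-style YAML lines.

    Hypothetical helper for illustration only. Keys and scalars are written
    with json.dumps, so strings stay double-quoted and true/false/null keep
    their JSON spelling. Empty containers are written as {} or [].
    """
    pad = "  " * indent
    if isinstance(node, dict) and node:
        lines = []
        for key, value in node.items():
            if isinstance(value, (dict, list)) and value:
                # Non-empty container: key on its own line, children indented.
                lines.append(f"{pad}{json.dumps(key)}:")
                lines.extend(to_yaml(value, indent + 1))
            else:
                lines.append(f"{pad}{json.dumps(key)}: {json.dumps(value)}")
        return lines
    if isinstance(node, list) and node:
        lines = []
        for item in node:
            if isinstance(item, (dict, list)) and item:
                # Non-empty container: bare dash, children on following lines.
                lines.append(f"{pad}-")
                lines.extend(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}- {json.dumps(item)}")
        return lines
    # Scalars and empty containers.
    return [f"{pad}{json.dumps(node)}"]


# A made-up phenopacket-like fragment, not a complete or validated phenopacket.
sample_json = """
{
  "id": "example-1",
  "subject": {"id": "patient-1"},
  "phenotypicFeatures": [
    {"type": {"id": "HP:0001250", "label": "Seizure"}}
  ]
}
"""
sample = json.loads(sample_json)
print("\n".join(to_yaml(sample)))
```

Because both formats describe the same tree of mappings, sequences, and scalars, nothing is lost in this direction; the protobuf binary form needs the schema to decode, which is what the library provides.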

andrewpatto commented 1 year ago

In which case, having a tool that seamlessly converts between the formats might be useful: if I get a batch of phenopackets in protobuf but would prefer them in JSON, I can just run the CLI tool to convert, rather than dusting off my Java and writing a small snippet using the library to do the same.

phenopackets / phenopacket-tools

Enable JSON <-> YAML, JSON <-> binary conversion? #16