phenopackets / phenopacket-tools

An app and library for building, conversion, and validation of GA4GH Phenopackets.
http://phenopackets.org/phenopacket-tools/stable/
GNU General Public License v3.0
12 stars 5 forks source link

Reference range not required #197

Open pnrobinson opened 2 weeks ago

pnrobinson commented 2 weeks ago

I am seeing errors such as this:

PMID_30968594_individual_14.json,ERROR,BaseValidator,required,'measurements[4].value.quantity.referenceRange.low' is missing but it is required

However, the reference range is not required: https://phenopacket-schema.readthedocs.io/en/latest/quantity.html#rstquantity

In this particular case, the reference range depends on time of day and other factors that are not available in the input dataset and that in general may not even be available to a lab, and so it is not a good idea to enforce this requirement.

pnrobinson commented 2 weeks ago

The problem was actually the following. In Protocol Buffers (Protobuf), fields are only included in the serialized output if they are explicitly set to a non-default value. This is done to minimize the size of the serialized message. I tried to set the reference range to 0-30 pg/ml The JSON output then only includes the high=30 value, but does not include the 0 value, because this is equal to the default value. However, if we include a reference range, then both low and high values are required: https://phenopacket-schema.readthedocs.io/en/latest/reference-range.html @ielis @julesjacobsen It would be possible to get around this by using an ugly trick in protobuf, e.g.,

oneof my_float_wrapper {
        float my_float = 1;
    }

or by

 google.protobuf.FloatValue my_float = 1;

(instead of just float) but it is too late to change this in the protobuf code. Therefore, I think this is a bug that is unfixable in the current implementation. Thoughts?

julesjacobsen commented 1 week ago

@pnrobinson This sounds like a JSON output issue rather than a protobuf issue. There should be a way of making the JSON output include the defaults, but then the issue is that the JSON will include all the default zero-values for the entire schema. Maybe there is a way of enabling this just for the ReferenceRange?

pnrobinson commented 1 week ago

@julesjacobsen I think you are right -- see this documentation: we need to set including_default_value_fields=Truein our code. @ielis @iimpulse -- I think this is what we want in general? is there any reason we shouldn't set this argument to true everytime we write to file? (Other than we will get slightly larger files?)

google.protobuf.json_format.MessageToJson(message,
 including_default_value_fields=False, 
preserving_proto_field_name=False, 
indent=2, 
sort_keys=False, 
use_integers_for_enums=False, 
descriptor_pool=None, 
float_precision=None, 
ensure_ascii=True)