Open pnrobinson opened 2 months ago
The problem was actually the following. In Protocol Buffers (Protobuf), fields are only included in the serialized output if they are explicitly set to a non-default value. This is done to minimize the size of the serialized message. I tried to set the reference range to 0-30 pg/ml The JSON output then only includes the high=30 value, but does not include the 0 value, because this is equal to the default value. However, if we include a reference range, then both low and high values are required: https://phenopacket-schema.readthedocs.io/en/latest/reference-range.html @ielis @julesjacobsen It would be possible to get around this by using an ugly trick in protobuf, e.g.,
oneof my_float_wrapper {
float my_float = 1;
}
or by
google.protobuf.FloatValue my_float = 1;
(instead of just float
)
but it is too late to change this in the protobuf code. Therefore, I think this is a bug that is unfixable in the current implementation. Thoughts?
@pnrobinson This sounds like a JSON output issue rather than a protobuf issue. There should be a way of making the JSON output include the defaults, but then the issue is that the JSON will include all the default zero-values for the entire schema. Maybe there is a way of enabling this just for the ReferenceRange?
@julesjacobsen
I think you are right --
see this documentation: we need to set including_default_value_fields=True
in our code.
@ielis @iimpulse -- I think this is what we want in general? is there any reason we shouldn't set this argument to true everytime we write to file? (Other than we will get slightly larger files?)
google.protobuf.json_format.MessageToJson(message,
including_default_value_fields=False,
preserving_proto_field_name=False,
indent=2,
sort_keys=False,
use_integers_for_enums=False,
descriptor_pool=None,
float_precision=None,
ensure_ascii=True)
I am seeing errors such as this:
However, the reference range is not required: https://phenopacket-schema.readthedocs.io/en/latest/quantity.html#rstquantity
In this particular case, the reference range depends on time of day and other factors that are not available in the input dataset and that in general may not even be available to a lab, and so it is not a good idea to enforce this requirement.