tokio-rs / prost

PROST! a Protocol Buffers implementation for the Rust Language

Decoding from file #138

Closed ChristopherRabotin closed 5 years ago

ChristopherRabotin commented 5 years ago

I have a large 529 MiB protobuf generated from Python and stored in a file. I can read it correctly from Python (all Python protobuf operations use Google's library). However, I cannot read it with prost.

The serialized protobuf does not have data set for all the fields, i.e. some fields were not set. It uses proto3 syntax, in which all fields are optional.

I may be attempting to decode things incorrectly, as I haven't found any decoding example. I can attach the .proto file and the generated code from prost_derive if that helps. And below is the relevant decoding attempt.

    let infn = matches.value_of("input").unwrap();

    let mut inbuf = Vec::new();

    File::open(infn)
        .expect(&format!("could not open EXB file {}", infn))
        .read(&mut inbuf)
        .expect("something went wrong reading the file");

    let outfn = matches.value_of("output").unwrap();
    let step_size = f64::from_str(matches.value_of("step").unwrap())
        .expect("could not convert step size to floating point value");

    if step_size <= 0.0 {
        panic!("Step size is not greater than zero");
    }

    println!("Using output CSV: {}", outfn);
    println!("Using step size: {} seconds", step_size);

    let s = prost::decode_length_delimiter(inbuf.into_buf());
    println!("{:?}", s);

Here is the error:

Err(DecodeError { description: "invalid varint", stack: [] })

danburkert commented 5 years ago

decode_length_delimiter is almost certainly not what you want. Try MyMessageType::decode.
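
For context, a length delimiter is a base-128 varint prefix that gives the byte length of the message following it, so decoding one never yields a message. Below is a minimal std-only sketch of such a varint decoder. The helper name decode_varint is hypothetical and this is not prost's implementation; it only illustrates why an empty or truncated buffer cannot produce a valid varint.

```rust
/// Hypothetical helper (not part of prost): decode a base-128 varint from the
/// front of `buf`. Returns the value and the number of bytes consumed, or
/// `None` if the buffer is empty, truncated, or the varint exceeds 10 bytes.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        // The low 7 bits carry payload, least-significant group first.
        value |= u64::from(byte & 0x7f) << (7 * i);
        // A clear high bit marks the last byte of the varint.
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    // Ran out of bytes before seeing a terminating byte.
    None
}
```

With this sketch, the two bytes 0x96 0x01 decode to 150, while an empty slice yields None.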

ChristopherRabotin commented 5 years ago

You're right, thanks.

Decoding that way no longer returns an error, but the data still isn't loaded correctly: Python says there are 32 "ephemeris" messages in there, but prost says zero. Any idea how I could debug this? Here is what prost decodes:

EphemerisContainer { meta: None, mod_julian_offset: 0.0, ephemerides: [] }

ChristopherRabotin commented 5 years ago

For reference, here is the protobuf definition:

/*
 * The Ephemeris eXchange Binary (`.exb`) file definition.
 *
 * This file allows storing ephemeris data for celestial bodies and spacecraft, along with optional
 * named parameters of said object. The ephemeris data may be provided as a set of "Truth" data
 * for known position, optional velocity and optional covariance, or either as a set of equally
 * timed interpolation coefficients or a set of unequally timed interpolation coefficients. These
 * coefficients include position and optional velocity. The units for the position and velocity may
 * also be independently specified. The type of interpolation used and its degree is also specified.
 * Each ephemeris object is identified by a number (signed integer) and/or a name (as a string).
 */
syntax = "proto3";

package exb;

enum UnitTime {
  days = 0;
  seconds = 1;
  CustomTimeUnit = 2;
}

enum UnitDistance {
  AU = 0;
  km = 1;
  m = 2;
  CustomDistanceUnit = 3;
}

enum UnitVelocity {
  km_s = 0;
  m_s = 1;
  CustomVelocityUnit = 2;
}

message Identifier {
  // All objects in this ephemeris are given an Identifier.
  sint32 number = 1; // May be used to store the NAIF_ID
  string name = 2; // May be used to store the SPACEWARN identifier
}

message Parameter {
  /* A parameter value, may be used to specify celestial object properties (e.g. GM), or spacecraft
   * properties (e.g. mass). */
  double value = 1;
  string unit = 2;
}

message Coefficients {
  repeated double x = 1;
  repeated double y = 2;
  repeated double z = 3;
}

/* EqualStepStates provides an O(1) access to all of the states.
 * To access state of time T, get the index as
 * floor((t_in_mod_julian - start_mod_julian)/window_duration) .
 * This state's coefficients can be cached for quicker continuous computation.
 * Note that we store the position and velocity outside of a message for smaller serialized
 * structure (two contiguous lists of structures).
 */
message EqualStepStates {
  // Fixed window duration for all of the states
  double window_duration = 1;
  // Unit of the window duration
  UnitTime window_duration_unit = 2;
  // All position coefficients for this time offset.
  repeated Coefficients position = 3;
  /* All velocity coefficients for this time offset. Optional, but if used, it **must** be of the
   * same length as the list of position coefficients. */
  repeated Coefficients velocity = 4;
}

/* VarWindowStates provides an O(log(n)) + O(1) access to all of the states. The O(log(n))
 * corresponds to the binary search in the index, which then leads to an O(1) access. */
message VarWindowStates {
  message State {
    // Relative time in seconds compared to the indexed time.
    float time_offset = 1;
    // Duration in seconds for which these states are valid.
    float window_duration = 2;
    // All position coefficients for this time offset.
    Coefficients position = 3;
    // All velocity coefficients for this time offset. (optional)
    Coefficients velocity = 4;
  }
  /* A pre-sorted list of all of the times (in seconds) available in the map of states.
   * These time entries are seconds past the start_mod_julian dates (which is in days).
   * Perform a binary search in this index to retrieve time key for the desired time.
   * In other words, search for the closest time to the desired time, retrieve the State
   * for this time, build the interpolation functions, and finally apply these at the desired time.
   * NOTE: Limitations of protobufs require this index to be an integer.
   * NOTE: For better platform support, these reference times are limited to 32 bits. */
  repeated uint32 time_index = 1;
  // A map associating each time (in seconds) from the index with a State.
  map<uint32, State> states = 2;
}

message Truth {
  message Vector {
    double x = 1;
    double y = 2;
    double z = 3;
  }

  message Covariance {
    // The Covariance of the object is based on the CCSDS OEM Data format.
    double cx_x = 1; // Covariance matrix [1,1]
    double cy_x = 2; // Covariance matrix [2,1]
    double cy_y = 3; // Covariance matrix [2,2]
    double cz_x = 4; // Covariance matrix [3,1]
    double cz_y = 5; // Covariance matrix [3,2]
    double cz_z = 6; // Covariance matrix [3,3]
    double cx_dot_x = 7; // Covariance matrix [4,1]
    double cx_dot_y = 8; // Covariance matrix [4,2]
    double cx_dot_z = 9; // Covariance matrix [4,3]
    double cx_dot_x_dot = 10; // Covariance matrix [4,4]
    double cy_dot_x = 11; // Covariance matrix [5,1]
    double cy_dot_y = 12; // Covariance matrix [5,2]
    double cy_dot_z = 13; // Covariance matrix [5,3]
    double cy_dot_x_dot = 14; // Covariance matrix [5,4]
    double cy_dot_y_dot = 15; // Covariance matrix [5,5]
    double cz_dot_x = 16; // Covariance matrix [6,1]
    double cz_dot_y = 17; // Covariance matrix [6,2]
    double cz_dot_z = 18; // Covariance matrix [6,3]
    double cz_dot_x_dot = 19; // Covariance matrix [6,4]
    double cz_dot_y_dot = 20; // Covariance matrix [6,5]
    double cz_dot_z_dot = 21; // Covariance matrix [6,6]
  }

  double mod_julian = 1;
  Vector position = 2;
  Vector velocity = 3;
  Covariance covariance = 4;
  /* The covariance exponent specifies an optional exponent for all of the components of the
   * covariance. This enables storing high precision covariance while not losing precision of
   * floating point values.
   */
  double covariance_exponent = 5;
}

message Interpolation {
  enum IPType {
    // Supported interpolations.
    Chebyshev = 0;
    Hermite = 1;
    Polynomial = 2;
    Lagrange = 3;
    // NOTE: Requires additional communication between provider and implementer.
    CustomInterpolationType = 4;
  }

  // Type of interpolation used
  IPType itype = 1;
  /* Degree of the interpolation used for computing the position (e.g. Piecewise Linear would have
   * a degree 1, but a Hermite interpolation would usually have 2*nval - 1 where nval corresponds
   * to the number of states used to compute the interpolation coefficients). */
  uint32 position_degree = 2;
  /* Degree of the interpolation used for computing the velocity. Only used if the interpolation
   * includes the velocity data. */
  uint32 velocity_degree = 3;
  // A start time of all of the states.
  double start_mod_julian = 4;
  oneof state_data {
    EqualStepStates equal_states = 5;
    VarWindowStates varwindow_states = 6;
  }
}

message Ephemeris {
  // Unique identifier of the object
  Identifier id = 1;
  string ref_frame = 2;
  /* An optional list of known states. This may be used by an implementer to verify their
   * computation, or to transmit future propagated states with an optional covariance. */
  repeated Truth known_states = 3;
  Interpolation interpolator = 4;
  // An optional map of parameter name to parameter value and unit.
  map<string, Parameter> parameters = 5;
  UnitDistance distance_unit = 6;
  UnitVelocity velocity_unit = 7;
}

message Meta {
  message CEDate {
    uint32 year = 1;
    uint32 month = 2;
    uint32 day = 3;
  }

  CEDate date = 1;
  string version = 2;
  string publisher = 3;
  bool proprietary = 4;
  string docs = 5;
  string comments = 6;
  string generator = 7;
}

message EphemerisContainer {
  Meta meta = 1;
  /* Offset of time references `T` with respect to the Modified Julian Date of "1858 Nov 17 zero hours".
   * The computation must be as such: `MJD = start_mod_julian + mod_julian_offset`. Hence, if the
   * reference of the calendar is _after_ the MJD reference, mod_julian_offset is a positive number.
   * For example, if the time references are in Julian Dates, then mod_julian_offset = -2400000.50.
   */
  double mod_julian_offset = 2;
  repeated Ephemeris ephemerides = 3;
}

ChristopherRabotin commented 5 years ago

My mistake: I wasn't reading the file.

I was defining the input buffer as Vec::new() and I was using .read instead of .read_to_end when reading the file. Hence, the buffer was of length zero.
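
To illustrate the difference, here is a small std-only sketch. It uses an in-memory std::io::Cursor in place of the real file, and the function name compare_reads is made up for this example. Read::read fills only the slice it is handed, and an empty Vec derefs to a zero-length slice, whereas Read::read_to_end grows the Vec as it goes.

```rust
use std::io::{Cursor, Read};

/// Hypothetical demo: returns (bytes read by `read` into an empty Vec,
/// bytes read by `read_to_end` into an empty Vec) for the same input.
fn compare_reads(data: &[u8]) -> std::io::Result<(usize, usize)> {
    // `read` writes into the existing slice only; an empty Vec has length
    // zero, so there is no room and nothing is read.
    let mut empty_buf: Vec<u8> = Vec::new();
    let n_read = Cursor::new(data).read(&mut empty_buf)?;

    // `read_to_end` appends to the Vec, growing it until end of input.
    let mut full_buf: Vec<u8> = Vec::new();
    let n_read_to_end = Cursor::new(data).read_to_end(&mut full_buf)?;

    Ok((n_read, n_read_to_end))
}
```

For a five-byte input, the first count is 0 and the second is 5.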

I guess the only lesson learned here is to check that the buffer isn't empty prior to attempting to decode it.
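
One way to enforce that check is sketched below, using only the standard library. The helper name read_nonempty and the choice of ErrorKind::UnexpectedEof are assumptions made for illustration. The guard has to live in the caller because zero bytes is a valid proto3 encoding of a message with every field at its default, which is why decoding the empty buffer returned an empty EphemerisContainer instead of an error.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: read a whole file into memory and refuse to hand
/// back an empty buffer, so the caller never decodes zero bytes by accident.
fn read_nonempty(path: &Path) -> io::Result<Vec<u8>> {
    // `fs::read` opens the file and reads it to the end in one call.
    let bytes = fs::read(path)?;
    if bytes.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("{} is empty", path.display()),
        ));
    }
    Ok(bytes)
}
```

The returned Vec<u8> can then be passed as a byte slice to the generated message type's decode function.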