Confused about the Observation Encoder

Hi, many thanks for your excellent work on MILE.

However, I'm very confused about the Observation Encoder:

In your paper, you describe the observation embedding x_t as the concatenation of the image feature (after pooling to BEV), the route map feature, and the speed feature: x_t = [x_t', r_t, m_t].

In other words, the order is: pooling to BEV → mapping to a 1D vector → concatenating the route map feature and speed feature.

In the code, however, the order seems to be reversed: concatenating the route map feature and speed feature → pooling to BEV → mapping to a 1D vector. See the code in Mile.mile.models.mile.py:

Am I misunderstanding something? Please let me know.
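To make the two orderings concrete, here is a toy sketch in plain Python. It is not the repository code: every function name is hypothetical, per-channel average pooling stands in for "pooling to BEV and mapping to a 1D vector", and I am assuming that "concat first" means tiling the route map and speed features over the spatial grid as extra channels.

```python
def pool_to_vector(feature_map):
    """Hypothetical stand-in for 'pool to BEV, then map to a 1D vector':
    average every channel over its spatial grid."""
    vector = []
    for channel in feature_map:  # channel is an H x W nested list
        values = [v for row in channel for v in row]
        vector.append(sum(values) / len(values))
    return vector


def broadcast_as_channels(vector, height, width):
    """Assumption: a 1D feature joins a spatial map by tiling each scalar
    over an H x W grid, giving one extra channel per scalar."""
    return [[[value] * width for _ in range(height)] for value in vector]


def order_in_paper(image_features, route_features, speed_features):
    # pool to BEV -> 1D vector -> concat route map and speed features
    return pool_to_vector(image_features) + route_features + speed_features


def order_as_i_read_the_code(image_features, route_features, speed_features):
    # concat route map and speed features -> pool to BEV -> 1D vector
    height, width = len(image_features[0]), len(image_features[0][0])
    stacked = (
        image_features
        + broadcast_as_channels(route_features, height, width)
        + broadcast_as_channels(speed_features, height, width)
    )
    return pool_to_vector(stacked)


# Toy inputs: 2 image channels on a 2 x 2 grid, 1 route scalar, 1 speed scalar.
image = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]]
route = [0.5]
speed = [0.25]

x_paper = order_in_paper(image, route, speed)
x_code = order_as_i_read_the_code(image, route, speed)
```

With plain averaging the two orders produce the same vector, because averaging a tiled constant returns that constant. With learned, nonlinear layers in the pooling step, the route map and speed features would pass through those layers in the second ordering but not in the first, which is why I am asking which ordering is intended.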
