I have deployed a model with mini-batch: 2, maxi-batch: 2, and cpu-threads: 8. When running a performance test, I observed that a single sentence of 60 tokens took 1.7 seconds to translate, while 2 sentences took 2.4 seconds; each additional sentence added roughly 0.1 seconds. Why is translating 2 sentences so much slower than translating 1? How are sentences distributed across the CPU cores? And is it reasonable to assume that translating a batch should take about the same time as translating the longest sentence in the batch?
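For reference, this is a sketch of the relevant decoder settings as I understand them (assuming a Marian-style YAML config; the surrounding options are omitted):

```yaml
# Assumed decoder configuration (sketch, other options omitted)
mini-batch: 2    # sentences translated together in one forward pass
maxi-batch: 2    # sentences read ahead and sorted before batching
cpu-threads: 8   # CPU worker threads available to the decoder
```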