Depending on the model, you can use the associated pronunciation dictionary to get the transcriptions for inference (duration markers are supported in both training and inference), e.g.:
for this Chinese model: use this narrow dictionary from here
for this English model: use this narrow dictionary from here
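The dictionary-lookup step quoted above can be sketched roughly as follows. This is a minimal illustration. It assumes a plain-text dictionary with one `WORD PHONE1 PHONE2 ...` entry per line. The helper names `load_dictionary` and `transcribe` and the ARPAbet-style sample entries are hypothetical. The narrow dictionaries linked for these models may use a different file format and phone set. The sketch produces only a phone sequence and does not generate duration markers.

```python
# Hypothetical sketch: assumes a whitespace-separated dictionary where each
# line is "WORD PHONE1 PHONE2 ...". The real model dictionaries may differ.

def load_dictionary(lines):
    """Map each lowercased word to its first listed pronunciation."""
    lexicon = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, phones = parts[0].lower(), parts[1:]
        # Keep only the first pronunciation when a word has several variants.
        lexicon.setdefault(word, phones)
    return lexicon


def transcribe(text, lexicon):
    """Convert whitespace-separated text to phones; raise on unknown words."""
    phones = []
    for word in text.lower().split():
        if word not in lexicon:
            raise KeyError(f"'{word}' is not in the pronunciation dictionary")
        phones.extend(lexicon[word])
    return phones


# Illustrative entries only, not taken from the linked dictionaries.
sample = [
    "hello HH AH0 L OW1",
    "hello HH EH0 L OW1",
    "world W ER1 L D",
]
lexicon = load_dictionary(sample)
print(" ".join(transcribe("Hello world", lexicon)))  # HH AH0 L OW1 W ER1 L D
```

Keeping only the first pronunciation is a simplifying choice. A real front end might choose among variants based on context, and Chinese text would also need word segmentation before lookup, since it is not whitespace-delimited.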
For training, we can get the duration markers as:
But at inference time, how can we get the duration markers? Or are they only used during training?