Are the GAIL results correct?

Hello, nice work! I see that hierarchical imitation learning seems to work better than traditional IL on the Hopper and Walker environments. However, I am trying to understand why your GAIL baseline results are so low on both Hopper and Walker. In the original GAIL paper, they matched expert returns even with few trajectories in the expert dataset.