Right now the mp4 example just assumes the first frame of each track starts at NPT ("normal play time") 0. That's not really true. Depending on chance, the tracks might be misaligned by as much as one full video frame (e.g. ~33 ms at 30 fps, 200 ms at 5 fps). The right thing to do is to add an mp4 edit list. It could either skip the beginning of the earlier track (so both tracks start at the same time) or delay the later one (so no information is lost).
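To make the two options concrete, here is a rough sketch of how the `elst` entries for one track might be computed. It is not taken from the example's code: `Track`, `Strategy`, `Edit`, and `edit_list` are hypothetical names, and it assumes each track's first-frame NPT and duration are already known in microseconds. The only format details it relies on are that `segment_duration` is expressed in the movie timescale, `media_time` in the track's media timescale, and `media_time == -1` marks an empty edit.

```rust
/// One `elst` entry. All names in this sketch are hypothetical.
#[derive(Debug, PartialEq, Eq)]
struct Edit {
    /// Length of this edit, in movie timescale units.
    segment_duration: u64,
    /// Start offset within the media, in media timescale units; -1 = empty edit.
    media_time: i64,
}

#[derive(Clone, Copy)]
enum Strategy {
    /// Trim the start of the earlier track so both begin together.
    SkipEarlier,
    /// Prepend an empty edit to the later track so nothing is dropped.
    DelayLater,
}

/// Assumed inputs: times in microseconds, as observed for one track.
struct Track {
    first_frame_npt_us: u64,
    duration_us: u64,
    media_timescale: u32,
}

/// Converts microseconds to `timescale` units, rounding to nearest.
fn scale(us: u64, timescale: u32) -> u64 {
    (us * u64::from(timescale) + 500_000) / 1_000_000
}

/// Builds the edit list for `track`, given the other track's first-frame NPT.
fn edit_list(
    track: &Track,
    other_start_us: u64,
    movie_timescale: u32,
    strategy: Strategy,
) -> Vec<Edit> {
    let whole = Edit {
        segment_duration: scale(track.duration_us, movie_timescale),
        media_time: 0,
    };
    match strategy {
        Strategy::SkipEarlier if track.first_frame_npt_us < other_start_us => {
            // This track started first: skip media until the other one begins.
            let skip_us = (other_start_us - track.first_frame_npt_us).min(track.duration_us);
            vec![Edit {
                segment_duration: scale(track.duration_us - skip_us, movie_timescale),
                media_time: scale(skip_us, track.media_timescale) as i64,
            }]
        }
        Strategy::DelayLater if track.first_frame_npt_us > other_start_us => {
            // This track started later: present nothing for the gap, then all media.
            let delay_us = track.first_frame_npt_us - other_start_us;
            vec![
                Edit {
                    segment_duration: scale(delay_us, movie_timescale),
                    media_time: -1,
                },
                whole,
            ]
        }
        // Otherwise this track needs no adjustment.
        _ => vec![whole],
    }
}
```

With a 90 kHz video track starting at NPT 0, an audio track starting 20 ms later, and a movie timescale of 1000, the skip option would give the video track a single edit with `media_time` 1800, while the delay option would give the audio track a 20-unit empty edit followed by its full media. Which is preferable probably depends on whether dropping up to one frame's worth of the earlier track is acceptable.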