Closed by Gumpest 2 years ago
If I understand the question correctly, yes, the marked layers are the outputs of each stream. The output of stream 1 (the largest resolution) is passed to downsampling. The outputs of streams 2 and 3 are passed to fusion blocks.
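In case it helps, here is a rough shape-only sketch of that data flow. It is not the repository's code: the function names, the channel counts, the factor-of-2 resolution steps, and the choice of channel concatenation inside the fusion block are all assumptions made for illustration.

```python
from typing import Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def downsample(x: Shape) -> Shape:
    """Hypothetical downsampling step: halves height and width, keeps channels."""
    c, h, w = x
    return (c, h // 2, w // 2)


def fusion(prev: Shape, stream_out: Shape, downsample_after: bool) -> Shape:
    """Hypothetical fusion block: concatenates channels of two same-size maps.

    Optionally halves the resolution afterwards so the result matches the
    next (smaller) stream.
    """
    if prev[1:] != stream_out[1:]:
        raise ValueError(f"spatial sizes differ: {prev} vs {stream_out}")
    fused = (prev[0] + stream_out[0], prev[1], prev[2])
    return downsample(fused) if downsample_after else fused


def neck(s1: Shape, s2: Shape, s3: Shape) -> Shape:
    """Trace the shapes of the three stream outputs through the neck."""
    x = downsample(s1)                          # stream 1 output goes to downsampling
    x = fusion(x, s2, downsample_after=True)    # stream 2 output goes to a fusion block
    x = fusion(x, s3, downsample_after=False)   # stream 3 output goes to a fusion block
    return x


if __name__ == "__main__":
    # Assumed example shapes; stream 1 has the largest resolution.
    print(neck((64, 56, 56), (128, 28, 28), (256, 14, 14)))
```

The sketch only tracks tensor shapes, so it runs without any deep learning library. Swapping concatenation for addition, or changing where the resolution is reduced, would only change the two helper functions.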
Which three layers' outputs are used for neck fusion when the three streams run in parallel? Is it those three? Thanks a lot!