liuliu / swift-diffusion

BSD 3-Clause "New" or "Revised" License

Batch size not being used in CLIP text encoder / embedding #32

Closed ghost closed 1 year ago

ghost commented 1 year ago

Looks like you are creating an embedding for max length L, but you are passing a sequence of length 2*L. Is batching not actually supported? Does max length mean anything, since it seems to be violated?

liuliu commented 1 year ago

It is flattened because Embed / IndexSelect only accepts tensors without a batch dimension (and that doesn't matter, by the nature of the op: each token id is looked up independently). Thus it ends up with the shape [b*L, C]. It is later split back up in other ops: https://github.com/liuliu/swift-diffusion/blob/main/src/CLIPTextModel.swift#L25
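To illustrate why the flattening is harmless, here is a minimal sketch using plain Swift arrays with made-up toy sizes, not the project's actual NNC tensors: each token id is looked up independently, so a [b, L] batch of ids can be treated as a flat [b*L] list, embedded to [b*L, C], and split back afterwards.

```swift
let vocabSize = 8
let channels = 3  // C

// Toy embedding table: row i is the vector for token id i (hypothetical values).
let table: [[Float]] = (0..<vocabSize).map { i in
    (0..<channels).map { c in Float(i * channels + c) }
}

// A batch of b = 2 sequences, each of length L = 4.
let tokenIds: [[Int]] = [[1, 5, 0, 2], [3, 3, 7, 0]]
let b = tokenIds.count
let L = tokenIds[0].count

// Flatten [b, L] -> [b*L], then look up each id -> [b*L, C].
let flat = tokenIds.flatMap { $0 }
let embedded = flat.map { table[$0] }
assert(embedded.count == b * L)

// Later, split back into per-sequence chunks of length L: b x [L, C].
let perSequence = (0..<b).map { Array(embedded[($0 * L)..<(($0 + 1) * L)]) }
assert(perSequence.count == b && perSequence[0].count == L)
```

Because the lookup has no cross-token interaction, the result is identical whether the ids are presented as [b, L] or as [b*L]; only the later attention ops need the batch dimension restored.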

ghost commented 1 year ago

Thanks