Open kraj011 opened 3 months ago
Hi. Yes, it should work, although it will require some small code tweaks. I might get around to implementing it later. If you want to try it yourself and share the changes you made, I can integrate them into the repo.
Hi! This work looks awesome, congratulations! I was wondering whether bounding boxes are inherently required for bounded attention to work, as opposed to more fine-grained masks. I saw that you construct finer masks during the denoising process anyway, so would this method still work if those were provided in the first place?
Thanks!
Edit: it seems like the boxes are converted to masks anyway in the process through the `_obtain_masks` function, so providing finer masks (and modifying the code accordingly) should work?
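
For anyone who wants to try this, here is a minimal sketch of the idea, not the repo's actual code. It assumes boxes are normalized `(x0, y0, x1, y1)` tuples and that masks end up as boolean arrays at the attention-map resolution. The names `box_to_mask`, `downsample_mask`, and `obtain_masks` are hypothetical helpers written for illustration; the real `_obtain_masks` may use a different signature and tensor library.

```python
import numpy as np


def box_to_mask(box, height, width):
    """Rasterize a normalized (x0, y0, x1, y1) box into a boolean H x W mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=bool)
    mask[int(round(y0 * height)):int(round(y1 * height)),
         int(round(x0 * width)):int(round(x1 * width))] = True
    return mask


def downsample_mask(mask, height, width):
    """Nearest-neighbour resize of a boolean mask to height x width."""
    rows = np.arange(height) * mask.shape[0] // height
    cols = np.arange(width) * mask.shape[1] // width
    return mask[np.ix_(rows, cols)]


def obtain_masks(regions, height, width):
    """Hypothetical stand-in: accept boxes or ready-made masks.

    A 1-D entry is treated as a box and rasterized; a 2-D entry is treated
    as a user-supplied fine mask and resized to the target resolution.
    Returns a boolean array of shape (num_regions, height, width).
    """
    masks = []
    for region in regions:
        region = np.asarray(region)
        if region.ndim == 1:
            masks.append(box_to_mask(region, height, width))
        else:
            masks.append(downsample_mask(region.astype(bool), height, width))
    return np.stack(masks)


# One subject given as a box, one given as a finer (here triangular) mask.
fine_mask = np.tril(np.ones((8, 8), dtype=bool))
masks = obtain_masks([(0.0, 0.0, 0.5, 0.5), fine_mask], height=4, width=4)
```

If the box-to-mask conversion is the only place the boxes are used, then branching on the input type like this would leave the rest of the pipeline unchanged. That is an assumption that would need checking against the code.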