pystiche / papers

Reference implementation and replication of prominent NST papers
BSD 3-Clause "New" or "Revised" License

RuntimeError with single sample image in ulyanov_et_al_2016 #294

Closed (jbueltemeier closed this issue 1 year ago)

jbueltemeier commented 1 year ago

The following RuntimeError occurs when performing a stylisation with a transformer in ulyanov_et_al_2016 with impl_params=False:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 26 but got size 27 for tensor number 1 in the list.

The cause is the missing image preprocessing: without it, the image is not resized to side lengths that are a power of 2. Most of the sample images work anyway; the error only occurs with the image "bird".

Since this error is to be expected without the preprocessing (the image pyramid has 6 levels) and only affects this one image in this replication, I would suggest removing this image from the replication script and issuing an appropriate warning in the stylisation function when impl_params=False.
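To illustrate why a side length that is not divisible by 2 ** 6 breaks the pyramid, here is a minimal sketch in plain Python. It is not the pystiche implementation: it assumes that each pyramid level halves the side length with floor division and that the coarser level is later upsampled by a factor of 2 before concatenation. The function names and the side length 216 are hypothetical and chosen only for illustration.

```python
def pyramid_lengths(length, num_halvings=6):
    """Side lengths obtained by repeatedly halving `length` with floor division.

    Assumption: one halving per pyramid level, i.e. six in total.
    """
    lengths = [length]
    for _ in range(num_halvings):
        lengths.append(lengths[-1] // 2)
    return lengths


def find_mismatches(length, num_halvings=6):
    """Return (fine, upsampled_coarse) pairs whose sizes disagree.

    Upsampling the coarser level by 2 only restores the finer size
    if the finer size was even.
    """
    lengths = pyramid_lengths(length, num_halvings)
    return [
        (fine, 2 * coarse)
        for fine, coarse in zip(lengths, lengths[1:])
        if fine != 2 * coarse
    ]


# Hypothetical side length that is not a multiple of 64.
print(find_mismatches(216))
# A multiple of 64 survives all six halvings without a mismatch.
print(find_mismatches(256))
```

Under these assumptions, a side length of 216 yields a 27 versus 26 mismatch at one level, which is the kind of size disagreement the error message reports, while any multiple of 64 yields none.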

Or what do you think, @pmeier?

pmeier commented 1 year ago

IIUC, the images should have side lengths, i.e. height and width, divisible by 2 ** 6 == 64, correct? If so, can't we simply resize the images to the next integer multiple of that? For example:

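import math

image = ...
# Round the shorter side up to the next multiple of 64.
old_length = min(image.shape[-2:])
new_length = math.ceil(old_length / 64) * 64
# Resize, then center crop to new_length so both sides become multiples of 64.
resized_image = resize(image, new_length)
cropped_image = center_crop(resized_image, new_length)
new_height, new_width = cropped_image.shape[-2:]
assert new_height % 64 == 0
assert new_width % 64 == 0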

That would enable people to use our stylization script with all images, even their own, without worrying about the size first. Since this is evaluation, we also don't need to worry about runtime costs.

pmeier commented 1 year ago

Or maybe not even center_crop, but just crop to the nearest multiple. johnson_alahi_li_2016 does something similar, albeit at runtime:

https://github.com/pystiche/papers/blob/c5e075231d58fa1556a47a50d851a72f21ded443/pystiche_papers/johnson_alahi_li_2016/_data.py#L40
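Cropping to the nearest multiple could look roughly like the sketch below. It is not the code from the linked file and not necessarily what was implemented for this issue. The function name is hypothetical, and anchoring the crop at the top-left corner is an assumption; a centered crop would work just as well.

```python
def crop_size_to_multiple(height, width, multiple=64):
    """Largest (height, width) not exceeding the input that is divisible by `multiple`."""
    new_height = height - height % multiple
    new_width = width - width % multiple
    if new_height == 0 or new_width == 0:
        raise ValueError(
            f"Image of size {height}x{width} is smaller than {multiple} "
            "in at least one dimension and cannot be cropped to a multiple."
        )
    return new_height, new_width


# Hypothetical image size, used only for illustration.
new_height, new_width = crop_size_to_multiple(500, 333)
print(new_height, new_width)

# For an image tensor with shape (..., H, W), the crop itself would then be
# a plain slice (top-left anchored, which is an assumption of this sketch):
#     cropped_image = image[..., :new_height, :new_width]
```

Compared to resizing, this removes at most 63 pixels per side and leaves the remaining pixels untouched.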

jbueltemeier commented 1 year ago

Yes, that sounds like a good idea. I would also prefer to crop to the nearest multiple, so as not to change the image more than necessary.

I will implement this.