Questions about crop processing of DCT volume

zamling commented 2 years ago

Hi~ I saw there is a param self._grid_crop in the dataset class that determines whether the random crop offset is forced to an integer multiple of 8. Does it have a large influence if I set this param to False?

if self._grid_crop:
    s_r = (random.randint(0, max(h - crop_size[0], 0)) // 8) * 8
    s_c = (random.randint(0, max(w - crop_size[1], 0)) // 8) * 8
else:
    s_r = random.randint(0, max(h - crop_size[0], 0))
    s_c = random.randint(0, max(w - crop_size[1], 0))
# crop img_RGB
img_RGB = img_RGB[s_r:s_r+crop_size[0], s_c:s_c+crop_size[1], :]

I worry that when I set this param to False, the crops will no longer align with the original DCT blocks. It seems the DCT in jpegio is calculated per 8x8 block, so cropping may break these blocks and generate new 8x8 blocks that are not the same as the original DCT information from jpegio. Btw, I am also stuck on how the DCT is calculated in jpegio. It is quite different from the result of cv2.dct(Y) on the Y channel. Can you give me a brief explanation? :) Thanks a lot <3
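To make the concern concrete, here is a minimal sketch in pure Python. It is not the CAT-Net code: `dct_8x8`, `block_at`, and `first_block_error` are hypothetical helper names, the image is random data, and quantization is ignored. It only illustrates that the first 8x8 block of a crop reproduces an original grid block when the offsets are multiples of 8, and otherwise straddles four grid blocks and matches none of them.

```python
import math
import random

N = 8  # JPEG block size


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of one 8x8 block (list of 8 rows)."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def block_at(img, r, c):
    """8x8 block whose top-left pixel is (r, c)."""
    return [row[c:c + N] for row in img[r:r + N]]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
h, w = 32, 32
img = [[random.randint(0, 255) for _ in range(w)] for _ in range(h)]

# Block DCTs on the original 8x8 grid (a stand-in for the stored coefficients).
grid = [dct_8x8(block_at(img, r, c))
        for r in range(0, h, N) for c in range(0, w, N)]


def first_block_error(s_r, s_c):
    """Distance from the crop's first block DCT to the closest grid block DCT."""
    crop_dct = dct_8x8(block_at(img, s_r, s_c))
    return min(max_abs_diff(crop_dct, g) for g in grid)


aligned_err = first_block_error(8, 16)     # offsets are multiples of 8
unaligned_err = first_block_error(11, 19)  # block straddles four grid blocks

print("aligned:", aligned_err)
print("unaligned:", unaligned_err)
```

With aligned offsets the crop can reuse the original per-block coefficients as they are; with unaligned offsets every block of the crop would have to be recomputed from pixels.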

CauchyComplete commented 2 years ago

Well, I think training without grid alignment may not converge well because the frequencies get mixed. But I haven't tried it, so it may work. I think cv2.dct(Y) and jpegio differ because jpegio reads the raw (quantized) DCT coefficients stored in the JPEG file, while cv2.dct(Y) computes the DCT from a decoded image. JPEG decoding, as well as encoding, causes information loss, so the results are different.
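A rough sketch of why the two disagree, in pure Python rather than with jpegio or cv2. The assumptions are loud ones: a single random 8x8 block, a hypothetical uniform quantization step `Q` (real JPEG files use an 8x8 quantization table), and hand-written `dct2` / `idct2` helpers. The encoder stores level-shifted, quantized integers, while a DCT recomputed from decoded pixels has neither the level shift nor the quantization applied. On top of that, cv2.dct applied to a full Y channel transforms the whole array at once rather than 8x8 blocks.

```python
import math
import random

N = 8

# Orthonormal DCT-II basis matrix C, so that coef = C * block * C^T.
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    return matmul(matmul(C, block), transpose(C))


def idct2(coef):
    return matmul(matmul(transpose(C), coef), C)


random.seed(0)
Q = 16  # hypothetical uniform quantization step (assumption, see above)
pixels = [[random.randint(0, 255) for _ in range(N)] for _ in range(N)]

# Encoder side: level shift, DCT, quantize. These integers are what gets stored.
shifted = [[p - 128 for p in row] for row in pixels]
stored = [[round(c / Q) for c in row] for row in dct2(shifted)]

# Decoder side: dequantize, inverse DCT, undo the level shift, round and clip.
dequant = [[s * Q for s in row] for row in stored]
decoded = [[min(255, max(0, round(v + 128))) for v in row]
           for row in idct2(dequant)]

# DCT recomputed from decoded pixels: no level shift, no quantization.
recomputed = dct2(decoded)

max_diff = max(abs(r - s)
               for rr, sr in zip(recomputed, stored)
               for r, s in zip(rr, sr))
print("largest gap between stored and recomputed coefficients:", max_diff)
```

The gap is dominated by the DC term, which carries the missing level shift and quantization scale; rounding and clipping of the decoded pixels add smaller errors on top.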

zamling commented 2 years ago

I get it!!! Thanks <3

mjkwon2021 / CAT-Net

Questions about crop processing of DCT volume #23