mjkwon2021 / CAT-Net

Official code for CAT-Net: Compression Artifact Tracing Network. Image manipulation detection and localization.

insufficient shared memory in training phase #34

Open LoveSiameseCat opened 1 year ago

LoveSiameseCat commented 1 year ago

Hi, I ran into the same problem as #24 when I tried to train this model. I saved all the images in '.jpg' format. However, RAM usage increased linearly during training (even when I only iterated over the training data and did nothing else). Eventually, the system reported "RuntimeError: DataLoader worker (pid 11676) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit." I have tried everything I could think of and still can't fix this bug. Can you give me some advice? BTW, I found that the variable "out_view" in AbstractDataset._get_jpeg_info is not used; I'm curious what it does.
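For anyone hitting the same Bus error: the message points at the shared memory that DataLoader workers use to pass tensors between processes. Below is a minimal sketch for checking how full it is, assuming a Linux machine where shared memory is mounted at /dev/shm. The helper name `shm_usage_mib` is hypothetical and not part of CAT-Net.

```python
import os
import shutil


def shm_usage_mib(path="/dev/shm"):
    """Return total/used/free space of the filesystem at `path`, in MiB.

    Assumption: on Linux, shared memory is exposed as a tmpfs at /dev/shm.
    """
    usage = shutil.disk_usage(path)
    mib = 2 ** 20
    return {
        "total": usage.total / mib,
        "used": usage.used / mib,
        "free": usage.free / mib,
    }


if __name__ == "__main__":
    # Only report when the assumed mount point actually exists.
    if os.path.isdir("/dev/shm"):
        print(shm_usage_mib())
```

If `free` shrinks steadily while training runs, the shared memory limit is probably what is being exhausted. When training inside Docker, that limit is commonly raised with the `--shm-size` option of `docker run`.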

CauchyComplete commented 1 year ago

Did you use the same environment as described in requirements.txt? RAM linearly increases at the beginning and stops at a certain point.

LoveSiameseCat commented 1 year ago

Thank you for your response. I have created a new environment on a 2080 Ti, but the memory leak still exists. While trying to fix it, I found that the problem may be caused by the 'jpegio' package: when I comment out this operation, the issue disappears. However, I think this issue depends on the device, since it was solved after I switched to another server.
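One way to confirm that a single call is responsible for the growth is to run it in a loop and watch the process's peak resident memory. The following is only a rough sketch under these assumptions: a Unix system (the `resource` module is not available on Windows), and a hypothetical helper name `peak_rss_samples`. It measures the current process only, so it is meant to be run outside DataLoader workers.

```python
import resource


def peak_rss_samples(fn, iterations=1000, every=100):
    """Call fn() repeatedly and record peak RSS every `every` calls.

    Note: ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    Peak RSS never decreases, so a leak shows up as values that keep rising
    instead of levelling off.
    """
    samples = []
    for i in range(1, iterations + 1):
        fn()
        if i % every == 0:
            samples.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return samples


# Hypothetical usage: wrap only the call you suspect, for example a function
# of your own (here called read_one_jpeg) that reads a single JPEG file.
# samples = peak_rss_samples(lambda: read_one_jpeg("sample.jpg"))
# print(samples)
```

If the samples level off after the first few entries, the call is probably not leaking. If they keep climbing at a steady rate, that call is a good candidate for the leak on that machine.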

CauchyComplete commented 1 year ago

Thanks for your report. I'm almost certain that I'll replace jpegio with another package in future work. I think jpegio is not stable enough. If anyone knows another library that supports the extraction of raw DCT coefficients, please let us know! I tested CAT-Net on both Windows and Linux but didn't face similar problems. As you reported, it's probably a device-dependent error...