tbepler / topaz

Pipeline for particle picking in cryo-electron microscopy images using convolutional neural networks trained from positive and unlabeled examples. Also featuring micrograph and tomogram denoising with DNNs.
GNU General Public License v3.0

NumPy array Warning #68

Open girodat opened 3 years ago

girodat commented 3 years ago

Dear Topaz Experts,

I have tried downloading Topaz and running the tutorial. However, when I reach the first training step, I see the warning: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/conda/conda-bld/pytorch_1595628427286/work/torch/csrc/utils/tensor_numpy.cpp:141.)

Any ideas as to why my NumPy array is not writeable?

Thanks, Dylan

tbepler commented 3 years ago

This is a warning that new versions of pytorch started raising because our mrc micrograph/tomogram arrays are read-only. The warning is annoying, but, to my knowledge, doesn't cause any actual errors and can safely be ignored.
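If the warning is only noise, one option is to filter it with Python's standard warnings module before training starts. The following is a minimal sketch, not something Topaz does itself: the helper name silence_non_writeable_warning is hypothetical, and it assumes the warning reaches Python as a UserWarning whose text starts as quoted above.

```python
import warnings

def silence_non_writeable_warning():
    """Ignore only the 'not writeable' UserWarning (hypothetical helper, not part of Topaz)."""
    # The message argument is a regular expression matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="The given NumPy array is not writeable",
        category=UserWarning,
    )

silence_non_writeable_warning()
```

Other warnings are left untouched, because the filter matches on the message prefix rather than on the whole UserWarning category.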

tbepler commented 3 years ago

It seems like the fix to get rid of this warning is to copy the micrograph numpy array after parsing the raw mrc file content. Currently, the parsing code operates as follows:

with open(path, 'rb') as f:
    content = f.read()
mic,header,header_extended = mrc.parse(content)

The file content is read into a byte string in RAM and then interpreted as an MRC by the parsing code. Within that, the raw bytes are converted to a numpy array for the micrograph using numpy.frombuffer. This creates a view of the byte string, and the view is read-only because Python bytes objects are immutable. There may be a way to change the flag on the numpy array to allow writing, but that could have consequences, because changes to the micrograph array would change the content byte string. I don't think there are any places in the code where the content byte string is actually reused, so that could be a viable solution too. Copying is the safer approach, but it doubles the RAM requirements, which could already be tight, especially when parsing tomograms.
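The read-only behaviour described above can be reproduced with numpy alone. This is a small sketch: the content bytes are a stand-in built from a toy array, not a real MRC file, and the names view and mic are only illustrative.

```python
import numpy as np

# Stand-in for the bytes returned by f.read(); this is NOT a real MRC file.
content = np.arange(6, dtype=np.float32).tobytes()

# frombuffer does not copy: the array is a view onto the immutable bytes object.
view = np.frombuffer(content, dtype=np.float32)
print(view.flags.writeable)   # False

# An explicit copy owns its own memory and is writeable,
# at the cost of holding a second buffer of the same size.
mic = view.copy()
mic[0] = 42.0
print(mic.flags.writeable)    # True
print(view[0])                # 0.0 (the original bytes are untouched)
```

The copy is what doubles the memory footprint while both content and mic are alive. Dropping the reference to content after parsing would let the original buffer be freed.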

girodat commented 3 years ago

Hi Tristan,

Thank you for the help. I will give copying the micrograph numpy array after parsing a try.

Sincerely, Dylan



bu-bgregor commented 2 years ago

@tbepler - I have found that the "UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors." message can be eliminated by using copies of numpy arrays in two places. At least for the way we've been testing Topaz training, this was the only change needed:

RandomImageTransforms class in topaz/topaz/utils/data/sampler.py in the __getitem__ method at line 220:

        if self.to_tensor:
            X = torch.from_numpy(np.array(X, copy=True))
            if type(Y) is Image.Image:
                Y = torch.from_numpy(np.array(Y, copy=True)).float()

and similarly in the SegmentedImageDataset in topaz/topaz/utils/data/loader.py at line 229:

        if self.to_tensor:
            im = torch.from_numpy(np.array(im, copy=True))
            label = torch.from_numpy(np.array(label, copy=True)).float()

tbepler commented 2 years ago

Copying the array should remove the warning, but it isn't necessary, because we don't write to the array in place.
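For anyone who still wants to avoid the warning without copying every array, one possible middle ground is to copy only when the array is actually read-only. This is a sketch under assumptions: as_writeable is a hypothetical helper, not part of Topaz, and the two test arrays are toy buffers rather than micrographs.

```python
import numpy as np

def as_writeable(arr):
    """Return arr unchanged if it is writeable, otherwise a writeable copy (hypothetical helper)."""
    return arr if arr.flags.writeable else arr.copy()

# A view over immutable bytes is read-only; a view over a bytearray is writeable.
read_only = np.frombuffer(bytes(8), dtype=np.uint8)
writeable = np.frombuffer(bytearray(8), dtype=np.uint8)

copied = as_writeable(read_only)   # new array with its own memory
same = as_writeable(writeable)     # returned as-is, no extra memory used
```

The result could then be passed to torch.from_numpy in place of the unconditional np.array(..., copy=True) calls shown earlier in the thread, so that arrays that are already writeable are not duplicated.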