jonathanking / sidechainnet

An all-atom protein structure dataset for machine learning.
BSD 3-Clause "New" or "Revised" License

Colab create_custom walkthrough error #41

Closed Jiram-Kin closed 2 years ago

Jiram-Kin commented 2 years ago

Thanks for the awesome package. I am trying to use a custom dataset to train my model, but the create_custom function raises the error 'Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead'. I saw that you recently added detach to some of the code, so I wonder if you could do the same for the other functionality as well.

Thank you so much

jonathanking commented 2 years ago

Hi Jiram-Kin,

I am glad you’re enjoying the package. Thanks for bringing this issue to my attention, and I apologize for the inconvenience! Can you provide me with a screenshot or text containing the complete error message? I want to be sure to make the right change to the code.

Thank you.

On Nov 21, 2021, 3:12 AM +0100, Jiram-Kin wrote:

Thanks for the awesome package, I am trying to use a custom dataset for training my model, however I found out that the create custom function is returning error 'Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead'. I saw that you added detach to some code recently so I wonder if you could do the same for other functionality as well Thank you so much

Jiram-Kin commented 2 years ago

scn_error

This is the screenshot. The full error message is as follows:

Pre-parsed ProteinNet already downloaded.
Re-initializing validation set splits ([10, 90]).
Loading complete ProteinNet data (100% thinning) from /usr/local/lib/python3.7/dist-packages/sidechainnet/resources/proteinnet_parsed.
Raw ProteinNet files already preprocessed (/usr/local/lib/python3.7/dist-packages/sidechainnet/resources/proteinnet_parsed/training_100.pkl).
Preparing to download requested proteins via their ProteinNet IDs.
Downloading SidechainNet specific data from RSCB PDB.
141 IDs OK for parallel downloading.
17%|█▋ | 24/141 [00:20<01:40, 1.17it/s]

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/dist-packages/sidechainnet/utils/download.py", line 246, in process_id
    dihedrals_coords_sequence = get_seq_coords_and_angles(chain)
  File "/usr/local/lib/python3.7/dist-packages/sidechainnet/utils/measure.py", line 231, in get_seq_coords_and_angles
    prev_ang)
  File "/usr/local/lib/python3.7/dist-packages/sidechainnet/utils/measure.py", line 290, in standardize_residue
    new_res = rb.to_prody(res)
  File "/usr/local/lib/python3.7/dist-packages/sidechainnet/structure/StructureBuilder.py", line 441, in to_prody
    ag.setCoords(torch.stack(self.bb + self.sc).numpy())
RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
"""

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
in ()
      1 d = scn.create_custom(pnids=training_ids + valid_ids + test_ids,
      2                       output_filename="custom01.pkl",
----> 3                       short_description="CASP12 data with the exception of CASP11's test set.")

4 frames
/usr/local/lib/python3.7/dist-packages/sidechainnet/create.py in create_custom(pnids, output_filename, sidechainnet_out, short_description, regenerate_scdata)
    360         proteinnet_in=proteinnet_in,
    361         regenerate_scdata=regenerate_scdata,
--> 362         output_name=intermediate_filename)
    363
    364     # Finally, unify the sidechain data with ProteinNet

/usr/local/lib/python3.7/dist-packages/sidechainnet/utils/download.py in download_sidechain_data(pnids, sidechainnet_out_dir, casp_version, thinning, limit, proteinnet_in, regenerate_scdata, output_name)
    128
    129     # Download the sidechain data as a dictionary and report errors.
--> 130     sc_data, pnids_errors = get_sidechain_data(new_pnids, limit)
    131     for p in already_parsed_ids:
    132         sc_data[p] = existing_data[p]

/usr/local/lib/python3.7/dist-packages/sidechainnet/utils/download.py in get_sidechain_data(pnids, limit)
    191                      total=len(pnids_ok_parallel[:limit]),
    192                      dynamic_ncols=True,
--> 193                      smoothing=0)))
    194         pnids_ok_parallel, remaining_pnids = get_parallel_sequential(remaining_pnids)
    195

/usr/local/lib/python3.7/dist-packages/tqdm/std.py in __iter__(self)
   1178
   1179         try:
-> 1180             for obj in iterable:
   1181                 yield obj
   1182                 # Update and possibly print the progressbar.

/usr/lib/python3.7/multiprocessing/pool.py in next(self, timeout)
    746         if success:
    747             return value
--> 748         raise value
    749
    750     __next__ = next                    # XXX

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

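
For readers who hit the same message, the failing line in the traceback (`torch.stack(self.bb + self.sc).numpy()`) can be reproduced in isolation. The sketch below is only an illustration: the `bb` and `sc` lists are made-up stand-ins for the builder's backbone and sidechain tensors, not sidechainnet's real data, and the `detach()` call is the remedy the error message itself suggests rather than a copy of the actual patch.

```python
import torch

# Stand-in tensors (hypothetical values); like the ones in the traceback,
# they are tracked by autograd because requires_grad=True.
bb = [torch.zeros(3, requires_grad=True) for _ in range(4)]
sc = [torch.ones(3, requires_grad=True) for _ in range(2)]

# Concatenate the two lists and stack them into one (6, 3) tensor.
stacked = torch.stack(bb + sc)

# Calling .numpy() directly fails while the tensor is part of the graph.
try:
    stacked.numpy()
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# detach() returns a tensor that shares the data but is cut from the graph;
# cpu() is a no-op here and only matters if the tensor lives on a GPU.
coords = stacked.detach().cpu().numpy()
print(coords.shape)
```

The detached array shares memory with the original tensor, so it is suited to read-only uses such as passing coordinates to another library.
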
jonathanking commented 2 years ago

Thank you. The problem was that I had made the fix on GitHub but had not published a new version (v0.7.4) via pip. Now that it is published, the Colab notebook (or any other pip-installed copy of sidechainnet) should work!
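
Since the fix depends on which release pip installed, a quick check of the installed version can confirm whether an environment already has it. This is a rough sketch, assuming Python 3.8+ (for `importlib.metadata`) and assuming v0.7.4 is the first pip release containing the fix, as the comment above implies. The helpers `parse_version`, `is_at_least`, and `installed_version` are hypothetical names written for this example, not part of sidechainnet.

```python
import re
from importlib import metadata

REQUIRED = "0.7.4"  # assumed from this thread to be the release with the fix


def parse_version(text):
    """Turn a dotted version such as '0.7.4' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def is_at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("sidechainnet")
if found is None:
    print("sidechainnet is not installed")
elif is_at_least(found, REQUIRED):
    print(f"sidechainnet {found} should include the fix")
else:
    print(f"sidechainnet {found} is older than {REQUIRED}; try upgrading with pip")
```

If the check reports an older version, `pip install --upgrade sidechainnet` followed by a runtime restart in Colab is the usual way to pick up a newer release.
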

Jiram-Kin commented 2 years ago

Thank you, the create_custom function is working properly now!