ml4bio / e2efold

pytorch implementation for "RNA Secondary Structure Prediction By Learning Unrolled Algorithms"
MIT License

Error when testing the pre-trained model on RNAStralign test dataset #8

Closed irleader closed 3 years ago

irleader commented 3 years ago

When I run these commands in the environment you provided:

    python e2e_learning_stage3.py -c config.json --test True
    python e2e_learning_stage3_rnastralign_all_long.py -c config_long.json --test True

the following errors occur:

…
Batch number: 890
Batch number: 900
Traceback (most recent call last):
  File "e2e_learning_stage3.py", line 289, in <module>
    all_test_only_e2e(test_generator, contact_net, lag_pp_net, device, test_data)
  File "/home/jxu/e2efold_master/e2efold/evaluation.py", line 154, in all_test_only_e2e
    for contacts, seq_embeddings, matrix_reps, seq_lens in test_generator:
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self.data_queue.get(timeout=timeout)
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

...
Batch number: 740
Batch number: 750
Traceback (most recent call last):
  File "e2e_learning_stage3_rnastralign_all_long.py", line 584, in <module>
    all_test_only_e2e()
  File "e2e_learning_stage3_rnastralign_all_long.py", line 344, in all_test_only_e2e
    for seq_embedding_batch, PE_batch, _, comb_index, seq_embeddings, contacts, seq_lens in test_generator_1800:
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self.data_queue.get(timeout=timeout)
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/jxu/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

The only changes I made are in e2e_learning_stage3.py. Line 118:

    contact_net.load_state_dict(torch.load(model_path))
    --> contact_net.load_state_dict(torch.load(model_path, map_location=device))

and line 128:

    rna_ss_e2e.load_state_dict(torch.load(e2e_model_path))
    --> rna_ss_e2e.load_state_dict(torch.load(e2e_model_path, map_location=device))

I am running on a CPU-only server, and the original two lines raised an error there.
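For readers hitting the same loading error, here is a minimal sketch of what `map_location` does. The `nn.Linear` model and the temporary checkpoint file are hypothetical stand-ins, not the e2efold networks or their checkpoints; only `torch.load(..., map_location=device)` mirrors the change described above.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Pick the GPU when one is visible, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for contact_net; the real model lives in e2efold.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint_path = os.path.join(tmp_dir, "toy_checkpoint.pt")
    torch.save(model.state_dict(), checkpoint_path)

    # map_location remaps every stored tensor onto `device` while loading,
    # so a checkpoint saved on a GPU can be opened on a CPU-only machine.
    state_dict = torch.load(checkpoint_path, map_location=device)

# Load the remapped weights into a fresh model of the same shape.
restored = nn.Linear(4, 2)
restored.load_state_dict(state_dict)
restored.to(device)
```

Without `map_location`, `torch.load` tries to restore each tensor to the device it was saved from, which fails when that device is a GPU the current machine does not have.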

Can you help me find out where the "RuntimeError: received 0 items of ancdata" comes from?

liyu95 commented 3 years ago

Hi, thank you very much for your interest! The error seems to be related to multiprocessing: there appears to be a read conflict in Python 3's queue library. I am not familiar with the hardware configuration of your CPU server, but I guess the error would go away if you ran it on a stand-alone workstation.

Sincerely, Yu

irleader commented 3 years ago

Hi Yu,

Thanks a lot for your fast reply. It is indeed related to multiprocessing. It can be fixed by adding this line to evaluation.py:

    torch.multiprocessing.set_sharing_strategy('file_system')
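A slightly more defensive version of that one-liner, as a sketch: it checks which sharing strategies the platform supports before switching. The variable names are illustrative; the three `torch.multiprocessing` functions are part of PyTorch's public API.

```python
import torch.multiprocessing as mp

# Strategies supported on this platform; Linux normally offers both
# "file_descriptor" (the default there) and "file_system".
available = mp.get_all_sharing_strategies()
previous = mp.get_sharing_strategy()

# Switch before any DataLoader with num_workers > 0 is created.
if "file_system" in available:
    mp.set_sharing_strategy("file_system")

current = mp.get_sharing_strategy()
print(f"sharing strategy: {previous} -> {current}")
```

With the `file_descriptor` strategy, every tensor a DataLoader worker hands back travels as a file descriptor over a Unix socket, so many in-flight tensors can run into the per-process limit on open files. The `file_system` strategy shares tensors through named shared-memory files instead, which is probably why the switch makes the error disappear here.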

charlesxu90 commented 3 years ago

The same problem occurs on my workstation. I fixed it by adding the line below:

File "e2efold/e2efold/evaluation.py", line 6

    torch.multiprocessing.set_sharing_strategy('file_system')

Error message:

Traceback (most recent call last):
  File "e2e_learning_stage3.py", line 247, in <module>
    per_family_evaluation()
  File "e2e_learning_stage3.py", line 192, in per_family_evaluation
    for contacts, seq_embeddings, matrix_reps, seq_lens in test_generator:
  File "/home/xxp/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 804, in __next__
    idx, data = self._get_data()
  File "/home/xxp/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 771, in _get_data
    success, data = self._try_get_data()
  File "/home/xxp/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 724, in _try_get_data
    data = self.data_queue.get(timeout=timeout)
  File "/home/xxp/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/xxp/anaconda3/envs/rna_ss/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/home/.../anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/xxp/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/xxp/anaconda3/envs/rna_ss/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata
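The last frames of this traceback (`recv_handle`, `recvfds`) are where the main process expects a file descriptor from a DataLoader worker. A commonly reported cause of "received 0 items of ancdata" is that the process has reached its limit on open file descriptors; that is an assumption about this particular case, not something the thread confirms. The sketch below uses only the standard library (Unix only) to inspect that limit and raise the soft limit; `desired_soft_limit` and `clamp_to_hard` are hypothetical names chosen for illustration.

```python
import resource

# Soft and hard limits on open file descriptors for this process (Unix only).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft_limit}, hard={hard_limit}")

# Hypothetical target; pick a value that suits the machine.
desired_soft_limit = 4096


def clamp_to_hard(desired, hard):
    """Return the largest soft limit <= desired that the hard limit allows."""
    if hard == resource.RLIM_INFINITY:
        return desired
    return min(desired, hard)


new_soft_limit = clamp_to_hard(desired_soft_limit, hard_limit)
if soft_limit != resource.RLIM_INFINITY and soft_limit < new_soft_limit:
    # Raising the soft limit up to the hard limit needs no extra privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft_limit, hard_limit))

soft_after, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit after adjustment: {soft_after}")
```

This is an alternative to changing the sharing strategy: it keeps the default `file_descriptor` behaviour and gives the process more headroom, which is roughly what `ulimit -n` does from the shell.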