yikunpku / RNA-MSM

Nucleic Acids Research 2024: RNA-MSM is an unsupervised RNA language model based on multiple sequences that outputs both embeddings and attention maps to suit different types of downstream tasks.
https://aigene.cloudbastion.cn/#/rna-msm
MIT License

no such file or directory: 'hhfilter' #8

Closed Binyun-Z closed 6 months ago

Binyun-Z commented 8 months ago

When I ran the sample program, I encountered the following problem:

---> 24 print(rna_data[0])

Cell In[15], line 24
     14 test_rnas.sort()
     16 rna_data = RNADataset(
     17     data_path=cfg.data.root_path,
     18     msa_path=cfg.data.MSA_path,
   (...)
     22     sample_method=cfg.data.sample_method,
     23 )
---> 24 print(rna_data[0])
     25 rna_data = RandomCropDataset(
     26     rna_data,
     27     cfg.data.max_seqlen,
     28 )
     30 print(rna_data[0])

File /pentapool/home/zhangbinyun/Project/RNA_foundation_model/RNA-MSM/dataset.py:122, in RNADataset.__getitem__(self, index)
    120 def __getitem__(self, index):
    121     rna_id = self.rna_id[index]
--> 122     msa = self.a3m_data[index]
    123     tokens = torch.from_numpy(self.vocab.encode(msa))
    125     return rna_id, tokens

File /pentapool/home/zhangbinyun/Project/RNA_foundation_model/RNA-MSM/dataset.py:88, in A2MDataset.__getitem__(self, index)
     86 msa = MSA.from_fasta(self._file_list[index])
     87 if self._max_seqs_per_msa is not None:
---> 88     msa = msa.select_diverse(
     89         self._max_seqs_per_msa, method=self._sample_method
     90     )
     91 return msa

File /pentapool/home/zhangbinyun/Project/RNA_foundation_model/RNA-MSM/utils/align.py:171, in MSA.select_diverse(self, num_seqs, method)
    168     return self
    170 if method == "hhfilter":
--> 171     msa = self.hhfilter(diff=num_seqs)  # diff=num_seqs
    172     if num_seqs < msa.depth:
    173         msa = msa.select(np.arange(num_seqs))

File /pentapool/home/zhangbinyun/Project/RNA_foundation_model/RNA-MSM/utils/align.py:98, in MSA.hhfilter(self, seqid, diff, cov, qid, qsc, binary)
     84 output_file = tempdir / "output.fasta"
     85 command = " ".join(
     86     [
     87         f"{binary}",
   (...)
     96     ]
     97 ).split(" ")
---> 98 result = subprocess.run(command, capture_output=True)
     99 result.check_returncode()
    100 with output_file.open() as f:

File ~/envs/MoE/lib/python3.8/subprocess.py:493, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    490     kwargs['stdout'] = PIPE
    491     kwargs['stderr'] = PIPE
--> 493 with Popen(*popenargs, **kwargs) as process:
    494     try:
    495         stdout, stderr = process.communicate(input, timeout=timeout)

File ~/envs/MoE/lib/python3.8/subprocess.py:858, in Popen.__init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
    854 if self.text_mode:
    855     self.stderr = io.TextIOWrapper(self.stderr,
    856             encoding=encoding, errors=errors)
--> 858 self._execute_child(args, executable, preexec_fn, close_fds,
    859                     pass_fds, cwd, env,
    860                     startupinfo, creationflags, shell,
    861                     p2cread, p2cwrite,
    862                     c2pread, c2pwrite,
    863                     errread, errwrite,
    864                     restore_signals, start_new_session)
    865 except:
    866     # Cleanup if the child failed starting.
    867     for f in filter(None, (self.stdin, self.stdout, self.stderr)):

File ~/envs/MoE/lib/python3.8/subprocess.py:1720, in Popen._execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1718 if errno_num != 0:
   1719     err_msg = os.strerror(errno_num)
-> 1720     raise child_exception_type(errno_num, err_msg, err_filename)
   1721 raise child_exception_type(err_msg)
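
For readers following the traceback: the final frame is where `subprocess` reports that the operating system could not find the executable it was asked to launch. The sketch below reproduces that failure mode in isolation. The helper `try_run` and the deliberately bogus binary name are hypothetical and are not part of RNA-MSM; the only assumption is that no program with that name exists on the machine.

```python
import subprocess


def try_run(binary: str) -> str:
    """Try to launch `binary`; return "ok" or a short description of the failure.

    Hypothetical helper for illustration only, not part of RNA-MSM.
    """
    try:
        # Same call shape as in the traceback: an argument list, output captured.
        subprocess.run([binary, "-h"], capture_output=True)
        return "ok"
    except FileNotFoundError as exc:
        # Raised by Popen when the executable cannot be found on PATH.
        return f"{type(exc).__name__}: {exc.strerror}: {exc.filename!r}"


# A name that is assumed not to exist, standing in for a missing 'hhfilter'.
print(try_run("definitely-not-a-real-binary-xyz"))
```

The exception comes from the process launch itself, so `result.check_returncode()` on the following line of `MSA.hhfilter` is never reached.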

yikunpku commented 6 months ago

Hi, hhfilter is an open-source tool. You can find installation instructions via this link.
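
After installing hhfilter (it is distributed as part of HH-suite), it may help to confirm that the executable is visible to the Python environment before building the dataset. The following is a minimal sketch, not code from the repository: `require_binary` is a hypothetical helper, and it assumes the dataset looks the tool up by the bare name `hhfilter` on `PATH`, as the error in the issue title suggests.

```python
import shutil


def require_binary(name: str = "hhfilter") -> str:
    """Return the full path of `name`, or raise a descriptive error.

    Hypothetical pre-flight check, not part of RNA-MSM.
    """
    path = shutil.which(name)  # None when the executable is not on PATH
    if path is None:
        raise FileNotFoundError(
            f"'{name}' was not found on PATH; install it and make sure the "
            "directory containing it is on PATH for this environment."
        )
    return path


# Report the lookup result without raising, e.g. from a notebook cell.
print(shutil.which("hhfilter"))
```

If the lookup prints `None` inside a notebook but the tool works in a terminal, the notebook kernel is probably running with a different `PATH`. The traceback also shows that `MSA.hhfilter` accepts a `binary` argument, so passing a full path there might be an alternative, although `select_diverse` as shown does not forward one.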