meddwl / psearch

3D ligand-based pharmacophore modeling
BSD 3-Clause "New" or "Revised" License

prepare_dataset issue #21

Open julianaamorim opened 10 months ago

julianaamorim commented 10 months ago

Hello, I have just installed psearch and all the env dependencies in conda. I downloaded the acetylcholinesterase (AChE) dataset to run a test, and while preparing the dataset I came across the following error:

Process Process-1:
Traceback (most recent call last):
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/prepare_dataset.py", line 61, in common
    create_db.main_params(dbout_fname=filenames[4],
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/scripts/create_db.py", line 156, in main_params
    for i, res in enumerate(p.imap_unordered(map_process_mol,
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 451, in
    return (item for chunk in result for item in chunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
OSError: File error: Invalid input file /home/juliana/Downloads/Ache/compounds/active_conf.sdf

Traceback (most recent call last):
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/prepare_dataset.py", line 61, in common
    create_db.main_params(dbout_fname=filenames[4],
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/site-packages/psearch/scripts/create_db.py", line 156, in main_params
    for i, res in enumerate(p.imap_unordered(map_process_mol,
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 451, in
    return (item for chunk in result for item in chunk)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juliana/anaconda3/envs/psearch/lib/python3.12/multiprocessing/pool.py", line 873, in next
    raise value
OSError: File error: Invalid input file /home/juliana/Downloads/Ache/compounds/inactive_conf.sdf

No file in the "compounds" folder was generated... Any suggestions on how to move forward?

Thanks in advance...

Juliana

DrrDom commented 10 months ago

From which branch did you install psearch? You have to use the gen_pharms branch; it is the most recent. Unfortunately, we have not yet fixed all the remaining bugs and merged it into master. Also, I have never used psearch with Python 3.12; the latest version I used was 3.9. However, that should not be a problem.
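Both tracebacks above end with "Invalid input file" for the conformer files, and the compounds folder was reported as empty, so one quick check before rerunning is whether those files exist and contain any records. The following is only a minimal sketch using the Python standard library; `check_sdf` is a hypothetical helper, not part of psearch, and it only counts the `$$$$` lines that terminate SDF records rather than validating the molecules themselves.

```python
from pathlib import Path


def check_sdf(path):
    """Quick sanity check of an SDF file (hypothetical helper, not part of psearch).

    Returns a tuple (ok, message).
    """
    p = Path(path)
    if not p.is_file():
        return False, f"{p}: file does not exist"
    if p.stat().st_size == 0:
        return False, f"{p}: file is empty"
    # Each record in an SDF file ends with a line containing only "$$$$".
    with p.open("r", errors="replace") as fh:
        n_records = sum(1 for line in fh if line.strip() == "$$$$")
    if n_records == 0:
        return False, f"{p}: no '$$$$' record terminators found"
    return True, f"{p}: {n_records} record(s)"
```

For example, `check_sdf("/home/juliana/Downloads/Ache/compounds/active_conf.sdf")` would report a missing file if the conformer generation step never wrote its output, which would point to a failure earlier in prepare_dataset rather than in create_db itself.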

julianaamorim commented 10 months ago

Running psearch on my current working dataset got my trainset list stuck... I tried some checking in select_training_set_rdkit.py to understand the problem, but to no avail...

>> psearch -p my_models_2/created_pharmacophores/ -i beta_2_short.smi -d dbs/beta.dat -c 4
Size of df before generating fingerprints: (101, 3)
Size of df after generating fingerprints: (101, 4)
Size of df_mols before concatenation: (101, 4)
Size of df_mols after concatenation: (101, 5)
100 molecules screened 00:00:01
external_statistics.txt: (0.009s)

Any light?

DrrDom commented 10 months ago

Does it return any pharmacophore models? If not, it may be that it cannot create training sets. I ran into something similar in the past, and it would be reasonable to implement another modeling mode in which all input ligands are taken as the training set, without the selection step used now. It may also be a bug. If you can share your data set, I can look at it when I have time, but I cannot promise to do so quickly.

julianaamorim commented 8 months ago

Increasing the number of active compounds solved the problem...
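Since the fix turned out to be the number of active compounds, it can help to count the activity labels in the input file before running the modeling step. The sketch below is an illustration under stated assumptions: it assumes a whitespace- or tab-separated file with the activity label in the third column (SMILES, ID, label), which is not confirmed in this thread, and `count_activity_labels` is a hypothetical helper, not part of psearch.

```python
from collections import Counter


def count_activity_labels(lines, column=2):
    """Count the values found in one column of whitespace-separated lines.

    Assumption: the activity label sits in the third column (index 2).
    Lines with too few fields are skipped; a header line, if present,
    would be counted as a label.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts


# Usage with a file on disk (file name taken from the command in this thread):
# with open("beta_2_short.smi") as fh:
#     print(count_activity_labels(fh))
```

A very small count for the active class would be consistent with the training-set selection producing nothing, as discussed above.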