idrblab / AnnoPRO

Feature map and function annotation of Proteins
MIT License

UnboundLocalError: local variable 'data_onehot' referenced before assignment #11

Closed: huang-zeyu closed this issue 1 year ago

huang-zeyu commented 1 year ago

Hello!

I followed the installation tutorial on the README.md. I tried both

The commands I used are:

After these attempts, I got the same error as indicated in the Logs. Do you know how to fix this? Thanks in advance!

Input

The data.fasta is like:

>sp|O14793|GDF8_HUMAN Growth/differentiation factor 8 OS=Homo sapiens OX=9606 GN=MSTN PE=1 SV=1
MQKLQLCVYIYLFMLIVAGPVDLNENSEQKENVEKEGLCNACTWRQNTKSSRIEAIKIQI
LSKLRLETAPNISKDVIRQLLPKAPPLRELIDQYDVQRDDSSDGSLEDDDYHATTETIIT
MPTESDFLMQVDGKPKCCFFKFSSKIQYNKVVKAQLWIYLRPVETPTTVFVQILRLIKPM
KDGTRYTGIRSLKLDMNPGTGIWQSIDVKTVLQNWLKQPESNLGIEIKALDENGHDLAVT
FPGPGEDGLNPFLEVKVTDTPKRSRRDFGLDCDEHSTESRCCRYPLTVDFEAFGWDWIIA
PKRYKANYCSGECEFVFLQKYPHTHLVHQANPRGSAGPCCTPTKMSPINMLYFNGKEQII
YGKIPAMVVDRCGCS

The run.py is:

from annopro import main
main("data.fasta", "output", "0,1,2,3")

or

from annopro import main
main("data.fasta", "output")

Runtime Environment

Logs

Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None
A conda environment has been detected CONDA_PREFIX= CONDA_PATH/envs/annopro
anaconda3_gpu is loaded. Consider running conda deactivate and reloading it.
diamond v2.1.0.154 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org/
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 4

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output

Target sequences to report alignments for: 25

Opening the database... [0.044s]
Database: USER_DIR/.annopro/data/cafa4.dmnd (type: Diamond database, sequences: 87514, letters: 44798577)
Block size = 2000000000
Opening the input file... [0.001s]
Opening the output file... [0s]
Loading query sequences... [0s]
Masking queries... [0s]
Algorithm: Double-indexed
Building query histograms... [0s]
Loading reference sequences... [0.042s]
Masking reference... [0.583s]
Initializing temporary storage... [0.007s]
Building reference histograms... [0.402s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array... [0.159s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0.001s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
Building reference seed array... [0.186s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4.
Building reference seed array... [0.202s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4.
Building reference seed array... [0.16s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4.
Building reference seed array... [0.151s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4.
Building reference seed array... [0.186s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4.
Building reference seed array... [0.201s]
Building query seed array... [0s]
Computing hash join... [0.003s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4.
Building reference seed array... [0.155s]
Building query seed array... [0s]
Computing hash join... [0.004s]
Masking low complexity seeds... [0s]
Searching alignments... [0s]
Deallocating memory... [0s]
Deallocating buffers... [0s]
Clearing query masking... [0s]
Computing alignments... Loading trace points... [0.014s]
Sorting trace points... [0s]
Computing alignments... [0.004s]
Deallocating buffers... [0s]
Loading trace points... [0s] [0.022s]
Deallocating reference... [0s]
Loading reference sequences... [0s]
Deallocating buffers... [0s]
Deallocating queries... [0s]
Loading query sequences... [0s]
Closing the input file... [0s]
Closing the output file... [0.001s]
Closing the database... [0s]
Cleaning up... [0s]
Total time = 2.668s
Reported 5 pairwise alignments, 5 HSPs.
1 queries aligned.
2023-06-16 13:27:18.651280: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-16 13:27:27.991777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 43511 MB memory: -> device: 0, name: NVIDIA A40, pci bus id: 0000:07:00.0, compute capability: 8.6
2023-06-16 13:27:28.072954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 43511 MB memory: -> device: 1, name: NVIDIA A40, pci bus id: 0000:46:00.0, compute capability: 8.6
2023-06-16 13:27:28.074780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 43511 MB memory: -> device: 2, name: NVIDIA A40, pci bus id: 0000:85:00.0, compute capability: 8.6
2023-06-16 13:27:28.076816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 43511 MB memory: -> device: 3, name: NVIDIA A40, pci bus id: 0000:c7:00.0, compute capability: 8.6
Traceback (most recent call last):
  File "CONDA_PATH/envs/annopro/lib/python3.8/run.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "CONDA_PATH/envs/annopro/lib/python3.8/run.py", line 87, in _run_code
    exec(code, run_globals)
  File "PROJECT_PATH/AnnoPRO/annopro/__main__.py", line 4, in <module>
    console_main()
  File "PROJECT_PATH/AnnoPRO/annopro/__init__.py", line 27, in console_main
    main(
  File "PROJECT_PATH/AnnoPRO/annopro/__init__.py", line 75, in main
    predict(output_dir=output_dir,
  File "PROJECT_PATH/AnnoPRO/annopro/prediction.py", line 19, in predict
    init_evaluate(term_type=term_type,
  File "PROJECT_PATH/AnnoPRO/annopro/prediction.py", line 160, in init_evaluate
    preds = model.predict(data_generator, steps=data_steps)
  File "CONDA_PATH/envs/annopro/lib/python3.8/site-packages/keras/engine/training.py", line 1720, in predict
    data_handler = data_adapter.get_data_handler(
  File "CONDA_PATH/envs/annopro/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 1383, in get_data_handler
    return DataHandler(*args, **kwargs)
  File "CONDA_PATH/envs/annopro/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 1138, in __init__
    self._adapter = adapter_cls(
  File "CONDA_PATH/envs/annopro/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 917, in __init__
    super(KerasSequenceAdapter, self).__init__(
  File "CONDA_PATH/envs/annopro/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 794, in __init__
    peek, x = self._peek_and_restore(x)
  File "CONDA_PATH/envs/annopro/lib/python3.8/site-packages/keras/engine/data_adapter.py", line 928, in _peek_and_restore
    return x[0], x
  File "PROJECT_PATH/AnnoPRO/annopro/prediction.py", line 50, in __getitem__
    return ([data_onehot, data_si])
UnboundLocalError: local variable 'data_onehot' referenced before assignment
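
For context on the exception itself: Python raises UnboundLocalError when a function reads a local name that was never assigned on the code path actually taken. The sketch below only illustrates that general mechanism. The names get_batch and try_get_batch are hypothetical and are not taken from annopro/prediction.py, whose source is not shown in this thread.

```python
# Hypothetical illustration of the error class; NOT the AnnoPRO source code.

def get_batch(records):
    """Return the last record wrapped in a list."""
    for record in records:
        # data_onehot is bound only if the loop body runs at least once.
        data_onehot = [record]
    # With an empty `records`, the name was never bound, so this line raises.
    return data_onehot


def try_get_batch(records):
    """Call get_batch and report the exception type name if it fails."""
    try:
        return get_batch(records)
    except UnboundLocalError as exc:
        return type(exc).__name__
```

Calling try_get_batch([]) takes the failing path, while try_get_batch(["a"]) returns normally. A traceback of this kind therefore usually means that the data expected upstream never arrived, rather than a problem in the return statement itself.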

GCS-ZHN commented 1 year ago

We use profeatpy to calculate protein sequence features; it was developed in Fortran by @swallow-design. Its input is not standard FASTA, however: each header line must contain only the sequence ID, with no spaces. For example:

>sp|O14793|GDF8_HUMAN
MQKLQLCVYIYLFMLIVAGPVDLNENSEQKENVEKEGLCNACTWRQNTKSSRIEAIKIQI
LSKLRLETAPNISKDVIRQLLPKAPPLRELIDQYDVQRDDSSDGSLEDDDYHATTETIIT
MPTESDFLMQVDGKPKCCFFKFSSKIQYNKVVKAQLWIYLRPVETPTTVFVQILRLIKPM
KDGTRYTGIRSLKLDMNPGTGIWQSIDVKTVLQNWLKQPESNLGIEIKALDENGHDLAVT
FPGPGEDGLNPFLEVKVTDTPKRSRRDFGLDCDEHSTESRCCRYPLTVDFEAFGWDWIIA
PKRYKANYCSGECEFVFLQKYPHTHLVHQANPRGSAGPCCTPTKMSPINMLYFNGKEQII
YGKIPAMVVDRCGCS
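
For files with many records, the header cleanup described above can be scripted. The following is a minimal sketch, assuming that keeping everything before the first whitespace on each ">" line is enough to satisfy profeatpy. The function name strip_fasta_descriptions and the output file name are hypothetical, not part of AnnoPRO.

```python
from pathlib import Path


def strip_fasta_descriptions(src, dst):
    """Copy a FASTA file, keeping only the ID on each header line.

    The ID is taken to be the text before the first whitespace
    (an assumption based on the example in this thread).
    """
    cleaned = []
    for line in Path(src).read_text().splitlines():
        if line.startswith(">"):
            # ">sp|O14793|GDF8_HUMAN Growth/..." becomes ">sp|O14793|GDF8_HUMAN"
            line = line.split(maxsplit=1)[0]
        cleaned.append(line)  # sequence lines are copied unchanged
    Path(dst).write_text("\n".join(cleaned) + "\n")
```

For example, strip_fasta_descriptions("data.fasta", "data_clean.fasta") writes a cleaned copy, and data_clean.fasta can then be passed to main in place of data.fasta in the run.py shown earlier.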

huang-zeyu commented 1 year ago

Problem solved! Thanks!