sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License

Protocol plotly: RuntimeError: Expected embeddings in projected_reduced_embeddings_file to be of shape (3,), not (2,) #133

Closed: ptizei closed this issue 3 years ago

ptizei commented 3 years ago

Running Basic Protocol 1 from the Current Protocols paper, I get the shape error shown in the auto-generated issue title. I'll post the full terminal log (stdout and stderr) below the auto-generated part.

Metadata

key      value
version  0.2.0
cuda     True

Parameter

key              value
type             visualize
protocol         plotly
annotation_file  disprot_annotations.csv
display_unknown  False
depends_on       umap_projections

Traceback

Traceback (most recent call last):
  File "/home/phil/.local/lib/python3.8/site-packages/bio_embeddings/utilities/pipeline.py", line 280, in execute_pipeline_from_config
    stage_output_parameters = stage_runnable(**stage_parameters)
  File "/home/phil/.local/lib/python3.8/site-packages/bio_embeddings/visualize/pipeline.py", line 169, in run
    return PROTOCOLS[kwargs["protocol"]](result_kwargs)
  File "/home/phil/.local/lib/python3.8/site-packages/bio_embeddings/visualize/pipeline.py", line 38, in plotly
    raise RuntimeError(
RuntimeError: Expected embeddings in projected_reduced_embeddings_file to be of shape (3,), not (2,)
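The message points at a mismatch between two stages: the file written by the umap_projections stage held 2-dimensional vectors, while the plotly stage checked for 3-dimensional ones. Below is a rough, hypothetical re-creation of such a check. It is written from the error text alone, not taken from bio_embeddings' source, and `check_projection_shapes` and `expected_dims` are illustrative names.

```python
from typing import Mapping, Tuple

import numpy as np


def check_projection_shapes(
    embeddings: Mapping[str, np.ndarray], expected_dims: int = 3
) -> None:
    """Hypothetical shape check modelled on the error message in this issue.

    Raises RuntimeError for the first projected embedding that is not a
    1-D vector with `expected_dims` entries.
    """
    expected: Tuple[int, ...] = (expected_dims,)
    for vector in embeddings.values():
        shape = tuple(np.shape(vector))
        if shape != expected:
            raise RuntimeError(
                "Expected embeddings in projected_reduced_embeddings_file "
                f"to be of shape {expected}, not {shape}"
            )


# 3-component projections pass; 2-component ones (as in this issue) fail.
check_projection_shapes({"seq_0": np.array([0.1, 0.2, 0.3])})
try:
    check_projection_shapes({"seq_0": np.array([0.1, 0.2])})
except RuntimeError as err:
    # prints: Expected embeddings in projected_reduced_embeddings_file to be of shape (3,), not (2,)
    print(err)
```

Because `np.shape` uses an object's `.shape` attribute when it has one, the same helper could in principle be pointed at the datasets of an open `h5py.File` for projected_reduced_embeddings_file.h5, assuming that file stores one dataset per sequence.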

More info


2021-05-18 08:27:21,746 INFO Created the file disprot_sampled/input_parameters_file.yml
2021-05-18 08:27:21,751 INFO Created the file disprot_sampled/sequences_file.fasta
2021-05-18 08:27:21,752 INFO Created the file disprot_sampled/mapping_file.csv
2021-05-18 08:27:21,752 INFO Created the file disprot_sampled/remapped_sequences_file.fasta
2021-05-18 08:27:21,754 INFO Created the stage directory disprot_sampled/protbert_embeddings
2021-05-18 08:27:21,754 INFO Created the file disprot_sampled/protbert_embeddings/input_parameters_file.yml
2021-05-18 08:27:21,755 INFO Loading model_directory for prottrans_bert_bfd from cache at '/home/phil/.cache/bio_embeddings/prottrans_bert_bfd/model_directory'
Some weights of the model checkpoint at /home/phil/.cache/bio_embeddings/prottrans_bert_bfd/model_directory were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2021-05-18 08:27:27,382 INFO The minimum expected size for the reduced_embedding_file is 397.3 kB.
2021-05-18 08:27:27,382 INFO You are going to generate a total of 397.3 kB of embeddings, and have 3.6 TB available at disprot_sampled.
2021-05-18 08:27:27,382 INFO Created the file disprot_sampled/protbert_embeddings/reduced_embeddings_file.h5

  0%|          | 0/97 [00:00<?, ?it/s]
  1%|          | 1/97 [00:00<00:50,  1.90it/s]
 72%|███████▏  | 70/97 [00:00<00:00, 128.39it/s]
 99%|█████████▉| 96/97 [00:00<00:00, 135.47it/s]
2021-05-18 08:27:28,092 INFO Created the file disprot_sampled/protbert_embeddings/ouput_parameters_file.yml
2021-05-18 08:27:28,093 INFO Created the stage directory disprot_sampled/umap_projections
2021-05-18 08:27:28,093 INFO Created the file disprot_sampled/umap_projections/input_parameters_file.yml
2021-05-18 08:27:32,022 INFO Created the file disprot_sampled/umap_projections/projected_embeddings_file.csv
2021-05-18 08:27:32,024 INFO Created the file disprot_sampled/umap_projections/projected_reduced_embeddings_file.h5
2021-05-18 08:27:32,045 INFO Created the file disprot_sampled/umap_projections/ouput_parameters_file.yml
2021-05-18 08:27:32,047 INFO Created the stage directory disprot_sampled/plotly_visualization
2021-05-18 08:27:32,047 INFO Created the file disprot_sampled/plotly_visualization/input_parameters_file.yml
Traceback (most recent call last):
  File "/home/phil/.local/lib/python3.8/site-packages/bio_embeddings/utilities/pipeline.py", line 280, in execute_pipeline_from_config
    stage_output_parameters = stage_runnable(**stage_parameters)
  File "/home/phil/.local/lib/python3.8/site-packages/bio_embeddings/visualize/pipeline.py", line 169, in run
    return PROTOCOLS[kwargs["protocol"]](result_kwargs)
  File "/home/phil/.local/lib/python3.8/site-packages/bio_embeddings/visualize/pipeline.py", line 38, in plotly
    raise RuntimeError(
RuntimeError: Expected embeddings in projected_reduced_embeddings_file to be of shape (3,), not (2,)

Consider reporting this error at this url: https://github.com/sacdallago/bio_embeddings/issues/new?title=Protocol+plotly%3A+RuntimeError%3A+Expected+embeddings+in+projected_reduced_embeddings_file+to+be+of+shape+%283%2C%29%2C+not+%282%2C%29&body=%23%23+Metadata%0A%7Ckey%7Cvalue%7C%0A%7C--%7C--%7C%0A%7C%2A%2Aversion%2A%2A%7C0.2.0%7C%0A%7C%2A%2Acuda%2A%2A%7CTrue%7C%0A%0A%23%23+Parameter%0A%7Ckey%7Cvalue%7C%0A%7C--%7C--%7C%0Atype%7Cvisualize%0Aprotocol%7Cplotly%0Aannotation_file%7Cdisprot_annotations.csv%0Adisplay_unknown%7CFalse%0Adepends_on%7Cumap_projections%0A%0A%23%23+Traceback%0A%60%60%60%0ATraceback+%28most+recent+call+last%29%3A%0A++File+%22%2Fhome%2Fphil%2F.local%2Flib%2Fpython3.8%2Fsite-packages%2Fbio_embeddings%2Futilities%2Fpipeline.py%22%2C+line+280%2C+in+execute_pipeline_from_config%0A++++stage_output_parameters+%3D+stage_runnable%28%2A%2Astage_parameters%29%0A++File+%22%2Fhome%2Fphil%2F.local%2Flib%2Fpython3.8%2Fsite-packages%2Fbio_embeddings%2Fvisualize%2Fpipeline.py%22%2C+line+169%2C+in+run%0A++++return+PROTOCOLS%5Bkwargs%5B%22protocol%22%5D%5D%28result_kwargs%29%0A++File+%22%2Fhome%2Fphil%2F.local%2Flib%2Fpython3.8%2Fsite-packages%2Fbio_embeddings%2Fvisualize%2Fpipeline.py%22%2C+line+38%2C+in+plotly%0A++++raise+RuntimeError%28%0ARuntimeError%3A+Expected+embeddings+in+projected_reduced_embeddings_file+to+be+of+shape+%283%2C%29%2C+not+%282%2C%29%0A%60%60%60%0A%0A%23%23+More+info%0A

Stage plotly_visualization failed.
UMAP(angular_rp_forest=True, dens_frac=0.0, dens_lambda=0.0, metric='cosine',
     min_dist=0.6, random_state=420, spread=1, verbose=1)
Construct fuzzy simplicial set
Tue May 18 08:27:28 2021 Finding Nearest Neighbors
Tue May 18 08:27:29 2021 Finished Nearest Neighbor Search
Disconnection_distance = 1 has removed 1628 edges.  This is not a problem as no vertices were disconnected.
Tue May 18 08:27:31 2021 Construct embedding
    completed  0  /  500 epochs
    completed  50  /  500 epochs
    completed  100  /  500 epochs
    completed  150  /  500 epochs
    completed  200  /  500 epochs
    completed  250  /  500 epochs
    completed  300  /  500 epochs
    completed  350  /  500 epochs
    completed  400  /  500 epochs
    completed  450  /  500 epochs
Tue May 18 08:27:32 2021 Finished embedding
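The size estimate logged at 08:27:27 (397.3 kB) is consistent with one per-protein vector per sequence: 97 sequences (the count in the progress bar) × 1024 values × 4 bytes = 397,312 bytes. The sketch below shows that back-of-the-envelope arithmetic, assuming ProtBert's 1024-dimensional per-protein embeddings stored as float32. It is not necessarily how bio_embeddings computes the figure, and both helper names are hypothetical.

```python
def reduced_embeddings_bytes(
    n_sequences: int, embedding_dim: int, bytes_per_value: int = 4
) -> int:
    """Raw bytes for one fixed-size vector per sequence, ignoring HDF5 overhead."""
    return n_sequences * embedding_dim * bytes_per_value


def format_kb(n_bytes: int) -> str:
    """Format a byte count in decimal kilobytes (1 kB = 1000 bytes)."""
    return f"{n_bytes / 1000:.1f} kB"


# 97 sequences (from the progress bar); 1024 values per protein for ProtBert
# and 4-byte float32 storage are assumptions.
size = reduced_embeddings_bytes(97, 1024)
print(size, format_kb(size))  # 397312 397.3 kB
```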
konstin commented 3 years ago

Sorry, this should now be fixed. You can try the fixed version with pip install -U "bio-embeddings[all] @ git+https://github.com/sacdallago/bio_embeddings.git"
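After upgrading, the standard-library `importlib.metadata` module offers a quick way to confirm which release the interpreter actually picks up. It is available from Python 3.8, the version in the traceback paths. A minimal sketch follows, with `installed_version` as a hypothetical helper. This thread does not say whether the git build reports a version number other than 0.2.0, so treat the output only as a sanity check that the package is installed where Python looks for it.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Distribution name as used in the pip command above.
print(installed_version("bio-embeddings"))
```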