sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License

Get activations #168

Closed: kvetab closed this 3 years ago

kvetab commented 3 years ago

After discussing with David (@prihoda), I'm creating a PR based on issue #153.

Enable writing files with activations for the secondary structure and disorder predictions. The config file can contain a get_activations parameter, which controls whether these files are created.

Apply softmax to the raw activation tensors (yhat_...) to obtain "probabilities" for each of the classes. Return these from the get_secondary_structure function in BasicAnnotationExtractor, store them in BasicSecondaryStructureResult and subsequently in BasicExtractedAnnotations, which is returned by get_annotations.

In predict_annotations_using_basic_models, check the get_activations parameter; if it is true, create output files for the DSSP3, DSSP8 and disorder activations. Then store the activations in dataframes along with the sequence ID and residue number. Each row corresponds to one position in one sequence; the columns hold the sequence ID, the residue number (1-based) and one value per possible class. For each prediction type (DSSP3, DSSP8 and disorder), concatenate all dataframes and write them to the corresponding CSV file.
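The steps above can be sketched roughly as follows. This is a minimal illustration with numpy and pandas, not the code from this PR. It assumes each sequence's raw activations arrive as an (L, n_classes) array of logits (transpose first if the tensor is laid out the other way). The helper names, the column names "id" and "residue", and the DSSP3 label order are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical class labels and order; the real order is defined by the
# bio_embeddings models, not by this sketch.
DSSP3_CLASSES = ["H", "E", "C"]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the class axis of an (L, n_classes) array."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def activations_frame(seq_id: str, logits: np.ndarray, classes: list) -> pd.DataFrame:
    """One row per residue: sequence ID, 1-based residue number, one column per class."""
    probs = softmax(logits)
    df = pd.DataFrame(probs, columns=classes)
    df.insert(0, "residue", np.arange(1, len(df) + 1))  # 1-based positions
    df.insert(0, "id", seq_id)  # scalar is broadcast to every row
    return df


def write_activations(per_sequence_logits: dict, classes: list, path: str,
                      get_activations: bool) -> None:
    """Concatenate per-sequence frames and write them to a single CSV file."""
    if not get_activations:
        return  # the flag controls whether the file is created at all
    frames = [activations_frame(sid, lg, classes)
              for sid, lg in per_sequence_logits.items()]
    if not frames:
        return  # pd.concat raises on an empty list
    pd.concat(frames, ignore_index=True).to_csv(path, index=False)
```

Subtracting the row maximum before exponentiating leaves the softmax result unchanged but avoids overflow for large logits. Building one frame per sequence and concatenating once at the end produces the layout described: one row per residue, one column per class.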

Looking forward to any comments, especially on the use of the config file; I'm a bit fuzzy on that.
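On the config question, here is a minimal sketch of reading an optional get_activations entry, assuming the stage's section of the config file has already been parsed into a dict (for example from YAML). The helper name read_get_activations, the default of False, and the handling of quoted strings are assumptions for illustration, not the bio_embeddings API.

```python
from typing import Any, Mapping

# Hypothetical: accepted spellings when the value was quoted in YAML and
# therefore parsed as a string rather than a bool.
_TRUE = {"true", "yes", "on"}
_FALSE = {"false", "no", "off"}


def read_get_activations(stage_config: Mapping[str, Any]) -> bool:
    """Return the get_activations flag, defaulting to False when absent."""
    value = stage_config.get("get_activations", False)
    if isinstance(value, bool):
        return value  # unquoted YAML true/false already parse to bool
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized in _TRUE | _FALSE:
            return normalized in _TRUE
    # Fail loudly instead of silently treating e.g. 3 or "maybe" as truthy.
    raise ValueError(f"get_activations must be a boolean, got {value!r}")
```

Defaulting to False keeps existing configs working unchanged, and raising on unexpected values avoids Python's truthiness turning a typo into an enabled flag.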

sacdallago commented 3 years ago

Thanks @kvetab. Sorry for the delay; somehow this didn't show up in my notifications 😮 I'll put it on my todo list for this week :)

konstin commented 3 years ago

Thank you!

kvetab commented 3 years ago

Hi @sacdallago, @konstin, I'd like to ask how often, and in which situations, you make new releases. It would be really cool if you could make one sometime soon, as I'd like to use the contents of my PR in a project without having to install from the develop branch :) Thanks!