prihoda commented 3 years ago

Hi all, I would like to extract the predicted output activation for basic secondary structure predictions, specifically for disorder predictions.

This could easily be added here:

https://github.com/sacdallago/bio_embeddings/blob/67f0e710a44e15486f88d08042fcbb3021c8d0c6/bio_embeddings/extract/basic/BasicAnnotationExtractor.py#L153-L157

Would you consider merging such a pull request, if I give it a shot? Shall I add it just for disorder, or also for the multiclass outputs? Should an output file always be created, or only when a configuration flag is enabled? What format should we use for the output file, csv where rows are individual residue predictions?

sacdallago commented 3 years ago

Hi @prihoda, thanks for the proposal :) I'd love to see more community effort in developing ~our~ everyone's tools, so yes: if you believe this is helpful, you are most welcome to contribute. Now to your questions:

Would you consider merging such a pull request, if I give it a shot?

Yes!

Shall I add it just for disorder, or also for the multiclass outputs?

I believe it makes sense to add it for all multiclass outputs, if it's not a major time investment for you!

Should an output file always be created, or only when a configuration flag is enabled?

The function that you mention will most likely always return this new type of information. A good approach is probably to extend the BasicSecondaryStructureResult named tuple with new fields. In turn, you'd need to extend BasicExtractedAnnotations and adapt the get_annotations function slightly. I believe this is a more advanced use case than most users of the pipeline would need, so I would suggest creating a parameter in the configuration file that defaults to false and is called something along the lines of get_activations.
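As a rough illustration of the suggestion above, here is a minimal sketch of a result tuple extended with an optional activations field and a get_activations switch. Everything in it is an assumption made for illustration: SecondaryStructureResultSketch is a hypothetical stand-in for BasicSecondaryStructureResult (whose real fields are not shown in this thread), predict_disorder is not a function from bio_embeddings, and applying a softmax is only one possible choice, since the thread does not settle whether raw or normalized scores should be stored.

```python
import math
from typing import List, NamedTuple, Optional


class SecondaryStructureResultSketch(NamedTuple):
    """Hypothetical stand-in for BasicSecondaryStructureResult."""

    disorder: List[int]  # predicted class index per residue
    # New optional field: per-residue class scores, None unless requested.
    disorder_activations: Optional[List[List[float]]] = None


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one residue's raw scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def predict_disorder(
    raw_scores: List[List[float]], get_activations: bool = False
) -> SecondaryStructureResultSketch:
    """Hypothetical helper: raw_scores holds one row of class scores per residue."""
    # The class prediction is the index of the highest score in each row.
    classes = [max(range(len(row)), key=row.__getitem__) for row in raw_scores]
    # Activations are only computed when the (assumed) flag is enabled.
    activations = [softmax(row) for row in raw_scores] if get_activations else None
    return SecondaryStructureResultSketch(
        disorder=classes, disorder_activations=activations
    )
```

Because the new field defaults to None, existing code that builds or unpacks the tuple by name would keep working when the flag is left off.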

What format should we use for the output file, csv where rows are individual residue predictions?

Yes, I think a CSV should work. We had a rather lengthy internal discussion about how to do this for mutagenesis prediction; you may want to get inspired by that approach:

In closing: feel free to open the PR early and ask me for help when needed!
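For the CSV question, a layout with one row per residue could be sketched as below. This is only an illustration under stated assumptions: the function name write_activations_csv, the column names sequence_id and position, and the class labels in the usage example are all hypothetical, and this is not the mutagenesis format mentioned above.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO


def write_activations_csv(
    handle: TextIO,
    activations: Dict[str, List[Sequence[float]]],
    class_labels: Sequence[str],
) -> None:
    """Write one row per residue: sequence id, 1-based position, one score per class."""
    writer = csv.writer(handle)
    # Header: hypothetical column names followed by one column per class.
    writer.writerow(["sequence_id", "position", *class_labels])
    for sequence_id, rows in activations.items():
        for position, scores in enumerate(rows, start=1):
            writer.writerow([sequence_id, position, *scores])


# Usage with made-up scores for a two-residue sequence.
buffer = io.StringIO()
write_activations_csv(
    buffer,
    {"seq1": [[0.9, 0.1], [0.2, 0.8]]},
    ["ordered", "disordered"],
)
print(buffer.getvalue())
```

Keeping the sequence id in every row makes the file easy to filter or load into a data frame, at the cost of some redundancy.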

kvetab commented 3 years ago

Hi @sacdallago, following your tips from above, I created a PR. I'm sure it will need some work, please feel free to comment on anything or contribute! There was also a big merge (I guess I should have waited for your "Big refactoring" to be finished :), so I hope I didn't break anything in the process.

sacdallago / bio_embeddings

Getting basic disorder prediction scores #153
