Open prihoda opened 3 years ago
Hi @prihoda, thanks for the proposal :) I'd love to see more community effort going into the development of ~our~ everyone's tools, so yes: if you believe this is helpful, you are most welcome to contribute. Now to your questions:
Would you consider merging such a pull request, if I give it a shot?
Yes!
Shall I add it just for disorder, or also for the multiclass outputs?
I believe it makes sense to add it for all multiclass outputs, if it's not a major time investment for you!
Should an output file always be created, or only when a configuration flag is enabled?
The function that you mention will most likely always return this new type of information. A good approach is probably to extend the `BasicSecondaryStructureResult` named tuple with new fields. In turn, you'd need to extend `BasicExtractedAnnotations` and adapt the `get_annotations` function slightly. I believe this is a more advanced use case than most users of the pipeline would need, so I would suggest creating a parameter in the configuration file which defaults to false and is called something along the lines of `get_activations`.
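To make the suggestion concrete, here is a minimal sketch of a result tuple extended with an activations field, plus an opt-in flag that defaults to false. The class `SecondaryStructureResult`, its field names, the helper functions, and the label alphabet are all hypothetical stand-ins, not the actual bio_embeddings definitions; only the option name `get_activations` comes from the discussion above.

```python
from typing import Any, Dict, List, NamedTuple, Sequence


class SecondaryStructureResult(NamedTuple):
    """Hypothetical stand-in for the pipeline's result tuple."""

    disorder: List[str]                      # existing-style field: one label per residue
    disorder_activations: List[List[float]]  # new field: one score vector per residue


def wants_activations(config: Dict[str, Any]) -> bool:
    """Read the opt-in flag; a missing key is treated as False."""
    return bool(config.get("get_activations", False))


def predict(scores: Sequence[Sequence[float]]) -> SecondaryStructureResult:
    """Always return labels and raw scores; writing them to disk is decided elsewhere."""
    class_labels = ("-", "X")  # illustrative label alphabet (assumption)
    labels = [
        class_labels[max(range(len(row)), key=row.__getitem__)]  # argmax over the class scores
        for row in scores
    ]
    return SecondaryStructureResult(
        disorder=labels,
        disorder_activations=[list(row) for row in scores],
    )


result = predict([[0.9, 0.1], [0.2, 0.8]])
```

In this sketch the prediction function always carries the activations, and the flag only controls whether the pipeline would write them out, so existing configurations keep their current behavior.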
What format should we use for the output file, csv where rows are individual residue predictions?
Yes, I think a CSV should work. We had a rather lengthy internal discussion about how to do this for mutagenesis prediction; you may want to get inspired by that approach:
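The mutagenesis format referenced above is not reproduced here. As a separate, minimal sketch of the CSV idea with one row per residue, something like the following could work. The function name, the column names, and the class names are assumptions made for illustration and do not reflect the format the pipeline actually uses.

```python
import csv
import io
from typing import Dict, List, Sequence, TextIO


def write_activations_csv(
    handle: TextIO,
    activations: Dict[str, List[Sequence[float]]],
    class_names: Sequence[str],
) -> None:
    """Write one row per residue: sequence id, 1-based position, one score per class."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sequence_id", "position", *class_names])  # hypothetical header
    for sequence_id, rows in activations.items():
        for position, scores in enumerate(rows, start=1):
            writer.writerow([sequence_id, position, *scores])


# Usage with an in-memory buffer; a real run would pass an open file instead.
buffer = io.StringIO()
write_activations_csv(
    buffer,
    {"seq1": [[0.9, 0.1], [0.2, 0.8]]},
    ["ordered", "disordered"],
)
csv_text = buffer.getvalue()
```

A long layout like this keeps every sequence in a single file regardless of length, at the cost of repeating the sequence id on each row.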
In closing: feel free to open the PR early and reach out to me when needed!
Hi @sacdallago, following your tips from above, I created a PR. I'm sure it will need some work, please feel free to comment on anything or contribute! There was also a big merge (I guess I should have waited for your "Big refactoring" to be finished :), so I hope I didn't break anything in the process.
Hi all, I would like to extract the predicted output activation for basic secondary structure predictions, specifically for disorder predictions.
It could easily be added here:
https://github.com/sacdallago/bio_embeddings/blob/67f0e710a44e15486f88d08042fcbb3021c8d0c6/bio_embeddings/extract/basic/BasicAnnotationExtractor.py#L153-L157
Would you consider merging such a pull request, if I give it a shot? Shall I add it just for disorder, or also for the multiclass outputs? Should an output file always be created, or only when a configuration flag is enabled? What format should we use for the output file, csv where rows are individual residue predictions?