Sann5 closed this 5 months ago.
@misialq, would you be able to review this one?
Update: @colinvwood is going to try to take a pass through this and merge today, so that it's in the prepare this week.
Hey @misialq, I didn't have time to look at this today. If you have time to look at it tomorrow, just let me know; otherwise I'll plan to look at it tomorrow. Excuse all the pings 🥸
Hey @gregcaporaso, @colinvwood, sure thing! I already had a glance, and there are some significant changes that I proposed to @Sann5, so please do not review yet; the contents will likely change. We'll ping you when it's ready, thanks! 🙏
Good to know, thanks @misialq. I converted this to a draft pull request. Since we have the release next week, I'm going to bump this to the project board for the next release. Let us know if not having this in 2024.5 will be an issue.
Hey @gregcaporaso, thanks! No, I don't think it's an issue if we don't have it in 2024.5. We will probably want to test it out a bit together with our new moshpit action for eggnog so we may need some more time anyway :)
Hey @Sann5, what's up with the two failing tests?
@misialq I opened an issue in pyhmmer pointing out that the error thrown when loading a file with mixed profiles (DNA, RNA, protein) was uninformative. They have already fixed it and pushed the patch to conda. I will update the error handling here accordingly.
Hey @Sann5, LGTM, thanks! If it's not too much trouble, could you attach here the nice table you presented in one of our meetings? It may help in understanding what all the formats do 🙏
@lizgehret do you think you could check this out? :)
Sure thing!
The way they are usually used is:
One can also use profile HMMs to do sequence annotation or alignment.
Profile HMMs differ by sequence type (DNA, RNA, or protein). Moreover, HMMER, the go-to software for biological sequence analysis with profile HMMs, saves profiles as text (or binary) files. One file can contain one or more profiles, each representing a group of sequences. However, no valid file can contain profiles of more than one sequence type. Some HMMER programs take files with multiple profiles as input, while others require a file with a single profile.
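To illustrate the "one file, one sequence type" constraint, here is a minimal sketch that splits a HMMER3 text-format file into its profile records (each record starts with a `HMMER3/...` header line and ends with a `//` line), reads each record's `ALPH` field (`amino`, `DNA`, or `RNA`), and rejects files that mix sequence types. The function name `profile_alphabets` is hypothetical, and this is not the validation code in this PR; the toy records at the bottom are truncated headers, not complete, valid HMMs.

```python
from typing import List, Optional


def profile_alphabets(text: str) -> List[str]:
    """Return the alphabet of each profile in a HMMER3 text file.

    Hypothetical helper, a sketch rather than the q2-types implementation.
    Raises ValueError if no profiles are found, a record lacks an ALPH
    line, or the file mixes sequence types.
    """
    alphabets: List[str] = []
    current: Optional[str] = None  # ALPH value of the record being read
    in_record = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("HMMER3"):
            # A new profile record begins with a format header line.
            in_record = True
            current = None
        elif in_record and stripped.startswith("ALPH"):
            # e.g. "ALPH  amino" -> "amino" (HMMER treats it case-insensitively)
            current = stripped.split()[1].lower()
        elif stripped == "//":
            # "//" terminates a profile record.
            if current is None:
                raise ValueError("profile record without an ALPH line")
            alphabets.append(current)
            in_record = False
    if not alphabets:
        raise ValueError("no profile records found")
    if len(set(alphabets)) > 1:
        raise ValueError(f"mixed sequence types in one file: {sorted(set(alphabets))}")
    return alphabets


# Toy input: two truncated protein profile headers (not usable HMMs).
toy = """HMMER3/f [3.1b2 | February 2015]
NAME  toy1
LENG  1
ALPH  amino
//
HMMER3/f [3.1b2 | February 2015]
NAME  toy2
LENG  1
ALPH  amino
//
"""
print(profile_alphabets(toy))  # ['amino', 'amino']
```

Counting the returned list also tells you whether a file holds a single profile or multiple profiles, which is the other axis the proposed types distinguish.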
To accommodate the different kinds of profiles and their future use cases, this PR proposes the following semantic types:
| | Protein | DNA | RNA |
|---|---|---|---|
| Single Profile | ProfileHMM[SingleProtein] | ProfileHMM[SingleDNA] | ProfileHMM[SingleRNA] |
| Multiple Profiles | ProfileHMM[MultipleProtein] | ProfileHMM[MultipleDNA] | ProfileHMM[MultipleRNA] |
| Multiple Profiles in Binary + Indexed | ProfileHMM[PressedProtein] | ProfileHMM[PressedDNA] | ProfileHMM[PressedRNA] |
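The table can be read as a lookup keyed on sequence type (column) and storage form (row). The sketch below shows how the two axes combine into a type name. It assumes that a pressed database is recognized by the four binary files HMMER's `hmmpress` writes for use by `hmmscan` (`.h3m`, `.h3i`, `.h3f`, `.h3p`). The names `semantic_type_for`, `is_pressed`, and their parameters are hypothetical and only illustrate the table; they are not how q2-types assigns or validates these types.

```python
from pathlib import Path
from typing import Iterable

# Table column labels, keyed by the (lowercased) ALPH value HMMER writes.
ALPHABET_LABELS = {"amino": "Protein", "dna": "DNA", "rna": "RNA"}

# Binary files produced by `hmmpress` for a pressed profile database.
PRESSED_SUFFIXES = {".h3m", ".h3i", ".h3f", ".h3p"}


def is_pressed(filenames: Iterable[str]) -> bool:
    """Hypothetical check: True if all four hmmpress outputs are present."""
    suffixes = {Path(name).suffix for name in filenames}
    return PRESSED_SUFFIXES <= suffixes


def semantic_type_for(alphabet: str, n_profiles: int, pressed: bool = False) -> str:
    """Map sequence type and storage form to a ProfileHMM type name.

    Hypothetical helper mirroring the table; not q2-types code.
    """
    label = ALPHABET_LABELS[alphabet.lower()]  # KeyError for unknown alphabets
    if n_profiles < 1:
        raise ValueError("a profile file must contain at least one profile")
    if pressed:
        kind = "Pressed"
    elif n_profiles == 1:
        kind = "Single"
    else:
        kind = "Multiple"
    return f"ProfileHMM[{kind}{label}]"


print(semantic_type_for("amino", 1))  # ProfileHMM[SingleProtein]
print(semantic_type_for("DNA", 12))  # ProfileHMM[MultipleDNA]
print(is_pressed(["db.hmm.h3m", "db.hmm.h3i", "db.hmm.h3f", "db.hmm.h3p"]))  # True
```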
Closes #327.
Adds new semantic types for profile hidden Markov models as implemented in HMMER, plus tests and test data.