jglamine / phamdb

Quickly create and modify Phamerator databases.
GNU General Public License v3.0

Phamerator workflow changes #13

Open chg60 opened 5 years ago

chg60 commented 5 years ago

I realize you probably don't maintain this code base much anymore... however, if you do...

I am a graduate student in Graham Hatfull’s lab at the University of Pittsburgh. Because PhamDB makes use of the Phamerator database schema and parts of our workflow, I wanted to let you know about some changes we’re in the process of making to the Phamerator workflow.

The most important change is that we are ceasing use of kClust and HHsuite in favor of the newer, faster (still Söding-lab produced) MMseqs2. Our testing suggests that this program produces better phams than the iterative kClust approach, in a fraction of the time.
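For illustration, here is a minimal sketch of how clustering output could be turned into phams. It assumes a two-column, tab-separated file (representative ID, then member ID), which is the layout MMseqs2 writes for its cluster TSV. The gene IDs, the `number_phams` helper, and the sequential pham numbering are hypothetical and are not taken from the Phamerator or PhamDB code.

```python
import csv
import io
from collections import defaultdict


def read_clusters(tsv_handle):
    """Group member IDs by representative ID.

    Each row is expected to hold two tab-separated columns:
    representative ID, then member ID (assumed layout).
    """
    clusters = defaultdict(list)
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        representative, member = row[0], row[1]
        clusters[representative].append(member)
    return dict(clusters)


def number_phams(clusters):
    """Assign hypothetical sequential pham IDs, largest cluster first."""
    ordered = sorted(
        (sorted(members) for members in clusters.values()),
        key=lambda members: (-len(members), members[0]),
    )
    return {pham_id: members for pham_id, members in enumerate(ordered, start=1)}


# Made-up gene IDs, for illustration only.
example = io.StringIO("geneA\tgeneA\ngeneA\tgeneB\ngeneC\tgeneC\n")
phams = number_phams(read_clusters(example))
print(phams)
```

The clustering parameters themselves (identity and coverage thresholds) are deliberately left out, since the values used in the updated Phamerator workflow are not stated here.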

We will also be adding at least one column to the pham table. This column would contain the pham’s conservation score, calculated by generating a Clustal Omega alignment and dividing the number of perfectly conserved residues by the length of the shortest gene in the pham. Calculating it this way prevents draft-status genes, which are frequently called shorter than their manually annotated peers, from artificially lowering the pham's conservation score. Of course, including this column means we will also be generating Clustal Omega alignments for each pham moving forward.
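As a rough sketch of the score described above, assuming the alignment has already been produced and loaded as equal-length strings with `-` as the gap character. The function name and the example sequences are made up for illustration; this is not the code used in the Phamerator workflow.

```python
def conservation_score(aligned_seqs):
    """Perfectly conserved columns divided by the shortest ungapped length."""
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    width = len(aligned_seqs[0])
    if any(len(seq) != width for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    # A column counts as perfectly conserved when every sequence carries
    # the same residue there and that residue is not a gap.
    conserved = 0
    for column in zip(*aligned_seqs):
        first = column[0]
        if first != "-" and all(residue == first for residue in column):
            conserved += 1

    # Length of the shortest gene, ignoring alignment gaps.
    shortest = min(len(seq.replace("-", "")) for seq in aligned_seqs)
    if shortest == 0:
        return 0.0
    return conserved / shortest


# Made-up alignment: five conserved columns, shortest gene has six residues.
alignment = ["MKTAYLA", "MKTA-LA", "MRTAYLA"]
score = conservation_score(alignment)
print(round(score, 3))
```

Dividing by the shortest ungapped length rather than the alignment width is what keeps a truncated draft gene from dragging the score down.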

I don’t think it’s worth your time to worry about including the Clustal Omega portion of our modified workflow in the PhamDB workflow. However, given that MMseqs2 and kClust work quite differently and result in the production of quite different overall databases, it may be worth it for you or somebody else to update PhamDB to use MMseqs2 instead of kClust. This would likely result in less confusion downstream for Phamerator users trying to build their own databases and wondering why they’re not compatible with https://www.phamerator.org/ or why they can’t get their database clustered as well as we now can.

If you’d like more information about any of these changes or how they may impact you or your users, feel free to reach out.

-Christian Gauthier (christian.gauthier@pitt.edu)

jglamine commented 5 years ago

Thanks for the update. You're right that I don't maintain this anymore, but if you find someone who wants to work on it, I'm happy to give them access to the github repo.

chg60 commented 4 years ago

At long last... If your offer still stands there are a few of us who would be interested in trying to work on it. Even though it's 3 years out of date at this point, PhamDB remains the most popular way for people to construct databases for Phamerator, so it would be great to bring everybody under the same umbrella. Would you mind granting access to chg60 (myself) and laa89 (a tech in our lab who does really great programming)?

jglamine commented 4 years ago

Oh, cool! It's good to hear that people are still using this. Me from 2015 would have been very proud :)

I've sent collaborator invites to you and laa89. Looking forward to seeing what you do!

chg60 commented 4 years ago

The biggest goal is to update it so that it uses faster, current tools, and to future-proof it a bit. With that in mind, I was planning to update the base image (ncbi-cdd:wheezy) so that it uses the current stable Debian version (buster) instead of the now-obsolete wheezy. This of course affects package versions in the main Dockerfile, but I'll get there next. For now, when I try running "docker build ." using phamdb/pham/data/conserved-domain-database/Dockerfile as the build file, it fails at the add Cdd steps. I don't want to take up too much of your time, but would it be relatively straightforward for you to tell me how you built the ncbi-cdd:wheezy base image?

chg60 commented 4 years ago

Never mind, just figured it out...

jglamine commented 4 years ago

Lol, 10 minutes later. Great, glad I could help 😉

owebn commented 4 years ago

Hello! My name is Lawrence Abad and I am a lab technician in the same lab as Christian Gauthier. I was wondering if you'd be okay with giving permissions to one more person in our lab who we've brought on to try and assist us? Her username should be ananya-venbakkam.

My email is laa89@pitt.edu if you'd like to vet either Ananya or me, or if you still have any questions.

Thank you for your assistance so far!

chg60 commented 3 years ago

Hi James,

Hope all is well!

When we updated PhamDB last year, we didn't think about the fact that it would be important to push a fresh copy of the container to DockerHub. When you get a chance, could you please build and push the new container into your DockerHub so that folks who pull from there get the latest version? Seems lately some folks have been having difficulty with the image currently on DockerHub and my guess is this is why...

Many thanks!

-Christian

jglamine commented 3 years ago

A good long-term solution for this would be for you to set it up with a service like CircleCI or Travis CI so that it automatically builds and pushes new versions to Docker Hub when the GitHub repo is updated. I don't have time to add this feature, but if you'd like to set it up I'm happy to merge it. I'd rather not build and push the new image manually, just because I don't want to commit to doing this in the future.

My recommendation would be for you to move the Docker Hub image to an account that you own and then update the documentation to point to it, rather than continuing to use my personal Docker Hub account. Any existing users would have to switch to pulling from the new account.

If you'd rather keep it on my Docker Hub account so that existing users don't have to change anything to pull the new version, we can do that. If you set up CircleCI or Travis CI and just need me to give you some sort of Docker Hub API token for it to use, I'm happy to do that.
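To make the idea concrete, here is a rough sketch of the build-and-push step a CI job might run on each update to the repo. The repository name `example-user/phamdb`, the tag list, and both function names are placeholders, and the sketch assumes the job has already authenticated with `docker login` using a token stored in the CI service's settings. It defaults to a dry run that only prints the commands.

```python
import subprocess


def build_and_push_commands(repository, tags, context="."):
    """Return one `docker build` command plus one `docker push` per tag."""
    build = ["docker", "build"]
    for tag in tags:
        build += ["-t", f"{repository}:{tag}"]
    build.append(context)
    pushes = [["docker", "push", f"{repository}:{tag}"] for tag in tags]
    return [build] + pushes


def publish(repository, tags, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for command in build_and_push_commands(repository, tags):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True makes the CI job fail if docker exits non-zero.
            subprocess.run(command, check=True)


# Placeholder repository name; a real job would read it from its config.
publish("example-user/phamdb", ["latest"])
```

Tagging each build with both `latest` and a commit identifier would let existing users keep pulling `latest` while still making older images traceable.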
