wells-wood-research / timed-design

Protein Sequence Design with Deep Learning and Tooling like Monte Carlo Sampling and Analysis

Multichainfix #81

Closed sunal1996 closed 4 months ago

sunal1996 commented 4 months ago

At the moment, if we want to predict sequences for a multi-chain structure, TIMED merges the predictions for the individual chains and outputs a single-chain FASTA file.

For instance, for a protein with 184 aa in chain A and 184 aa in chain B, we would like to obtain a FASTA file that contains separate predictions for chain A (184 aa) and chain B (184 aa). However, at the moment we only get a single chain A entry of length 368. This PR aims to fix that.
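To make the intended output concrete, here is a minimal sketch of splitting one merged prediction back into per-chain FASTA records. It is not TIMED's actual implementation: the function names `split_by_chain` and `to_fasta`, the `(chain_id, length)` input format, and the `>{name}_{chain}` header style are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def split_by_chain(merged: str, chain_lengths: List[Tuple[str, int]]) -> Dict[str, str]:
    """Cut one merged sequence into per-chain pieces, in the order given.

    `chain_lengths` is a hypothetical input: (chain_id, residue_count) pairs.
    """
    total = sum(length for _, length in chain_lengths)
    if total != len(merged):
        raise ValueError(f"chain lengths sum to {total}, sequence has {len(merged)}")

    chains: Dict[str, str] = {}
    start = 0
    for chain_id, length in chain_lengths:
        chains[chain_id] = merged[start:start + length]
        start += length
    return chains


def to_fasta(name: str, chains: Dict[str, str]) -> str:
    """Render one FASTA record per chain (header style is an assumption)."""
    return "".join(f">{name}_{chain_id}\n{seq}\n" for chain_id, seq in chains.items())


# Stand-in for a 368-residue merged prediction covering two 184-residue chains.
merged_prediction = "A" * 184 + "G" * 184
per_chain = split_by_chain(merged_prediction, [("A", 184), ("B", 184)])
fasta_text = to_fasta("3w8o", per_chain)
```

With this layout, the 368-residue string yields two records of 184 residues each instead of one chain A record.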

sunal1996 commented 4 months ago

It looks like I had to change the variable "pdb" to "pdb_chain" in these conditions as well. Also, sorry about the duplicate commit names. The second "multi chain output format" commit covers the version change and the deletion of the comment you suggested.

        if old_datasetmap:
            pdb_chain, chain, _, res = flat_dataset_map[i]
            count = 1
        else:
            pdb_chain, count = flat_dataset_map[i]
            count = int(count)

universvm commented 4 months ago

Hey Mert (@sunal1996), LGTM. I will test it out, hopefully tomorrow, to see if the UI works. Does your example work with the current code?

sunal1996 commented 4 months ago

> Hey Mert (@sunal1996), LGTM. I will test it out, hopefully tomorrow, to see if the UI works. Does your example work with the current code?

Yes, it did work for 3w8o + 3qy1 put together. Both are two-chain proteins.

sunal1996 commented 4 months ago

To see whether custom PDB file names affect the code, I renamed 3w8o.pdb to baba123_dubu_dib_co_212.pdb, and it still worked.

universvm commented 4 months ago

Confirmed that the bug is fixed and different chains are displayed separately, as they should be. I had to modify the UI slightly.

[Two screenshots attached: 2024-05-06 at 15:31:17 and 2024-05-06 at 15:31:04]