blacktanktop closed this 2 years ago
Hi! You can create your own FASST database file using a program in Mosaist (fasstDB). This program requires a list of the PDB files that you'd like to include in the database; ideally these have been pruned for homology. In the supplement we provide the exact list of structures that we used in the single and multichain databases. If you go this route, be sure to use the --s/--c arguments when building the DB. Within the next few days I can add a section to the readme explaining how to do this, step by step.
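As a rough illustration of the first step, the list of PDB files could be assembled with a short script like the one below. This is a minimal sketch and not part of Mosaist: the helper name write_pdb_list, the one-path-per-line layout, and the file names are assumptions, so the input format and flags that fasstDB actually expects should be checked against the program's own usage message.

```python
from pathlib import Path


def write_pdb_list(pdb_dir, list_path):
    """Write the absolute path of every .pdb file in pdb_dir to list_path.

    Hypothetical helper: one path per line is an assumed layout,
    not a documented fasstDB input format.
    """
    pdb_files = sorted(Path(pdb_dir).glob("*.pdb"))
    lines = [str(p.resolve()) for p in pdb_files]
    # End the file with a newline only when there is at least one entry.
    Path(list_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)


if __name__ == "__main__":
    import tempfile

    # Demonstrate on a throwaway directory with two PDB files and one other file.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("1abc.pdb", "2xyz.pdb", "notes.txt"):
            (Path(tmp) / name).write_text("")
        n = write_pdb_list(tmp, Path(tmp) / "pdb_list.txt")
        print(f"wrote {n} paths")
```

The function returns the number of paths written, which makes it easy to check that the count matches the number of structures you intended to include.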
Since that's pretty involved, I'm also going to see about making the databases that we used in the paper available for download. I'm currently looking into the GitHub LFS option. I'll let you know when the files are ready for download.
Thanks for bringing this and the compilation error to my attention!
Thanks for the reply. I went back through the supplemental material and can confirm that Mosaist has a command called fasstDB for creating a FASST database. I hadn't read it carefully before.
However, the list of PDBs used in the paper, which is probably necessary to create the FASST database, is presumably in one of the supplementary tables, and I cannot find it on the paper's web page. (Tables S1-S3 are there, but Tables S4-S7 do not appear to have been uploaded.)
It certainly looks a bit difficult to create this FASST database, so I look forward to downloading the database you used in your paper!
Huh, well, thanks for pointing that out. I'll reach out to Protein Science and have them add the missing supplementary tables.
I was able to upload the smallest database. You can find it at testfiles/singlechain_22188_sim30_STRIDE.db. This should work well for generating seeds. It will also work for scoring the interface, but it's not quite ideal for that. I'll need to find another way to host larger files to make the multichain database available too.
I would like to perform scoring, but the config setting in the example is multichainDB. Does the meaning of the score change if a singlechain DB is used instead (sorry for not understanding the paper well)? If it has to be the multichainDB, would it be possible to upload it somewhere too, alongside the singlechainDB?
Hi, I was able to upload the singlechain/multichain DB files to Zenodo. Would you mind downloading both and seeing if they work for you? https://zenodo.org/record/6569429
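Since these are large downloads, one way to confirm they arrived intact is to compare each file's MD5 checksum with the value shown next to it on the Zenodo record page. The snippet below is a minimal sketch of that check; the function names file_md5 and matches_checksum are placeholders, and the expected checksum has to be copied from the record page by hand.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 with an expected hex string (case-insensitive)."""
    return file_md5(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat regardless of how large the database file is.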
While I haven't benchmarked it, I do suspect that it's better to score interfaces with the multichain DB. The singlechain DB splits biological units apart, which could have an influence on statistics.
Hi Sebastian! Really interesting paper. Following up on the previous question: if I understood correctly, the results of the paper are based on the singlechainDB. So I was wondering why the run_scoreStructures.sh example looks for the multichainDB file. Do you expect it to work better? I'm asking just to make sure I test your code the proper way.
Thanks!
Hi Enrique!
The singlechainDB is used for generating "interface seeds" which are combined to construct the peptide backbones. When it comes to scoring the interfaces, we opted to use the multichainDB. I just looked back at the paper, and this is only really mentioned in the methods, so I can see how that is confusing.
The TERM interface score works by comparing the probability of the amino acid on the surface of the target protein in two contexts: a "pair fragment" describing the interface, and a "self fragment" describing just the surface of the protein (which acts as the reference state). The benefit of using the multichainDB is that it will include more interface structures that could be matches to the peptide-protein interface that is being scored. The degree to which this helps is not clear, as I haven't benchmarked how the score changes when using the singlechainDB.
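To make the comparison concrete, here is a toy sketch of a log-ratio score of this kind. It is not the implementation used in the paper or in the repository: the functional form (negative log of the pair-fragment probability over the self-fragment probability), the uniform pseudocount, and all function names are assumptions made for illustration, and the match lists are made-up input.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_probability(matched_residues, aa, pseudocount=1.0):
    """Estimate P(aa) from the residues seen at one position across matches.

    Assumes matched_residues contains only the 20 standard one-letter codes.
    The uniform pseudocount is an assumption that avoids zero probabilities.
    """
    counts = Counter(matched_residues)
    total = len(matched_residues) + pseudocount * len(AMINO_ACIDS)
    return (counts[aa] + pseudocount) / total


def toy_interface_score(pair_matches, self_matches, aa):
    """Assumed form: -log(P_pair(aa) / P_self(aa)).

    Negative values mean the interface context favors aa more than the
    protein surface alone does (the self fragment is the reference state).
    """
    p_pair = aa_probability(pair_matches, aa)
    p_self = aa_probability(self_matches, aa)
    return -math.log(p_pair / p_self)
```

With this convention, a residue that appears more often among pair-fragment matches than among self-fragment matches gets a negative score, and identical match sets give a score of zero. A larger database can change the score only by changing which matches are found for each fragment.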
Thanks for the clarification! Just downloaded the multichainDB and will test it on our system.
Seems like the databases I uploaded to Zenodo are working. Feel free to reopen this discussion if you run into any issues.
Hi! I want to run the main pipeline, but I don't know how to create the FASST file it requires. How do I create the singlechain_22188_sim30.db or multichain_23643_sim50.db described in the config file in peptide_design/example/input_files/? Could you please give me the inputs and the procedure?