salilab / bayesian_hdx

Support code for HDX data reduction
GNU Lesser General Public License v2.1
3 stars 3 forks source link

how to create input for the script #4

Closed Meravbraitbard closed 5 years ago

Meravbraitbard commented 5 years ago

DHDDS_results_HDX-Mar19_percD.txt Hi, I am working on the first time with HDX data and i tried to use the HDXWorkbench example script. but the data I have looks a little bit different from the input in the example (attached). Is there a way to fit my data to the script? Thanks!

saltzberg commented 5 years ago

In theory, yes, we can use this data, however it is not a format that I have seen before. What is the value inside the parentheses, e.g.: (1-1), (2-2)? Also, is the DHDD-S column the %D value and the standard deviation?

Is this a standard format (straight from HDX/MS analysis software)? If so, I am happy to create a function in the code to handle this format of data.

Otherwise, the minimal format, HDX_columns, requires is of the following format:

sequence, start_res, end_res, time, D_inc
AEDMAADEVTAPPRKVLIISAGASHSVAL, 2, 30, 10.0, 15.20
Meravbraitbard commented 5 years ago

Hi again, I tried to create a script and change the input that will be like the required input. and I am attaching the new input. but, it's not working, what is missing? I am attaching also the script I am using. Thanks

DHDDS_HDX_processing.zip

Meravbraitbard commented 5 years ago

Hi again, I tried to create a script and change the input that will be like the required input. and I am attaching the new input. but, it's not working, what is missing? I am attaching also the script I am using. Thanks

DHDDS_HDX_processing.zip

saltzberg commented 5 years ago

Hi. The HDXWorkbench macro is only useful for data originating from HDXWorkbench. It is a very specific format that contains two different protein states.

To work with your HDX data, remove the entire header from the data file and read it with the hxio.import_HXcolumns() parser. The values from the header values can be inputted in the running script (shown below). NOTE, I made a mistake in my above comment and the sequence column should be named peptide_seq.

New files are here: hdx_files.zip

So this was your original data file (truncated):

experiment attribute,value
Experiment name,DHDDS_HDX
Offset,0
Deuterium solution concentration,0.9
Experiment Protein Sequence, GSGSGSGMSWIKEGELSLWERFCANIIKAGPMPKHIAFIMDGNRRYAKKCQVERQEGHSQGFNKLAETLRWCLNLGILEVTVYAFSIENFKRSKSEVDGLMDLARQKFSRLMEEKEKLQKHGVCIRVLGDLHLLPLDLQELIAQAVQATKNYNKCFLNVCFAYTSRHEISNAVREMAWGVEQGLLDPSDISESLLDKCLYTNRSPHPDILIRTSGEVRLSDFLLWQTSHSCLVFQPVLWPEYTFWNLFEAILQFQMNHSVLQKARDMYAEERKRQQLERDQATVTEQLLREGLQASGDAQLRRTRLHKLSARREERVQGFLQALELKRADWLARLGTASA

sequence,start_res,end_res,time,D_inc
GSGSGSGMSW,1,10,0,0.00

Instead, the file should simply be:

peptide_seq,start_res,end_res,time,D_inc
GSGSGSGMSW,1,10,0,0.00

To input this data into the method, the lines in the running script should look like this:

inseq = 'GSGSGSGMSWIKEGELSLWERFCANIIKAGPMPKHIAFIMDGNRRYAKKCQVERQEGHSQGFNKLAETLRWCLNLGILEVTVYAFSIENFKRSKSEVDGLMDLARQKFSRLMEEKEKLQKHGVCIRVLGDLHLLPLDLQELIAQAVQATKNYNKCFLNVCFAYTSRHEISNAVREMAWGVEQGLLDPSDISESLLDKCLYTNRSPHPDILIRTSGEVRLSDFLLWQTSHSCLVFQPVLWPEYTFWNLFEAILQFQMNHSVLQKARDMYAEERKRQQLERDQATVTEQLLREGLQASGDAQLRRTRLHKLSARREERVQGFLQALELKRADWLARLGTASA'

# Initialize model
sys = system.System(output_dir=outputdir, noclobber=False)
mol = sys.add_macromolecule(inseq, "Molecule")
# Import data
dataset = hxio.import_HXcolumns(workbench_file,  # path to the file
                              inseq,         # Target sequence
                              name="Data",     # A name in case you have many data files
                              percentD=True,   # Percent rather than absolute deuterium 
                              conditions=None, # Standard conditions (293K, pH=7.0)
                              error_estimate=5.0, # A guess on the error in %D units
                              n_fastamides=2, #Number of fast exchanging amides that are disregarded in each peptide
                              offset=0) #Sequence and peptide numbering offset

# Each molecule has one state already defined, the Apo.
state = mol.get_apo_state()

# Add the dataset to this state.
state.add_dataset(dataset)

# Define the HDX representation model. ResidueGridModel models the protection factor at each residue along a finite grid.
output_model = state.set_output_model(model.ResidueGridModel(state, grid_size=num_exp_bins))

# Initialize output
sys.output.initialize_output_model_file(state, output_model.pf_grids)

The updated files are contained here. I have also changed the annealing protocol in this script to a new one that is significantly superior to the original method.