[FEATURE REQUEST] bond_analysis.py

supers1x commented 2 years ago

The bond_analysis.py script only outputs the correct bond occupancy in the JSON file. Will it be possible to save total/incorrect bond occupancy in the same JSON file?

RodenLuo commented 2 years ago

Hi,

I have written a script to categorize the bond status of each NT for each Frame. It will generate a file in a format similar to the trajectory format of oxDNA. From this output, several ways of downstream analysis can be followed, such as summing up all frames or getting averaged statistics. I attach the code (in the same license as oxDNA, GPL-3.0 license) and the necessary info below. Hope it helps.

We (the KAUST NANOVIS group) have developed a visualization tool in Houdini and also in standalone C++ to visualize the bond status in each frame as well as the PCA result for an oxDNA trajectory in various abstract layouts. See the Advertisement for SynopSpace section below for both the static figure and mp4. The tool is in the alpha-testing phase now. Feel free to reach out to deng.luo[AT]kaust.edu.sa if you are interested in seeing your structures/trajectories inside it.

Code: oxDNA bond status.zip

Output

Click me to see the details

An example output ``` t = 1000000 -2 -1 0 1 2 7923 ... ``` Output format: ``` t = [Pair ID if code == 2] # for NT0 [Pair ID if code == 2] # for NT1 [Pair ID if code == 2] # for NT2 ... ``` So, there are 5 categories. The meaning of them are as follows. `hb` stands for hydrogen-bonded. ``` -2: Designed to be hb, wrong hb -1: Designed to be singled, but is hb 0: Designed to be hb, not yet formed 1: Designed to be singled, and is single 2: Designed to be hb, correctly formed. In this case, the pair id will be printed as well. ```

Commands to run the code:

Click me to see the details

``` python convert_hb_to_nt_status.py design.top design.ordered.pair.txt hb_status.txt hb_status.dat # design.top is oxDNA topology file # hb_status.dat is the output # The rest is explained below ``` `hb_status.txt ` is the output of DNAnalysis if you add the following to `input` and invoke `DNAnalysis input` ``` #### Analysis #### analysis_data_output_1 = { name = hb_status.txt print_every = 1 col_1 = { type = hb_list } } ``` `design.ordered.pair.txt`, the format is the same as `hb_status.txt` but with only one frame. And it records the designed hb status. Example below. ``` # step 0 0 7923 1 7922 2 7921 3 7920 4 7919 ... ``` To generate `design.ordered.pair.txt`, there are several ways. 1. If you have cadnano or CanDo json file, download tacoxDNA, replace 3 files with the ones attached in the code. Invoke the conversion command will generate `.ordered.pair.txt` automatically. 2. If the above is not the case, choose one step from `DNAnalysis` output and add to the designed status manually. (Sometimes `DNAnalysis` failed to capture a few base pairs.) 3. Use oxview. Drag & drop the design to oxview. Enable `strand` in Selection mode, and enable `Select pairs`. Select the scaffold. Download `Selected base list`. `sed 's/ /\n/2; P; D' baseListFile.txt > basePairList.txt` `awk '$1 > $2 {temp = $2; $2 = $1; $1 = temp} 1' basePairList.txt > basePairList.ordered.txt` Add header `# step 0` at the first line manually.

Advertisement for SynopSpace

Click me to see the details

![image](https://user-images.githubusercontent.com/8460424/195042851-f309cfc6-979c-4b67-b30f-4747b5910504.png) ![image](https://user-images.githubusercontent.com/8460424/195042930-6e61a5c8-efc9-46c7-897a-362fe264be23.png) https://user-images.githubusercontent.com/8460424/195048939-b231b491-f24b-4da1-aa83-bd51b07104d9.mp4

supers1x commented 2 years ago

RodenLuo, thanks for the support. The script does contain the functions that are helpful. All I need is to build my workflow based on the hb_status.dat file. However, when dealing with disordered hb_status.txt (the DNAnalysis is to blame), the script gives wrong results compared to design.ordered.pair.txt. Thus, it is also important to print base ID in the output. Maybe my code could help to solve the problem. code.zip

RodenLuo commented 2 years ago

So, first, you will need to add one line such as # step 0 or t = 0, or anything else as the first line in your pairs.txt. (This is because I initially want to keep some format consistency to have this header. During parsing, in line 28 of convert_hb_to_nt_status.py, it is skipping the first line.)

Then, this corner case (with only one frame) actually reveals a bug. There is a code section to get the nt status code "1" correct. But this section has not been executed in this corner case. I have updated the code to correct it: convert_hb_to_nt_status.zip.

Now the output looks as follow:

t = 3000000
1
0
2 250
2 249
...

Keep in mind that oxDNA NT index starts at 0 (and strand index starts at 1). This means at this frame, NT0 is designed to be single and is single, NT1 is designed to be paired and not yet formed, NT2 is designed to be paired with 250 and is correctly formed, the same for NT3 with 249.

PS: I have the feeling that you manually come up with the pairs.txt, which is fine if you don't have a big structure. Just make sure that in each line, the smaller ID always goes first. The awk command mentioned in my previous reply guarantees this.

PPS: I think understand what you mean by "disordered". It has been taken into account. In short, construct_hb_design function builds the hb_design_dict first. Then hb_frame_list with the same length as the total NT is initialized with all 0s. And then when parsing each line of the "disordered" output, hb_frame_list is updated at that specific index. In the end, a final go through of hb_frame_list is there to identify the case for status code = 1. So, even if the output is disordered, the script runs fine.

zoombya commented 2 years ago

My take on this problem is to write my own function counting the number of correct bonds in this case.

# pairs is a dictionary of following form 
#   {lower_base_id : bound_base_id}
# it's importaint to have the id with the lower index come first in those pairs.  

def analyze_bonds(path, pairs):
    import oxpy
    # collect the states
    n_states = []
    # we need the context as it otherwise complains about the logger
    with oxpy.Context():
        # get our simulation from the path 
        s = Simulation(f"{path}/input")
        # add the observable
        s.input_file["analysis_data_output_1"] = '{ \n name = stdout \n print_every = 1e10 \n col_1 = { \n id = my_obs \n type = pair_energy \n } \n }'

        # get oxDNA Analysis instead of the normal backend
        backend = oxpy.analysis.AnalysisBackend(
            s.input_file
        )

        # fire the analysis off
        for i in range(3000): # i am interested only in a subset of the trajectory 
            # read in a conf to analyze
            backend.read_next_configuration()
            # split the observable output 
            e_txt = backend.config_info().get_observable_by_id("my_obs").get_output_string(0).strip().split('\n')

            # accumulate the number of bonds => state
            state = 0
            for line in e_txt:
                if line[0] != "#": # filter out the header and footer
                    # split output into calculated parameters
                    id1, id2, FENE, BEXC, STCK, NEXC, HB, CRSTCK, CXSTCK, DH, total = map(float,line.split())
                    # figure out if pair ids are in the observable output and if we have a HB
                    if id1 in pairs and id2 == pairs[id1] and HB :
                        state +=1
            # record the state
            n_states.append(state)

        #output stuff
        with open(join(path,"bond_states.json"),"w") as file:
            file.write(
                dumps(n_states)
            )

This function also uses a peace of my boiler plate simulation class, which I am actively fiddling with. Attached is a link to the prototype.

Biggest problem with the observable is that it will output all the bonds in the system, regardless if they are wrong or not, so you have to maintain some sort of a list as @RodenLuo mentioned.

ox_boilerplate2.zip

Prateek1410 commented 2 years ago

Thanks for the help, RodenLuo. I followed your instructions and ran the codes for a small system of dsDNA (40nt long). I can't seem to understand the outputted hb_status.dat file: looking at the trajectory, I believe all pairs are intact throughout and thus, the status should be '2' however, this is the case for only 1 base pair: 20 and 59 at all frames. I am attaching the files here; could you please take a look? Thanks again! I converted all to .txt files cause otherwise it wasn't accepting them.

input.txt last_conf.txt trajectory.txt baseListFile.txt

hb_status_output.txt

basePairList.ordered.txt basePairList.txt dsdna.txt dsdna_conf.txt hb_status.txt

RodenLuo commented 2 years ago

Hi, How come you have index 84 in your baseListFile.txt and all the related files, given that your top file has only 80 NTs?

I guess you have edited the structure after you downloaded the baseListFile.txt, or you messed up two structures here. I repeated everything using your dsdna.txt and dsdna_conf.txt and the output looked fine. It generated this basePairList.ordered.txt. With it, you should be able to get the correct result.

RodenLuo commented 2 years ago

Oh, the chance is very low. But you might hit this bug as I did before... https://github.com/sulcgroup/oxdna-viewer/issues/87

I played in oxview with your structure, the baseListFile.txt content order sometimes changes. But the pairs are correct.

Prateek1410 commented 2 years ago

It seems the issue is with the awk command not working, weirdly. Thus, the ordering of the particle ids doesn't happen.

I don't think I hit that bug; I simply selected all particles and then downloaded the selected baselistfile on which I successfully ran the sed command. However, after running awk, I don't see any changes in the generated ordered file.

RodenLuo commented 2 years ago

The awk command is to guarantee the first column is always smaller than the second column. If it is already the case, the awk command will simply do nothing.

What I don't get is, in your original baseListFile.txt, it starts with "45 84 ...". How is it possible that you have an "84" there, as you only have 80 NTs in the whole system.

Have you tried the new basePairList.ordered.txt I uploaded above?

Prateek1410 commented 2 years ago

Hi Roden,

Thanks, I got it now. I don't know how those extra NTs came up though. The problem lay in my faulty selection which wasn't giving me the correct pairs. After correct selection I am getting the expected result, thanks again!

To clarify: in every frame, each particle's bond status is given right?

How should I use the script if I am interested in the bond status of only a few particles? Is there a way, or would I have to generate the hb_status.dat file for the entire structure and then process it further to study my chosen particles?

Thanks and regards Prateek

RodenLuo commented 2 years ago

Hi Prateek, Great to know it's working.

in every frame, each particle's bond status is given right?

Yes. There is a detailed doc in my first reply on this page.

to study my chosen particles?

When I first wrote the script, I tried to code it as general as possible. Then, any downstream analysis can go from here and filter the output to get the case-specific results. For example, if you are only interested in the first NT, then grabbing the first line of each frame would do. Either manually in a text/Excel editor or a bit of learning on Python scripting about file parsing and array indexing should be good enough for your case.

Cheers, Roden

Prateek1410 commented 2 years ago

Okay, got it, thanks! This was very helpful.

RodenLuo commented 2 years ago

You are very welcome! Just a final heads-up, I forgot to mention just now. oxDNA's NT indexing starts with 0. So, NT 0 is the first line under each frame.

Prateek1410 commented 2 years ago

Okay, thanks.

lorenzo-rovigatti commented 2 years ago

Hey guys, I didn't manage to follow the whole thread, but is the feature request still active? In other words, is this something you think we should somehow incorporate in the code or not?

Prateek1410 commented 2 years ago

Hello Professor Lorenzo

Yes, it'd be fantastic if you could please rework the script to produce info on missbonds, native bonds for each configuration.

Warm Regards Prateek

supers1x commented 2 years ago

Hello Professor Lorenzo, Just as Prateek said, a function that outputs misbonds, desined-bonds would be appreciated.

ErikPoppleton commented 2 years ago

Hey guys, sorry for the slow responses, in the middle of moving to a new positions so things are a bit hectic. That being said, a few months ago I rewrote all the scripts in oat to be composed of cleanly importable functions for use in your own Python scripts. oxDNA_analysis_tools.bond_analysis.bond_analysis might be exactly what you're looking for. If you need it on a per-configuration basis rather than an average over configurations, feel free to copy the function and just change the data accumulation.

ErikPoppleton commented 1 year ago

Update on this, I recently reworked oxDNA_analysis_tools.bond_analysis.bond_analysis to have per-configuration output. The CLI script now also includes a plot which shows the number of bonds over time, which I find very useful for equilibrating structures where there are a lot of bonds I need to form.

bonds

Prateek1410 commented 1 year ago

Thanks a lot, Erik. This is super helpful.

On Tue, 20 Dec, 2022, 9:00 pm Erik Poppleton, @.***> wrote:

Update on this, I recently reworked oxDNA_analysis_tools.bond_analysis.bond_analysis https://lorenzo-rovigatti.github.io/oxDNA/oat/api.html#bond-analysis to have per-configuration output. The CLI script now also includes a plot which shows the number of bonds over time, which I find very useful for equilibrating structures where there are a lot of bonds I need to form.

[image: bonds] https://user-images.githubusercontent.com/36451772/208703953-c1d697a9-290b-4e2e-9b9d-e73578318c95.png

— Reply to this email directly, view it on GitHub https://github.com/lorenzo-rovigatti/oxDNA/issues/45#issuecomment-1359567593, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARS5VAZMINFHXAFEIEDCSTTWOHGIZANCNFSM6AAAAAARB3RWGI . You are receiving this because you commented.Message ID: @.***>

lorenzo-rovigatti / oxDNA