Closed supers1x closed 1 year ago
Hi,
I have written a script to categorize the bond status of each NT for each Frame. It will generate a file in a format similar to the trajectory format of oxDNA. From this output, several ways of downstream analysis can be followed, such as summing up all frames or getting averaged statistics. I attach the code (in the same license as oxDNA, GPL-3.0 license) and the necessary info below. Hope it helps.
We (the KAUST NANOVIS group) have developed a visualization tool in Houdini and also in standalone C++ to visualize the bond status in each frame as well as the PCA result for an oxDNA trajectory in various abstract layouts. See the Advertisement for SynopSpace section below for both the static figure and mp4. The tool is in the alpha-testing phase now. Feel free to reach out to deng.luo[AT]kaust.edu.sa if you are interested in seeing your structures/trajectories inside it.
Code: oxDNA bond status.zip
RodenLuo, thanks for the support. The script does contain the functions that are helpful. All I need is to build my workflow based on the hb_status.dat
file.
However, when dealing with disordered hb_status.txt
(the DNAnalysis
is to blame), the script gives wrong results compared to design.ordered.pair.txt
. Thus, it is also important to print base ID in the output.
Maybe my code could help to solve the problem.
code.zip
So, first, you will need to add one line such as # step 0
or t = 0
, or anything else as the first line in your pairs.txt
. (This is because I initially want to keep some format consistency to have this header. During parsing, in line 28 of convert_hb_to_nt_status.py
, it is skipping the first line.)
Then, this corner case (with only one frame) actually reveals a bug. There is a code section to get the nt status code "1" correct. But this section has not been executed in this corner case. I have updated the code to correct it: convert_hb_to_nt_status.zip.
Now the output looks as follow:
t = 3000000
1
0
2 250
2 249
...
Keep in mind that oxDNA NT index starts at 0 (and strand index starts at 1). This means at this frame, NT0 is designed to be single and is single, NT1 is designed to be paired and not yet formed, NT2 is designed to be paired with 250 and is correctly formed, the same for NT3 with 249.
PS: I have the feeling that you manually come up with the pairs.txt
, which is fine if you don't have a big structure. Just make sure that in each line, the smaller ID always goes first. The awk
command mentioned in my previous reply guarantees this.
PPS: I think understand what you mean by "disordered". It has been taken into account. In short, construct_hb_design
function builds the hb_design_dict
first. Then hb_frame_list
with the same length as the total NT is initialized with all 0s. And then when parsing each line of the "disordered" output, hb_frame_list
is updated at that specific index. In the end, a final go through of hb_frame_list
is there to identify the case for status code = 1. So, even if the output is disordered, the script runs fine.
My take on this problem is to write my own function counting the number of correct bonds in this case.
# pairs is a dictionary of following form
# {lower_base_id : bound_base_id}
# it's importaint to have the id with the lower index come first in those pairs.
def analyze_bonds(path, pairs):
import oxpy
# collect the states
n_states = []
# we need the context as it otherwise complains about the logger
with oxpy.Context():
# get our simulation from the path
s = Simulation(f"{path}/input")
# add the observable
s.input_file["analysis_data_output_1"] = '{ \n name = stdout \n print_every = 1e10 \n col_1 = { \n id = my_obs \n type = pair_energy \n } \n }'
# get oxDNA Analysis instead of the normal backend
backend = oxpy.analysis.AnalysisBackend(
s.input_file
)
# fire the analysis off
for i in range(3000): # i am interested only in a subset of the trajectory
# read in a conf to analyze
backend.read_next_configuration()
# split the observable output
e_txt = backend.config_info().get_observable_by_id("my_obs").get_output_string(0).strip().split('\n')
# accumulate the number of bonds => state
state = 0
for line in e_txt:
if line[0] != "#": # filter out the header and footer
# split output into calculated parameters
id1, id2, FENE, BEXC, STCK, NEXC, HB, CRSTCK, CXSTCK, DH, total = map(float,line.split())
# figure out if pair ids are in the observable output and if we have a HB
if id1 in pairs and id2 == pairs[id1] and HB :
state +=1
# record the state
n_states.append(state)
#output stuff
with open(join(path,"bond_states.json"),"w") as file:
file.write(
dumps(n_states)
)
This function also uses a peace of my boiler plate simulation class, which I am actively fiddling with. Attached is a link to the prototype.
Biggest problem with the observable is that it will output all the bonds in the system, regardless if they are wrong or not, so you have to maintain some sort of a list as @RodenLuo mentioned.
Thanks for the help, RodenLuo. I followed your instructions and ran the codes for a small system of dsDNA (40nt long). I can't seem to understand the outputted hb_status.dat file: looking at the trajectory, I believe all pairs are intact throughout and thus, the status should be '2' however, this is the case for only 1 base pair: 20 and 59 at all frames. I am attaching the files here; could you please take a look? Thanks again! I converted all to .txt files cause otherwise it wasn't accepting them.
input.txt last_conf.txt trajectory.txt baseListFile.txt
basePairList.ordered.txt basePairList.txt dsdna.txt dsdna_conf.txt hb_status.txt
Hi, How come you have index 84
in your baseListFile.txt and all the related files, given that your top file has only 80 NTs?
I guess you have edited the structure after you downloaded the baseListFile.txt
, or you messed up two structures here. I repeated everything using your dsdna.txt and dsdna_conf.txt and the output looked fine. It generated this basePairList.ordered.txt. With it, you should be able to get the correct result.
Oh, the chance is very low. But you might hit this bug as I did before... https://github.com/sulcgroup/oxdna-viewer/issues/87
I played in oxview with your structure, the baseListFile.txt
content order sometimes changes. But the pairs are correct.
It seems the issue is with the awk command not working, weirdly. Thus, the ordering of the particle ids doesn't happen.
I don't think I hit that bug; I simply selected all particles and then downloaded the selected baselistfile on which I successfully ran the sed command. However, after running awk, I don't see any changes in the generated ordered file.
The awk command is to guarantee the first column is always smaller than the second column. If it is already the case, the awk command will simply do nothing.
What I don't get is, in your original baseListFile.txt, it starts with "45 84 ...". How is it possible that you have an "84" there, as you only have 80 NTs in the whole system.
Have you tried the new basePairList.ordered.txt I uploaded above?
Hi Roden,
Thanks, I got it now. I don't know how those extra NTs came up though. The problem lay in my faulty selection which wasn't giving me the correct pairs. After correct selection I am getting the expected result, thanks again!
To clarify: in every frame, each particle's bond status is given right?
How should I use the script if I am interested in the bond status of only a few particles? Is there a way, or would I have to generate the hb_status.dat file for the entire structure and then process it further to study my chosen particles?
Thanks and regards Prateek
Hi Prateek, Great to know it's working.
in every frame, each particle's bond status is given right?
Yes. There is a detailed doc in my first reply on this page.
to study my chosen particles?
When I first wrote the script, I tried to code it as general as possible. Then, any downstream analysis can go from here and filter the output to get the case-specific results. For example, if you are only interested in the first NT, then grabbing the first line of each frame would do. Either manually in a text/Excel editor or a bit of learning on Python scripting about file parsing and array indexing should be good enough for your case.
Cheers, Roden
Okay, got it, thanks! This was very helpful.
You are very welcome! Just a final heads-up, I forgot to mention just now. oxDNA's NT indexing starts with 0. So, NT 0 is the first line under each frame.
Okay, thanks.
Hey guys, I didn't manage to follow the whole thread, but is the feature request still active? In other words, is this something you think we should somehow incorporate in the code or not?
Hello Professor Lorenzo
Yes, it'd be fantastic if you could please rework the script to produce info on missbonds, native bonds for each configuration.
Warm Regards Prateek
Hello Professor Lorenzo, Just as Prateek said, a function that outputs misbonds, desined-bonds would be appreciated.
Hey guys, sorry for the slow responses, in the middle of moving to a new positions so things are a bit hectic. That being said, a few months ago I rewrote all the scripts in oat
to be composed of cleanly importable functions for use in your own Python scripts. oxDNA_analysis_tools.bond_analysis.bond_analysis
might be exactly what you're looking for. If you need it on a per-configuration basis rather than an average over configurations, feel free to copy the function and just change the data accumulation.
Update on this, I recently reworked oxDNA_analysis_tools.bond_analysis.bond_analysis
to have per-configuration output. The CLI script now also includes a plot which shows the number of bonds over time, which I find very useful for equilibrating structures where there are a lot of bonds I need to form.
Thanks a lot, Erik. This is super helpful.
On Tue, 20 Dec, 2022, 9:00 pm Erik Poppleton, @.***> wrote:
Update on this, I recently reworked oxDNA_analysis_tools.bond_analysis.bond_analysis https://lorenzo-rovigatti.github.io/oxDNA/oat/api.html#bond-analysis to have per-configuration output. The CLI script now also includes a plot which shows the number of bonds over time, which I find very useful for equilibrating structures where there are a lot of bonds I need to form.
[image: bonds] https://user-images.githubusercontent.com/36451772/208703953-c1d697a9-290b-4e2e-9b9d-e73578318c95.png
— Reply to this email directly, view it on GitHub https://github.com/lorenzo-rovigatti/oxDNA/issues/45#issuecomment-1359567593, or unsubscribe https://github.com/notifications/unsubscribe-auth/ARS5VAZMINFHXAFEIEDCSTTWOHGIZANCNFSM6AAAAAARB3RWGI . You are receiving this because you commented.Message ID: @.***>
The
bond_analysis.py
script only outputs the correct bond occupancy in the JSON file. Will it be possible to save total/incorrect bond occupancy in the same JSON file?