lorenzo-rovigatti / oxDNA

A new version of the code to simulate the oxDNA/oxRNA models, now equipped with Python bindings
https://dna.physics.ox.ac.uk/
GNU General Public License v3.0

UnicodeDecodeError when running FFS #60

Closed naliio closed 1 year ago

naliio commented 1 year ago

Hi, I'm using oxDNA to perform forward flux sampling and I'm not sure whether the script ffs_flux.py has a bug. When I try to run the example in directory _oxDNA/examples/FFSexample/FFS/FLUX/, I encounter errors. I copied some of them here:

Process Process-2:
Traceback (most recent call last):
  File "/home/nli/anaconda3/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/nli/anaconda3/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/scratch/job.2414/./ffs_flux.py", line 213, in f
    for line in output.readlines():
  File "/home/nli/anaconda3/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 3247: invalid start byte

I attached the whole directory; the errors are in the job.2414.out file. Oddly, the simulation completed without error according to ffs.log. I wonder whether the error in ffs_flux.py affects the computed initial flux. Thank you very much. FLUX.zip

Best regards, lina

lorenzo-rovigatti commented 1 year ago

It looks like an encoding issue. What OS are you using? What's the OS language?

naliio commented 1 year ago

> It looks like an encoding issue. What OS are you using? What's the OS language?

Linux version 4.18.0-425.10.1.el8_7.x86_64 (mockbuild@dal1-prod-builder001.bld.equ.rockylinux.org) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-15)

lorenzo-rovigatti commented 1 year ago

Unfortunately I cannot reproduce the bug.

Can you try replacing output = tf.TemporaryFile ('r+') with output = tf.TemporaryFile ('r+', encoding='latin1') in line 173? And if that doesn't work, can you try using output = tf.TemporaryFile ('r+', encoding='utf8', errors='ignore') and see whether the script works as intended?
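For readers hitting the same traceback, here is a minimal sketch of why the encoding argument matters. It does not use oxDNA: the byte string RAW and the helper read_back are made up for illustration, and the raw bytes are written at the file-descriptor level to mimic a child process writing into the temporary file. The keyword that tempfile.TemporaryFile accepts for tolerating undecodable bytes is errors (available since Python 3.8).

```python
import os
import tempfile

# Hypothetical stand-in for program output: byte 0xa0 is a non-breaking space
# in Latin-1 but is not a valid start byte in UTF-8.
RAW = b"step 100\xa0energy -1.5\n"


def read_back(**open_kwargs):
    """Write RAW into a temp file at the file-descriptor level (as a child
    process would) and read it back through Python's text layer."""
    with tempfile.TemporaryFile("r+", **open_kwargs) as output:
        os.write(output.fileno(), RAW)
        output.seek(0)
        return output.readlines()


# Strict UTF-8 decoding reproduces the UnicodeDecodeError from the traceback.
try:
    read_back(encoding="utf8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Latin-1 maps every byte to a character, so decoding cannot fail.
latin1_lines = read_back(encoding="latin1")

# UTF-8 with errors="replace" substitutes U+FFFD for undecodable bytes.
replaced_lines = read_back(encoding="utf8", errors="replace")

print(utf8_failed, latin1_lines, replaced_lines)
```

Latin-1 never raises because all 256 byte values are valid, which is why it is a convenient fallback when only the ASCII part of the output is parsed afterwards.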

naliio commented 1 year ago

Hi, Sorry for my late reply. I tried your first method and it worked! There was no error.

And I have some questions regarding using oxDNA to perform FFS. I would be grateful if you could spare some time to answer them:

  1. Running flux and shooting outputs the initial flux and the success probabilities, respectively. Is the product of these two numbers the rate or the rate constant? Some papers refer to a rate, while others refer to a rate constant.

  2. In the FFS_example directory, the output initial flux is 1.32592e-06. If the answer to the previous question is "rate constant", multiplying this value by the probability gives a very small rate constant, certainly less than 1. However, DNA hybridization rate constants are usually larger than 1e4 M^-1 s^-1. I'm very confused by this contradiction.

  3. If I run on CUDA, do I just need to modify ffs_flux.py and ffs_shoot.py? I tried to run the flux generation directly with the oxDNA engine, but it didn't output any initial flux data, even though no error was reported.

  4. In the README file in the FFS_CUDA directory, there is an energy cutoff set to 64. According to the description of simulation units, converting it into kcal gives (64 x 4.142 x 10^-20 / 4184) kcal. Is that right? And why does the energy unit conversion need to be related to temperature?

  5. Could you explain why four condition files are needed? In the script ffs_flux.py, why is it set to execute condition file apart-bw.txt first?

I'm sorry for asking so many questions, but they are important to me. Any insight you can offer would be greatly appreciated. Thank you very much for your time and help.

sulcgroup commented 1 year ago

Regarding your questions:

  1. We interpret flux * probability as the rate of association, in the example of hybridization.

  2. These numbers are in simulation units (1/time). There is no easy way to relate them to the values actually measured in experiments. What we do instead is calculate rates for different processes (such as two different toehold sequences) and then compare their relative ratio. This number can be meaningfully compared to the ratio of the two rates measured in experiment.

  3. You should be able to use FFS as illustrated in the FFS for CUDA example directory. There might be some issue with your cluster setup: the script distributes the processes using socket message passing, so it requires all GPUs to be on the same node.

  4. This order parameter refers to the total number of hydrogen bonds formed in a specified list of values, not to the energy of a single bond.

  5. To calculate the initial flux, we consider only trajectories that pass through interface lambda-2 before returning to lambda-1. If the system logged everything crossing lambda-1, a system that just oscillates around lambda-1 would keep getting logged all the time.
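The counting rule in the last answer and the rate comparison in the second one can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the logic of ffs_flux.py: the function names, the toy trajectory, the convention that a larger order parameter means "closer to bound", and the interface probabilities are all made up. Here lambda_back plays the role of lambda-2 and lambda_first the role of lambda-1; only the flux value 1.32592e-06 comes from this thread.

```python
from math import prod


def count_flux_crossings(q_values, lambda_back, lambda_first):
    """Count forward crossings of lambda_first, re-arming the counter only
    after the order parameter has dropped back to lambda_back or below."""
    armed = False  # the system must first be seen in the "apart" region
    crossings = 0
    for q in q_values:
        if not armed:
            if q <= lambda_back:
                armed = True
        elif q >= lambda_first:
            crossings += 1
            armed = False  # ignore further crossings until it goes back
    return crossings


def naive_crossings(q_values, lambda_first):
    """Count every upward crossing of lambda_first, oscillations included."""
    pairs = zip(q_values, q_values[1:])
    return sum(1 for prev, cur in pairs if prev < lambda_first <= cur)


def rate_estimate(flux, interface_probabilities):
    """Rate in simulation units: flux times the product of the
    interface-to-interface success probabilities."""
    return flux * prod(interface_probabilities)


# Hypothetical order-parameter time series oscillating around lambda_first.
trajectory = [0, 1, 2, 1, 2, 1, 0, 2, 2, 0, 1, 2]
armed_count = count_flux_crossings(trajectory, lambda_back=0, lambda_first=2)
naive_count = naive_crossings(trajectory, lambda_first=2)

flux = 1.32592e-06  # initial flux quoted in this thread (simulation units)
rate_a = rate_estimate(flux, [0.2, 0.5])  # made-up probabilities, system A
rate_b = rate_estimate(flux, [0.1, 0.5])  # made-up probabilities, system B
relative_rate = rate_a / rate_b  # the quantity to compare with experiment

print(armed_count, naive_count, relative_rate)
```

On the toy trajectory the naive count exceeds the armed count because one oscillation around the flux interface is counted again without the system having returned to the backward interface. The ratio of the two rates is dimensionless, so it does not depend on how simulation time maps to real time.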

naliio commented 1 year ago

Thank you for such a detailed response.