marrink-lab / polyply_1.0

Generate input parameters and coordinates for atomistic and coarse-grained simulations of polymers, ssDNA, and carbohydrates
Apache License 2.0
122 stars 21 forks source link

Construction of large DNA segments #266

Closed LHRK closed 2 years ago

LHRK commented 2 years ago

Hi! I am trying to construct a poly A 4000 nt DNA segment, having a spherical constraint on as well. It works for 400 nt, but I get problems with 1000 nt and up.

The main issues are box size and memory error. First I get, that the box is not large enough, hence I naively increase the box, but then I get memory error.

When trying to construct the 1000 or 4000 nt with a spherical constraint using 10 nm in radius, I get the error: "OSError: Sampling the end-to-end distance distribution yielded end-to-end distances that are larger than the box diagonal. This will not work. Please increase the box to be at least 515.8467358001363 nm."

Now if I simply increase the box size, I get memory error "numpy.core._exceptions.MemoryError: Unable to allocate 349. GiB for an array with shape (3, 2500, 2500, 2500) and data type float64". This is also the case with just 1000 nt. For 400 nt, it works fine.

Okay, fair, but I find it confusing, since I am applying the sphe0rical constraint, which is smaller than the box size. Of course it can be, that my spherical restraint is too small as well with only 10 nm radius, but for now, this I can not even test. since the errors are the same not matter the spherical constraint. Increasing the radius of the spherical constraint, do not help either. PolyA.zip

I have uploaded the files for the 4000 nt DNA string I am trying to built, along with the commands used.

Help would be appreciated! Maybe I am simply misunderstanding the building file for polyply.

Best Lisbeth

fgrunewald commented 2 years ago

Hi @LHRK,

So a couple of things happen together here. I'll try my best to disentangle them:

1) first the original error - about the box being to small - is being raised due to the way polyply implements persistence lengths. To get the persistence length correct first an end-to-end distance distribution is sub-sampled. In your case you draw 1 sample from that distribution. Because the 4000 residue chain is very long the end-to-end distance can be very large as well. Here you have two options: Either you subsample intervals on the chain (e.g. resid 0 400, resid 401 801 ...) or you ignore it. For long chains like this the end-to-end distance distribution is typically very wide, which means it is a less ideal descriptor and also see point 3. This means that when you have spherical constraints your sphere cannot be smaller than this end-to-end distance, which in your case likely will never be the case.

2) You try to pack your chain inside a sphere of 10 nm radius located at 0,0,0. Polply only works in positive coordinate space, which means you actually try to pack the DNA inside a quater of a sphere of 10nm radius. Simply move the origin of the sphere to r + 1 nm that is DA 1 3998 in 11 11 11 10 The 1 nm is strictly speaking not needed, but it is better to give some tolerance. Note that your box length should be at least 2xr+2 in each dimension.

3) Placing 4000 residues in a 10nm sphere is quite a high concentration; we have done some tests recently and it typically means the DNA will suck up ions thereby increasing the ion concentration inside the compartment. This is important because the persistence length will therefore be much lower than in solution where the chain is "free". I would try running without persistence length setting for now and see if the packing converges.

In addition I have two more tips/tricks:

1) reading the topology file will be slow. But for construction you can delete all dihedrals with type 9 from the itp file. This saves you some time. For me it takes in the ball-park of 15min. 2) you may have to "ligate ions" to place them close to the backbone in such a condense system. Otherwise I'm not sure it will be stable. 3) you will have to think about a good estimate about the ion-concentration inside the compartment you try to simulate, because even at the Martini level reaching equilibrium in these concentrations can several microseconds.

I hope this helps! Otherwise feel free to post here again or ask if something is unclear.

fgrunewald commented 2 years ago

@LHRK It seems this is not a bug but more related to some connection of unfortunate events. I will transfer this to the discussions board. This way it stays visible for other people, who might have the same question. If your question isn't resolved by above suggestions, you can let me know here as well.