seekrcentral / seekr2

Simulation-Enabled Estimation of Kinetic Rates - Version 2
MIT License

BD or MD first with run.py any #21

Closed vaibhavadixit closed 2 years ago

vaibhavadixit commented 2 years ago

Hi, We have started running the trypsin-benz tutorial and I have a question regarding the run.py step. When I ran it on an A4000 GPU on a workstation I got the following output, but on the HPC it runs nam_simulation (i.e. BrownDye2) first. I'm therefore curious whether it is possible to run the separate BD and MD calculations simultaneously. Please comment on this for my better understanding. Thank you and best regards. Vaibhav

A4000 output

(base) [xxxx@yyyy seekr2]$ python3.8 run.py any /home/niperg/tryp_ben_hidr_tutorial/model.xml
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
anchor 0 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 1 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 2 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 3 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 4 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 5 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 6 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 7 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 8 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 9 has not run the minimum number of steps 0 of 125000000 in swarm index None
anchor 10 has not run the minimum number of steps 0 of 125000000 in swarm index None
running anchor_index: 0 restart: False total_simulation_length: 125000000 num_transitions: 0 swarm index: None

"Step","Potential Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)"

1250000,-250376.41525536962,296.89521240530155,177.84127115140652 2500000,-251816.960299053,300.7401145559686,177.84127115140652 3750000,-249775.08480032883,297.9072758462745,177.84127115140652 5000000,-251141.35327599244,297.6162308596486,177.84127115140652 6250000,-250886.5529175289,294.8508385438355,177.84127115140652 7500000,-251429.59675331693,297.38012520319904,177.84127115140652 8750000,-250864.39621214918,296.1578648834993,177.84127115140652 10000000,-251011.7579468072,295.76070438850223,177.84127115140652 11250000,-250754.05444069766,297.1850567116118,177.84127115140652 12500000,-250880.8638553978,295.8717125853891,177.84127115140652 13750000,-251361.46465320792,302.357790197908,177.84127115140652 15000000,-251751.39429811202,299.9408218258437,177.84127115140652 16250000,-249909.23492376693,297.9641657593563,177.84127115140652 17500000,-250831.85557753057,297.31015841402126,177.84127115140652 18750000,-250872.648794075,299.0403939308773,177.84127115140652 20000000,-251404.7889921791,297.7748683698601,177.84127115140652 21250000,-250549.96409982583,301.07466638268477,177.84127115140652 22500000,-250308.76190222194,299.49519508510076,177.84127115140652 23750000,-249980.19527490553,297.75345978110516,177.84127115140652 25000000,-250624.00605515996,302.3343559508517,177.84127115140652 26250000,-250507.18145670742,301.4777494536038,177.84127115140652 27500000,-250228.3289753215,297.5157977352142,177.84127115140652 28750000,-250618.3579239829,299.9698301611076,177.84127115140652 30000000,-251083.14304677257,298.0573303543426,177.84127115140652 31250000,-251045.4935685862,296.4738785135786,177.84127115140652 32500000,-250732.25450399006,300.618752740865,177.84127115140652 33750000,-250150.650192847,296.68024535972495,177.84127115140652 35000000,-251843.7443778594,300.2775982225485,177.84127115140652 36250000,-250836.95118905604,298.35247101304157,177.84127115140652 37500000,-250730.25663781445,297.32925657065,177.84127115140652 
38750000,-250681.76350122364,297.8295673493248,177.84127115140652 40000000,-250602.82712599728,301.8936784618029,177.84127115140652 41250000,-250908.67839316884,291.74930158738516,177.84127115140652 42500000,-250592.08901361236,296.4671783286221,177.84127115140652 43750000,-250205.9630815657,299.62388927715705,177.84127115140652 45000000,-251191.23436955502,299.895703373847,177.84127115140652 46250000,-251312.6290877089,295.9285451245737,177.84127115140652 47500000,-250885.78268891945,296.78113468776826,177.84127115140652 48750000,-251039.89442682546,299.17780628575764,177.84127115140652 50000000,-250586.59940173826,293.56036419906906,177.84127115140652 51250000,-251341.73434409965,297.40858264131396,177.84127115140652 52500000,-250785.1078860676,296.41169807661623,177.84127115140652 53750000,-250548.55212000664,299.82566753647023,177.84127115140652 55000000,-251336.17954663094,297.700648296592,177.84127115140652 56250000,-251493.64182420238,297.4044061539442,177.84127115140652 57500000,-250436.24132813164,297.64335069479813,177.84127115140652 58750000,-250900.1289494743,297.87782575301594,177.84127115140652 60000000,-251365.70957936766,299.9146946657265,177.84127115140652 61250000,-251739.93011926254,298.8694927949889,177.84127115140652 62500000,-251018.5306754359,300.8728575753887,177.84127115140652 63750000,-250989.5727051692,301.2254064441114,177.84127115140652 65000000,-251151.17498785816,300.28418159681985,177.84127115140652 66250000,-250811.2604660911,297.125632003862,177.84127115140652 67500000,-250745.48249687115,301.95650385173224,177.84127115140652 68750000,-250896.07283422747,301.1085610457348,177.84127115140652 70000000,-250620.20485602436,299.3786922707276,177.84127115140652 71250000,-249668.51900003082,299.38797967742096,177.84127115140652 72500000,-250026.44720053556,296.880736328326,177.84127115140652 73750000,-251382.91487488337,293.0451477013571,177.84127115140652 75000000,-250798.74752313131,301.3031975570151,177.84127115140652 
76250000,-251841.08711535623,300.24075164395157,177.84127115140652 77500000,-250607.52170097549,299.82053728140306,177.84127115140652 78750000,-251339.97406868823,300.0053381759625,177.84127115140652 80000000,-251620.8250389523,298.3618845607552,177.84127115140652 81250000,-251612.2944833946,298.2132969679813,177.84127115140652 82500000,-250558.90810871706,299.03952799174374,177.84127115140652 83750000,-250634.8499758104,300.8930652191651,177.84127115140652 85000000,-251014.51466802903,294.7870487439599,177.84127115140652 86250000,-250786.990156529,295.9250767611833,177.84127115140652 87500000,-251009.94473820366,299.3163430049137,177.84127115140652 88750000,-250342.0269576956,302.3439476150922,177.84127115140652 90000000,-250707.24345579604,294.5966184916818,177.84127115140652 91250000,-250763.4695239584,297.27304561892134,177.84127115140652 92500000,-251729.84289589804,298.48210669852375,177.84127115140652 93750000,-250918.35286462284,296.7007638691411,177.84127115140652 95000000,-251102.30022457894,296.47653030568483,177.84127115140652 96250000,-251151.94991578627,294.0330664425347,177.84127115140652 97500000,-249825.05061047943,300.7539475266632,177.84127115140652 98750000,-251913.89867717726,295.9855820395087,177.84127115140652 100000000,-250226.26716483058,297.799287179786,177.84127115140652 101250000,-249982.53791840328,298.08416196314124,177.84127115140652 102500000,-251413.05335161253,301.5823982733784,177.84127115140652 103750000,-250439.35385266365,297.79579694683963,177.84127115140652 105000000,-250791.20807962585,299.55582695186314,177.84127115140652 106250000,-250653.68977421243,297.91695645628715,177.84127115140652 107500000,-250195.73558695614,295.0807334628034,177.84127115140652 108750000,-251093.61188615602,296.53038741796996,177.84127115140652 110000000,-251480.1944490187,304.1390419674086,177.84127115140652 111250000,-250769.3652417278,294.8156741931179,177.84127115140652 112500000,-250157.7076921661,296.331567877692,177.84127115140652 
113750000,-250983.5466161277,300.3119088244567,177.84127115140652 115000000,-250947.34169258294,299.686376747694,177.84127115140652 116250000,-250522.54435569188,301.2849067322041,177.84127115140652 117500000,-250931.10059117898,294.7009081382181,177.84127115140652 118750000,-250664.06582189025,303.4956653291268,177.84127115140652 120000000,-251064.59574373765,297.32083267079287,177.84127115140652 121250000,-251606.44924361142,296.75987865226944,177.84127115140652 122500000,-250993.6611946551,298.8907556879082,177.84127115140652 123750000,-250197.03636861872,295.25539555555633,177.84127115140652 125000000,-249612.35455580102,295.8961460193411,177.84127115140652
Benchmark (ns/day): 282.6259063385334
running anchor_index: 1 restart: False total_simulation_length: 125000000 num_transitions: 0 swarm index: None

"Step","Potential Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)"

1250000,-251259.4426750599,294.6629526592754,177.84127115140652 2500000,-250765.72184476117,299.36995856663737,177.84127115140652 3750000,-250848.54080326902,296.31244688110917,177.84127115140652 5000000,-250384.19994499488,293.4496098255166,177.84127115140652 6250000,-250924.3924683323,300.2378917470994,177.84127115140652 7500000,-250770.1583597092,298.09511762078046,177.84127115140652 8750000,-251455.48746547895,297.8792693442864,177.84127115140652 10000000,-250859.59599494375,295.0429203120194,177.84127115140652 11250000,-250136.3151631034,298.3075320749688,177.84127115140652 12500000,-251009.68614812754,294.2744449183289,177.84127115140652 13750000,-250640.44650838315,296.8044192410573,177.84127115140652 15000000,-250418.59393999376,293.78215103701274,177.84127115140652 16250000,-251351.28738953965,297.4008396898686,177.84127115140652 17500000,-251294.72341999668,299.531905809816,177.84127115140652 18750000,-250463.3308101648,303.73813319121285,177.84127115140652 20000000,-250427.12095736945,297.52910853449305,177.84127115140652 21250000,-250973.74070397043,300.3789704509949,177.84127115140652 22500000,-250579.21751565533,298.7562571851669,177.84127115140652 23750000,-251432.9784859363,301.49303895725274,177.84127115140652 25000000,-251110.66937395255,301.46464267108837,177.84127115140652 26250000,-251251.63167706504,299.49951037430156,177.84127115140652 27500000,-250572.44890967244,296.89380434480745,177.84127115140652 28750000,-250140.19067218387,301.1059178934554,177.84127115140652 30000000,-251199.66560487263,294.17380927028927,177.84127115140652 31250000,-250043.40207134187,300.95852182176037,177.84127115140652 32500000,-250791.76140220044,296.47326886836913,177.84127115140652 33750000,-250442.05083427997,301.998836962379,177.84127115140652 35000000,-250590.44339945703,300.5902666019184,177.84127115140652 36250000,-251071.2009507832,300.9992785430724,177.84127115140652 37500000,-251506.3604732894,297.49571766768435,177.84127115140652 
38750000,-250895.8533184596,301.5968664245832,177.84127115140652 40000000,-250883.29082193132,298.26300253227936,177.84127115140652 41250000,-251651.47171014082,307.73439340465194,177.84127115140652 42500000,-250305.12439275743,298.0143337338997,177.84127115140652 43750000,-250613.1140536156,297.4515498466713,177.84127115140652 45000000,-250868.51864006324,298.561895048489,177.84127115140652 46250000,-250564.8245985997,297.99240701222277,177.84127115140652 47500000,-251012.72356958687,300.39938744525585,177.84127115140652 48750000,-250392.43096753443,298.58242407492764,177.84127115140652 50000000,-251035.1473082339,296.14095207941426,177.84127115140652 51250000,-251227.71490027197,302.4036548209542,177.84127115140652 52500000,-250522.53721845592,296.81303077479197,177.84127115140652 53750000,-250826.29571607313,298.405325878737,177.84127115140652 55000000,-250120.66219415353,298.0081225182074,177.84127115140652 56250000,-251105.63211397594,297.82980884685645,177.84127115140652 57500000,-250323.36232781573,297.5810193031567,177.84127115140652 58750000,-250623.39771179738,297.0397373693085,177.84127115140652 60000000,-251313.26762351906,292.3351538455054,177.84127115140652 61250000,-250154.16336838342,297.08175895943714,177.84127115140652 62500000,-252308.72308942117,299.07260160561094,177.84127115140652 63750000,-250594.4323346382,301.9754779054601,177.84127115140652 65000000,-251859.11473322287,299.6608428151863,177.84127115140652 66250000,-251087.2070069802,299.6953884748485,177.84127115140652 67500000,-251802.10227993852,297.0937701070965,177.84127115140652 68750000,-251570.61608901946,299.7394249148812,177.84127115140652 70000000,-251306.4699697583,300.08195487568383,177.84127115140652 71250000,-251280.58563445345,298.52583806825453,177.84127115140652 72500000,-250737.1717557311,301.9037488948244,177.84127115140652 73750000,-251257.2719639868,297.70835995995475,177.84127115140652 75000000,-250454.95606256882,300.76403074623505,177.84127115140652 
76250000,-251292.47676609014,303.15305000858115,177.84127115140652 77500000,-251573.22319973866,296.6073573182158,177.84127115140652 78750000,-250386.25605534436,302.42128964907766,177.84127115140652 80000000,-250751.07752651232,302.4641231108624,177.84127115140652 81250000,-251387.76963990787,293.865031217276,177.84127115140652 82500000,-250100.21676361188,300.00358798050877,177.84127115140652 83750000,-250601.57886954676,301.3720101791027,177.84127115140652 85000000,-250235.0574428863,295.74220956424205,177.84127115140652 86250000,-251469.80277535156,300.77650223503815,177.84127115140652 87500000,-249988.65012317477,296.276116870113,177.84127115140652 88750000,-250502.05432189978,296.1010129234983,177.84127115140652 90000000,-251968.39292710647,296.60859281762686,177.84127115140652 91250000,-250791.86354266037,295.8910847277333,177.84127115140652 92500000,-250404.56387520768,298.9048408840197,177.84127115140652 93750000,-251154.475504898,296.99560105670747,177.84127115140652 95000000,-251359.19246092672,299.51372660792583,177.84127115140652 96250000,-250764.11982902559,298.4252684757289,177.84127115140652 97500000,-251484.33265610877,299.40898474185764,177.84127115140652 98750000,-250935.99084728095,300.932682078628,177.84127115140652 100000000,-251445.2858629264,301.49998914194157,177.84127115140652 101250000,-250187.6698401617,297.4103964713417,177.84127115140652 102500000,-251637.72404296696,297.3061071492267,177.84127115140652 103750000,-250904.40802089963,300.4314819328267,177.84127115140652 105000000,-251078.05117556825,296.9107071906265,177.84127115140652 106250000,-250952.78594076866,297.01861025279345,177.84127115140652 107500000,-250802.97160940943,296.9837422883592,177.84127115140652 108750000,-250144.81036504335,295.5391186771401,177.84127115140652 110000000,-250408.76959471265,299.5920144289264,177.84127115140652 111250000,-251265.48837245023,298.6026394090152,177.84127115140652 112500000,-251187.4663939576,299.35775457542996,177.84127115140652 
113750000,-251361.33346678014,295.0478388167673,177.84127115140652 115000000,-251394.21021181066,301.3376359074994,177.84127115140652 116250000,-251057.08305738913,296.7607983850883,177.84127115140652 117500000,-250673.4803463628,295.2263131829163,177.84127115140652 118750000,-250833.26661640173,299.1149224913304,177.84127115140652 120000000,-251410.9140135313,300.213790546652,177.84127115140652 121250000,-250332.92606316763,298.6392307175874,177.84127115140652 122500000,-251197.99922959972,298.6720200794495,177.84127115140652 123750000,-251727.90174243366,295.2262481836339,177.84127115140652 125000000,-250838.0265258979,298.9726168252748,177.84127115140652
Benchmark (ns/day): 281.221144219288
running anchor_index: 10 restart: False total_simulation_length: 125000000 num_transitions: 0 swarm index: None

"Step","Potential Energy (kJ/mole)","Temperature (K)","Box Volume (nm^3)"

1250000,-250963.61889972934,297.8796832262577,177.84127115140652 2500000,-251100.75729331374,297.7754051036002,177.84127115140652 3750000,-250787.89210539637,297.1841722775951,177.84127115140652 5000000,-250622.71299585304,297.06264123069576,177.84127115140652 6250000,-250821.9841966941,295.32203814155105,177.84127115140652 7500000,-249772.9566296395,297.2846601329349,177.84127115140652 8750000,-251087.09847757034,294.96687104952366,177.84127115140652 10000000,-250296.20332069322,297.50360697497297,177.84127115140652 11250000,-251053.39524247777,297.7810223006985,177.84127115140652 12500000,-250609.3133067037,301.9160241174032,177.84127115140652 13750000,-250654.74175767647,299.9137915559968,177.84127115140652 15000000,-251246.0571761001,297.5900131046811,177.84127115140652 16250000,-249805.84242308652,296.1516417954397,177.84127115140652 17500000,-250385.3948628693,293.95899986085556,177.84127115140652 18750000,-250870.40786798252,300.92852254107123,177.84127115140652 20000000,-250905.67367758462,296.4416548335448,177.84127115140652 21250000,-250707.01590270572,298.4641538897293,177.84127115140652 22500000,-251103.76571191382,298.0251731412272,177.84127115140652 23750000,-250857.61640539858,298.2536600109732,177.84127115140652 25000000,-250574.95729532233,297.43102224900605,177.84127115140652 26250000,-251082.92657866422,298.13251092741314,177.84127115140652 27500000,-251642.8137897672,297.66562113432065,177.84127115140652 28750000,-250673.8559527225,300.2280735890014,177.84127115140652 30000000,-250993.53902922152,299.61717458412096,177.84127115140652 31250000,-251593.71903408505,303.1971113478435,177.84127115140652 32500000,-250612.1940932686,296.17957592455355,177.84127115140652 33750000,-251109.1292685545,297.87972097528217,177.84127115140652 35000000,-251504.09697998967,301.89775846431866,177.84127115140652 36250000,-250624.9054595204,299.6020381533652,177.84127115140652 37500000,-250196.6593948789,297.58610478306593,177.84127115140652 
38750000,-250772.68634288735,295.9347013668348,177.84127115140652 40000000,-251121.43306054547,300.965484540292,177.84127115140652 41250000,-251319.9035094534,298.1519555442378,177.84127115140652 42500000,-252062.46652876912,299.81752553831586,177.84127115140652 43750000,-250913.95741329156,297.09971023855746,177.84127115140652 45000000,-251023.71011325298,297.6121898078329,177.84127115140652 46250000,-251156.1582629932,301.21430667652606,177.84127115140652 47500000,-250973.7799685821,301.2418667839347,177.84127115140652 48750000,-250061.98426671792,295.67015589823285,177.84127115140652 50000000,-250858.78900723904,300.39840964216665,177.84127115140652 51250000,-249833.1550544314,300.04648956517707,177.84127115140652 52500000,-250755.61106327875,297.6282421147988,177.84127115140652 53750000,-250633.77985838032,296.68933906938247,177.84127115140652 55000000,-249208.43697069818,297.18976749075216,177.84127115140652 56250000,-251849.91741440422,300.7844392343469,177.84127115140652 57500000,-250662.8009897722,297.99908109997284,177.84127115140652 58750000,-250871.8564283948,295.70531975683423,177.84127115140652 60000000,-251025.46409789985,297.01312224961515,177.84127115140652 61250000,-250437.04213957582,297.8969129060293,177.84127115140652 62500000,-250929.3088880726,301.3655435105789,177.84127115140652 63750000,-251365.3793179521,300.7234767423653,177.84127115140652 65000000,-250499.15234704735,300.0873309512295,177.84127115140652 66250000,-250928.9496843838,295.9500399191102,177.84127115140652 67500000,-251202.30126014398,298.7874974974075,177.84127115140652 68750000,-251216.94150573667,294.6608167870472,177.84127115140652 70000000,-250390.27349948348,297.23256132492077,177.84127115140652 71250000,-251404.40668005217,297.1052465295271,177.84127115140652 72500000,-250735.0252188514,295.83046509164376,177.84127115140652 73750000,-250352.66381779732,296.40474659807165,177.84127115140652 75000000,-251743.92527426593,298.4225979534454,177.84127115140652 
76250000,-251675.50774102146,296.45510815031645,177.84127115140652 77500000,-251600.76772023272,296.54512463258675,177.84127115140652 78750000,-251342.93648028607,293.5898786595307,177.84127115140652 80000000,-251557.576574611,298.3036344967419,177.84127115140652 81250000,-250827.78162114602,298.6237527864562,177.84127115140652 82500000,-251362.96318814694,294.70286897441025,177.84127115140652 83750000,-250847.04300829396,300.2560358190567,177.84127115140652 85000000,-250414.343118208,299.48662507907414,177.84127115140652 86250000,-251087.53326319717,299.710353431438,177.84127115140652 87500000,-250474.6633588774,302.65096347009177,177.84127115140652 88750000,-250881.799825656,299.5173791444326,177.84127115140652 90000000,-251291.59384958865,300.02986660417866,177.84127115140652 91250000,-251566.37239163462,295.7179376227423,177.84127115140652 92500000,-250634.69581886218,295.28119503196945,177.84127115140652 93750000,-251367.401673167,298.4655400438071,177.84127115140652 95000000,-250645.225053536,299.2935865258229,177.84127115140652 96250000,-250860.34303405532,298.6747387437222,177.84127115140652

HPC output

[vaibhav@node1 seekr2-tutorial]$ cat job.50.out
command surface_spheres -probe_radius 1.5 < tryp_ben_receptor.xml > receptor_surface.xml
command inside_points -spheres tryp_ben_receptor.xml -surface receptor_surface.xml -egrid receptor3.dx > receptor_inside.xml
command hydro_params < receptor_inside.xml > receptor_hydro_params.xml
command test_charges < tryp_ben_receptor.xml > receptor_charges.xml
command lumped_charges -pts receptor_charges.xml > receptor_cheby.xml
command make_surface_sphere_list -surface receptor_surface.xml -spheres tryp_ben_receptor.xml -rxn rxns.xml -group receptor -core receptor > receptor_surface_atoms.xml
command mpole_grid_fit -dx receptor0.dx -solvdi 78 -debye 1.79769e+308 > receptor_mpole.xml
command born_integral -in receptor_inside.xml -oeps 78 -debye 1.79769e+308 -ieps 4.000000 > receptor_born.dx
command compute_charges_squared < receptor_charges.xml > receptor_charges_squared.xml
command lumped_charges -pts receptor_charges_squared.xml > receptor_squared_cheby.xml
command surface_spheres -probe_radius 1.5 < tryp_ben_ligand.xml > ligand_surface.xml
command inside_points -spheres tryp_ben_ligand.xml -surface ligand_surface.xml -egrid ligand2.dx > ligand_inside.xml
command hydro_params < ligand_inside.xml > ligand_hydro_params.xml
command test_charges < tryp_ben_ligand.xml > ligand_charges.xml
command lumped_charges -pts ligand_charges.xml > ligand_cheby.xml
command make_surface_sphere_list -surface ligand_surface.xml -spheres tryp_ben_ligand.xml -rxn rxns.xml -group ligand -core ligand > ligand_surface_atoms.xml
command mpole_grid_fit -dx ligand0.dx -solvdi 78 -debye 1.79769e+308 > ligand_mpole.xml
command born_integral -in ligand_inside.xml -oeps 78 -debye 1.79769e+308 -ieps 4.000000 > ligand_born.dx
command compute_charges_squared < ligand_charges.xml > ligand_charges_squared.xml
command lumped_charges -pts ligand_charges_squared.xml > ligand_squared_cheby.xml

lvotapka commented 2 years ago

Instead of using the “any” argument for run.py, use the “any_md” argument.

vaibhavadixit commented 2 years ago

Hi, Strangely, I'm getting the following message with the any_md option. I just copied the input files from the workstation to the HPC and wanted to run the MD part of the calculation, which has taken > 10 days on the workstation (A4000). Does model.xml or any other input file save a record of how far the calculation has proceeded? Please suggest whether I need to change model.xml or any other file to run all BD and MD from scratch, so that I can compare the performance of the A100 vs A4000 cards. There is only a 10-15% difference between the two cards for Amber22 pmemd.cuda jobs. Thanks

[xxxx@xxxx seekr2-tutorial]$ tail job.54.out
Nothing was run because all criteria are satisfied.
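For a rough sense of the MD cost being compared here, the figures printed in the A4000 log (125000000 steps per anchor, anchors 0 to 10, about 282.6 ns/day) can be turned into a wall-time estimate. The sketch below is only arithmetic; the 2 fs timestep is an assumption that does not appear anywhere in this thread, and the function name is made up for illustration.

```python
def estimate_wall_days(steps_per_anchor, timestep_fs, ns_per_day, n_anchors):
    """Estimate total wall time in days when anchors run one after another."""
    # 1 fs = 1e-6 ns, so steps * timestep_fs * 1e-6 is the simulated time in ns.
    ns_per_anchor = steps_per_anchor * timestep_fs * 1e-6
    return n_anchors * ns_per_anchor / ns_per_day


# Values from the A4000 log above; timestep_fs=2.0 is an ASSUMPTION.
days = estimate_wall_days(
    steps_per_anchor=125_000_000,
    timestep_fs=2.0,
    ns_per_day=282.6259063385334,
    n_anchors=11,
)
print(f"estimated wall time: {days:.1f} days")
```

Under that assumed timestep the estimate comes out near the "> 10 days" reported for the workstation, which suggests the runtime is dominated by the sequential MD anchors.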

lvotapka commented 2 years ago

Yeah since you copied over the files, it thinks the calculation is finished because of the checkpoint files in each of the anchors. To force rerun, use the -f argument for run.py.

vaibhavadixit commented 2 years ago

Ok, just did that and it is running the nam_simulation again. Can I expect it to run the MD part after this? Is it possible to simultaneously run BD and MD parts since they use CPU and GPU respectively and also (I think) are independent of each other? thanks for the quick response.

vaibhavadixit commented 2 years ago

Hi, The calculation seems to have terminated prematurely. Output related only to BD is printed even after using run.py -f option. The any_md option also doesn't run any MD simulation. Do I need to run the prepare.py step also on the HPC? Please suggest. thank you

[xxxx@xxxx seekr2-tutorial]$ cat job.55.out
BrownDye 2.0: Version of 13 Jun 2022
running BD: b_surface restart: False trajectories to run: 1000000 trajectories so far: 1000000 number of transitions 0
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: bd_top input.xml
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: nam_simulation receptor_ligand_simulation.xml

lvotapka commented 2 years ago

Would you please post the exact run command you are using? I'm not sure why this is happening.

vaibhavadixit commented 2 years ago

These are the exact commands I'm using in slurm script. please suggest

source /home/software/miniconda3/bin/activate
conda activate myseekr2
python /home/software/seekr2/seekr2/run.py -f any model.xml

(base) [xxx@node1 seekr2-tutorial]$ conda activate myseekr2
(myseekr2) [xxx@node1 seekr2-tutorial]$ python
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

vaibhavadixit commented 2 years ago

Hi, Just a gentle reminder, if you can respond to this and the workstation difficulties I'm facing with seekr2? Thank you very much for your support. Best regards. Vaibhav

lvotapka commented 2 years ago

I'm currently on vacation, but I believe the short solution to your problem is to use "any_md" as mentioned previously in this thread:

python /home/software/seekr2/seekr2/run.py any_md model.xml -f

Alternatively, you can run the anchors by integer:

python /home/software/seekr2/seekr2/run.py 0 model.xml -f

One can even use multiple GPUs if available on a node:

python /home/software/seekr2/seekr2/run.py 0 model.xml -f -c 0 &
python /home/software/seekr2/seekr2/run.py 1 model.xml -f -c 1 &
python /home/software/seekr2/seekr2/run.py 2 model.xml -f -c 2 &
python /home/software/seekr2/seekr2/run.py 3 model.xml -f -c 3 &
wait

There should be no need to rerun prepare.py.

All programs display useful instructions for how to run them with the '-h' argument. I also suggest carefully reviewing all documentation, especially if I am not accessible.
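The one-process-per-GPU commands above can also be generated from a script when the number of anchors or GPUs changes. This is a minimal sketch, not part of SEEKR2: it uses only the run.py arguments already shown in this thread (anchor index, model.xml, -f, -c), and the helper names build_commands and launch_all are hypothetical.

```python
import subprocess

# Path taken from the commands quoted in this thread; adjust for your install.
RUN_PY = "/home/software/seekr2/seekr2/run.py"


def build_commands(anchors, gpu_indices, model="model.xml", force=True):
    """Pair each anchor index with a GPU index and build one command per pair."""
    commands = []
    for anchor, gpu in zip(anchors, gpu_indices):
        cmd = ["python", RUN_PY, str(anchor), model]
        if force:
            cmd.append("-f")  # force rerun, as used throughout this thread
        cmd += ["-c", str(gpu)]  # GPU index, as in the commands above
        commands.append(cmd)
    return commands


def launch_all(commands):
    """Start every command at once, then wait for all of them (like '&' + 'wait')."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Dry run: print the commands instead of launching them.
    for cmd in build_commands(range(4), range(4)):
        print(" ".join(cmd))
```

Calling launch_all(build_commands(range(4), range(4))) would start the four runs concurrently; with a single GPU the anchors still have to run one after another.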

vaibhavadixit commented 2 years ago

Hi, I tried the first and second options you suggested (since I've only one GPU), but in vain. The MD part of the calculation won't run and it prints a message saying nothing to do.

Nothing was run because all criteria are satisfied.

Thus I tried to do the tutorial from scratch on the HPC as given here.

In this new trial, I'm getting the following error in the hidr step which looks related to the hidr.py script. Please have a look and let me know, how can I possibly fix the same? Thank you

Error is shown below

Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Traceback (most recent call last):
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 366, in <module>
    hidr(model, destination, pdb_files, dry_run, equilibration_steps,
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 190, in hidr
    hidr_simulation.run_SMD_simulation(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 479, in run_SMD_simulation
    system, topology, positions, box_vectors = run_window(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 386, in run_window
    add_forces(sim_openmm, model, anchor, restraint_force_constant, cv_list,
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 266, in add_forces
    myforce = make_restraining_force(cv, variables_values_list)
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.64.g33c013b-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 222, in make_restraining_force
    cv.add_groups_and_variables(myforce, variables_values_list)
TypeError: add_groups_and_variables() missing 1 required positional argument: 'alias_id'

My batch file is shown below

#!/bin/bash
#SBATCH --job-name=seekr2job1 ##job name
#SBATCH -N 1 ##number of nodes requires
#SBATCH --nodelist=node1
#SBATCH --ntasks-per-node=22 ##number of cpu requires
#SBATCH --time=95:50:20 ##time optional
#SBATCH --error=job.%J.err ## Job error
#SBATCH --output=job.%J.out ##job out put if any
#SBATCH --partition=GPU_NODES ##partition name
#SBATCH --gres=gpu:1 ## number of gpu card requires

source /home/software/miniconda3/bin/activate
conda activate myseekr2
python /home/software/seekrtools/seekrtools/hidr/hidr.py any model.xml -M SMD -p tryp_ben.pdb

lvotapka commented 2 years ago

I'm now back from vacation.

I've run tests of SEEKR2 to see if there was a problem with the "any_md" or integer arguments of the run.py program and everything seems to work as expected.

Without force overwrite:

$ python ~/seekr2/seekr2/run.py any_md model.xml 
Nothing was run because all criteria are satisfied.

With force overwrite:

$ python ~/seekr2/seekr2/run.py any_md model.xml -f
anchor 0 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 1 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 2 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 3 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 4 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 5 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 6 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 7 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 8 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 9 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 10 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 11 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 12 has not run the minimum number of steps 0 of 10000000 in swarm index None
anchor 13 has not run the minimum number of steps 0 of 10000000 in swarm index None

Same for integer arguments.

The only reason you should be getting the "Nothing was run because all criteria are satisfied." message is if you forgot the "-f" argument to force rerun. Also, are you sure that you're using the latest version of the SEEKR2 software?

As for the problem with hidr.py, that is a recent bug, thank you for finding it. I've just pushed the bugfix to the seekrtools repository.
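The TypeError in the hidr.py traceback earlier in this thread is the generic symptom of a method that gained a required parameter while a caller still uses the old call. The sketch below reproduces that pattern in isolation. ExampleCV is a hypothetical stand-in and not seekrtools code; only the method and argument names echo the traceback.

```python
import inspect


class ExampleCV:
    """Hypothetical stand-in for a collective-variable class."""

    def add_groups_and_variables(self, force, variables, alias_id):
        # 'alias_id' plays the role of the newly added required parameter.
        return (force, variables, alias_id)


cv = ExampleCV()

# A caller written against the old two-argument signature fails like this:
try:
    cv.add_groups_and_variables("myforce", [1.0, 2.0])
except TypeError as err:
    message = str(err)
    print(message)

# Listing the required parameters shows what the caller must supply.
required = [
    name
    for name, param in inspect.signature(cv.add_groups_and_variables).parameters.items()
    if param.default is inspect.Parameter.empty
]
print(required)
```

When a mismatch like this shows up inside an installed package rather than in your own script, updating that package to a version containing the fix is usually the remedy, which is what the bugfix push above provides.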

vaibhavadixit commented 2 years ago

As you can see from the top output on the node below, I did include the -f argument, yet I'm still getting the message that nothing was run. Is it a bug in my installation? Please suggest. Thank you.

1522846 vaibhav 20 0 6465084 266084 80684 R 100.0 0.2 0:08.62 python /home/software/seekr2/seekr2/run.py any_md model.xml -f

(base) [vaibhav@node1 seekr2-tutorial]$ more job.97.out
Nothing was run because all criteria are satisfied.
(base) [vaibhav@node1 seekr2-tutorial]$ more job.97.err
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
(base) [vaibhav@node1 seekr2-tutorial]$
(base) [vaibhav@node1 seekr2-tutorial]$ more seekr2job.batch

#!/bin/bash
#SBATCH --job-name=seekr2job1 ##job name
#SBATCH -N 1 ##number of nodes requires
#SBATCH --nodelist=node1
#SBATCH --ntasks-per-node=22 ##number of cpu requires
#SBATCH --time=95:50:20 ##time optional
#SBATCH --error=job.%J.err ## Job error
#SBATCH --output=job.%J.out ##job out put if any
#SBATCH --partition=GPU_NODES ##partition name
#SBATCH --gres=gpu:1 ## number of gpu card requires

source /home/software/miniconda3/bin/activate
conda activate myseekr2

python /home/software/seekr2/seekr2/prepare.py input_tryp_ben_hidr.xml

python /home/software/seekr2/seekr2/run.py any model.xml -f

python /home/software/seekr2/seekr2/run.py any_md model.xml -f

vaibhavadixit commented 2 years ago

OK, I'm running two sets of simulations on the HPC: 1) one where the files were copied from the workstation, and 2) one where I started the tutorial from scratch.

The 1st gives the same "Nothing to do" message.

For the 2nd (from scratch) simulation I got the seekrtools error, which was resolved, but now I'm getting a NaN coordinate error with OpenMM. Does it indicate that the input PQR or PDB file is not in the right format, or something else? Please suggest. Thank you.

Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Traceback (most recent call last):
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 366, in <module>
    hidr(model, destination, pdb_files, dry_run, equilibration_steps,
  File "/home/software/seekrtools/seekrtools/hidr/hidr.py", line 190, in hidr
    hidr_simulation.run_SMD_simulation(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.65.g549ac37-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 480, in run_SMD_simulation
    system, topology, positions, box_vectors = run_window(
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/seekrtools-0+untagged.65.g549ac37-py3.8.egg/seekrtools/hidr/hidr_simulation.py", line 396, in run_window
    sim_openmm.simulation.step(total_number_of_steps)
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/openmm/app/simulation.py", line 141, in step
    self._simulate(endStep=self.currentStep+steps)
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/openmm/app/simulation.py", line 206, in _simulate
    self.integrator.step(10) # Only take 10 steps at a time, to give Python more chances to respond to a control-c.
  File "/home/software/miniconda3/envs/myseekr2/lib/python3.8/site-packages/openmm/openmm.py", line 13872, in step
    return _openmm.LangevinIntegrator_step(self, steps)
openmm.OpenMMException: Particle coordinate is NaN. For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan
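
On the question of whether the input file format is at fault, one quick check is to scan the input PDB for coordinate fields that are missing, non-numeric, or not finite. The sketch below is only that: `bad_coordinate_lines` is a hypothetical helper, not part of SEEKR2, seekrtools, or OpenMM, and it assumes standard fixed-column PDB records (x, y, z in columns 31-54). A clean result only rules out malformed coordinate fields; a NaN raised during dynamics can have other causes, which the FAQ linked in the error message discusses.

```python
import math


def bad_coordinate_lines(pdb_path):
    """Return 1-based line numbers of ATOM/HETATM records whose x, y, z
    fields are missing, non-numeric, or not finite.

    Hypothetical helper; assumes fixed-column PDB format.
    """
    bad = []
    with open(pdb_path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.startswith(("ATOM", "HETATM")):
                continue
            try:
                # x, y, z occupy columns 31-38, 39-46, 47-54 (1-based).
                xyz = [float(line[i:i + 8]) for i in (30, 38, 46)]
            except ValueError:
                # Empty or non-numeric coordinate field.
                bad.append(number)
                continue
            if not all(math.isfinite(value) for value in xyz):
                bad.append(number)
    return bad
```

An empty list means every ATOM/HETATM record has three finite coordinates; any line numbers returned point at records worth inspecting by hand.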

lvotapka commented 2 years ago

From inside the seekr2/ directory, type "git log" and paste the first 10 or so lines here.

vaibhavadixit commented 2 years ago

This is what I see with git log command. I guess you want to check if the bugfix has been applied or not, right?

commit caa21a87edb93f410b500759aef3035b46f0034b (HEAD -> master, origin/master, origin/dev, origin/HEAD)
Author: Lane Votapka lvotapka100@gmail.com
Date: Thu Aug 18 23:22:49 2022 -0600

    extraneous print statements removed

commit fbbc8e34cf42e5f80901da35af5b3405e282be90
Author: Lane Votapka lvotapka100@gmail.com
Date: Thu Aug 18 18:03:46 2022 -0600

    Implemented Voronoi CV and anchors. Also corrected some bugs with RMSD CV and added missing check functions.

commit 277bc19176ebab182403117ec640eb5fb258e76f
Author: Lane Votapka lvotapka100@gmail.com
Date: Wed Aug 10 11:20:00 2022 -0600

    working on developing Voronoi Tesselation CVs and anchors.

commit b395a164a08724d8378c90a1fdcd685d6fc4c7ce
Author: Lane Votapka lvotapka100@gmail.com
Date: Fri Aug 5 09:27:38 2022 -0600

    Fixed bug affecting short Elber trajectories

commit 7b895a23b1930667aed16f42fd6193f52cc1f624
Author: Lane Votapka lvotapka100@gmail.com
Date: Thu Aug 4 15:39:48 2022 -0600

    increasing run.py CONVERGENCE_INTERVAL to a larger number

commit 139587878ba7ac4247c253f4ff8b573a7576ab64
Author: Lane Votapka lvotapka100@gmail.com
Date: Tue Aug 2 16:39:45 2022 -0600

    Fixed anchor connections of bulk states in Grid combos.

commit f54e48e01ad288b531349df951fb4569655e66df
Author: Lane Votapka lvotapka100@gmail.com
Date: Tue Aug 2 14:22:02 2022 -0600

    Updated tests for new Toy system state_point.

commit c521d263c752df9a37ceb271fdbaa7e3baf06a48
Author: Lane Votapka lvotapka100@gmail.com
Date: Tue Aug 2 11:37:47 2022 -0600

    Fixed state_point definitions for toy systems with more than one particle

commit 414d3ac8f4e493497d865a28f00eb8fe610c36c1
Author: Lane Votapka lvotapka100@gmail.com
Date: Fri Jul 29 16:55:54 2022 -0600

    Added warning to model.xml not to modify by hand.

commit 60605f47bbfd6d648377f0e51b3c389783785614
Author: Lane Votapka lvotapka100@gmail.com
Date: Fri Jul 29 14:17:28 2022 -0600

lvotapka commented 2 years ago

Yes, I was trying to see if you have the latest version, which you seem to.

Alright, let's try this manually...

From the directory where model.xml and the anchor_* folders are located, type the following command:

rm anchor_*/prod/*

From there, you should be able to run without the "Nothing was run" message.
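
Because a mistyped glob can remove more than intended, the same cleanup can be sketched in Python with a dry-run default. This is a minimal sketch, not part of SEEKR2: `clear_prod_dirs` is a hypothetical name, and it assumes only the anchor_*/prod/ layout named in the command above.

```python
from pathlib import Path


def clear_prod_dirs(model_dir, dry_run=True):
    """Delete regular files directly inside every anchor_*/prod/ directory.

    Hypothetical helper (not part of SEEKR2). The anchor_* folders and any
    other subdirectories are left in place. Returns the sorted list of
    files that were removed, or that would be removed when dry_run is True.
    """
    targets = sorted(
        path
        for path in Path(model_dir).glob("anchor_*/prod/*")
        if path.is_file()
    )
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling `clear_prod_dirs(".")` from the directory that holds model.xml only lists the files; passing `dry_run=False` actually deletes them.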

Also, would you be willing to attach your model.xml file to this thread for me to look at?

vaibhavadixit commented 2 years ago

This is for the files copied from the workstation. Oops, I accidentally removed all the anchor folders, since the command you suggested didn't work. I then also had to remove the b_surface folder for the prepare.py step to work, which finished quickly and correctly. The run.py any model.xml step is running now, and it looks like it is running the BD part of the calculation.

The old model.xml1.txt and new model.xml.txt files are attached for your reference, in case that helps. I'm also wondering: if the BD and MD simulations are independent, why can't we run both simultaneously? Thank you.

model.xml.txt model.xml1.txt

vaibhavadixit commented 2 years ago

This calculation again stopped after the nam_simulation step. I've now resubmitted with run.py any model.xml -f to check whether that runs the MD portion of the calculation. I'll post an update soon. Thank you.

lvotapka commented 2 years ago

Nothing seems wrong with your model.xml files. If you delete the files in each anchor's prod/ directory, there is no way that SEEKR could think that the MD portion is finished.

Do you want to just leave out the BD entirely? If so, then you can remove the entire block in the input XML and replace it with a line containing: "", which will set that variable to None and no BD will be run.

Are you able to send me your system? If you send me your input XML and all input files in a way that I can easily run prepare.py, then I can try out your system to see if anything is wrong.

vaibhavadixit commented 2 years ago

Again, only the BD portion ran; the MD part of the calculation did not run. I'm somewhat puzzled. As suggested, I've shared a link to all the input files here. Please check at your end and help me understand where I'm making a mistake.

tail job.111.out
BrownDye 2.0: Version of 13 Jun 2022
running BD: b_surface restart: False trajectories to run: 1000000 trajectories so far: 1000000 number of transitions 0
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: bd_top input.xml
moving to directory: /home/vaibhav/seekr2-tutorial/b_surface
running command: nam_simulation receptor_ligand_simulation.xml

Thank you

lvotapka commented 2 years ago

Once I can get to it, I'll take a look at your files and see if I can reproduce the problem.

lvotapka commented 2 years ago

Aha, I see your problem now. You do not have any tags filled. This is fine if you want to use HIDR with the "-p" argument to assign a starting PDB, but without any PDB files in any of the anchors, run.py doesn't do anything. So you need to use HIDR to assign the starting PDBs in the model, and then you can use run.py.
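
A rough way to spot this situation before submitting a job is to look for anchors that have no PDB file on disk. The sketch below is an illustration only: `anchors_missing_pdb` is a hypothetical helper, the `building` subdirectory name is an assumption about the anchor layout, and the check inspects the filesystem rather than model.xml, so it is not what run.py itself does.

```python
from pathlib import Path


def anchors_missing_pdb(model_dir, subdir="building"):
    """Return names of anchor_* directories with no .pdb file in `subdir`.

    Hypothetical helper. `subdir` is an assumed location for starting
    structures; pass a different name if the layout differs.
    """
    missing = []
    # Note: sorting is lexicographic, so anchor_10 sorts before anchor_2.
    for anchor in sorted(Path(model_dir).glob("anchor_*")):
        if not anchor.is_dir():
            continue
        structure_dir = anchor / subdir
        has_pdb = structure_dir.is_dir() and any(structure_dir.glob("*.pdb"))
        if not has_pdb:
            missing.append(anchor.name)
    return missing
```

If the returned list contains every anchor, no starting structures are present on disk, which matches the case described above where HIDR has not yet been run.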

vaibhavadixit commented 2 years ago

Hi, I ran the HIDR calculation and it then correctly proceeded to the MD part of the calculation. Now the question is how much speedup I can expect on this A100 card compared to the A4000 on my workstation (10 days). I guess not much, or is it? I've attached the output in case it helps you estimate how long this calculation is likely to take on the A100. Thanks again for your valuable suggestions and insights. Best regards, Vaibhav

job.117.out.txt

lvotapka commented 2 years ago

All SEEKR calculations will print a benchmark once finished, so if you run short jobs using, say, the "-t 10000" argument, then you can see how fast the calculation is running on each card.
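
Once a benchmark is in hand, turning it into a wall-clock estimate is simple arithmetic. The sketch below assumes the benchmark is expressed in ns/day and that anchors run one after another on a single card; `estimated_days` is a hypothetical helper, and the timestep and benchmark values in the usage note are placeholders, not measurements.

```python
def estimated_days(steps_per_anchor, num_anchors, timestep_fs, ns_per_day):
    """Estimate wall-clock days to run all anchors sequentially on one card.

    Hypothetical helper: assumes a benchmark in ns/day and ignores any
    per-anchor startup or I/O overhead.
    """
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    # Total simulated time in nanoseconds (1 ns = 1e6 fs).
    total_ns = steps_per_anchor * num_anchors * timestep_fs / 1.0e6
    return total_ns / ns_per_day
```

For example, with the 125000000 steps per anchor and 11 anchors shown in the first log of this thread, an assumed 2 fs timestep, and a placeholder benchmark of 500 ns/day, `estimated_days(125000000, 11, 2.0, 500.0)` gives 5.5 days. Running the same short job on each card and comparing the two benchmarks gives the speedup ratio directly.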

lvotapka commented 2 years ago

Sounds like this issue is resolved, I'll go ahead and close it.