Closed TAlonglong closed 5 years ago
@TAlonglong thanks for the bug report. @alexmaul have you ever seen this ?
@TAlonglong @mraspaud I had this error once, but could not reproduce it back then. I'll look into it and try to find the reason ... might be next week though.
@mraspaud and @alexmaul Just a few more details.
Just before the crash the log says:
[INFO: 2017-04-19 21:53:08 : trollsched] Generating coordinated schedules ...
[DEBUG: 2017-04-19 21:53:08 : trollsched] station: oslo-x, order: 97
[DEBUG: 2017-04-19 21:53:08 : trollsched] station: oslo-l, order: 43
[DEBUG: 2017-04-19 21:53:08 : trollsched] newgraph order: 282
This is calculated in add_graphs in combine.py
    # Rough estimate for the size of the combined passes' graph.
    n_vertices = 1
    for g in grl:
        n_vertices += g.order
    n_vertices *= len(statlst)
    newgraph = Graph(n_vertices=n_vertices)
    logger.debug("newgraph order: %d", newgraph.order)
I don't follow the numbers here, but obviously(?) the estimate is too low?
Trygve Aspenes
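For reference, the logged order of 282 can be reproduced from the quoted snippet. The sketch below is a standalone mirror of that arithmetic, not the trollsched code itself; the function name estimate_order is hypothetical, and it assumes the multiplication by len(statlst) happens once, after the loop (which is the only reading that matches the log).

```python
def estimate_order(orders, n_stations):
    """Mirror of the quoted estimate: (1 + sum of orders) * number of stations."""
    n_vertices = 1
    for order in orders:
        n_vertices += order
    n_vertices *= n_stations
    return n_vertices


# Orders taken from the debug log above (oslo-x: 97, oslo-l: 43).
print(estimate_order([97, 43], 2))  # 282
```

So the estimate is (1 + 97 + 43) * 2 = 282, which matches the "newgraph order" line in the log.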
@TAlonglong Quick suggestion before I leave for a meeting ...
Could you change the graph dimension into
n_vertices *= len(statlst) * 2
doubling the size?
"By the book" one would multiply all single-pass-graphs' dimensions, but with three or more stations the dimension of the combined graph would become astronomical, so I saved a lot of unused memory by "estimating" ... a bit too tight, I'd say ... although in all my tests that estimate was sufficient.
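To put rough numbers on that trade-off, here is a small sketch comparing the "by the book" product with the sum-based estimate. The function names are hypothetical, it assumes one graph per station, and the three-station orders are made up for illustration; only 97 and 43 come from the log.

```python
from math import prod


def product_order(orders):
    """'By the book': multiply the orders of all single-station graphs."""
    return prod(orders)


def estimated_order(orders, factor=1):
    """Sum-based estimate; factor=2 corresponds to the suggested doubling."""
    return (1 + sum(orders)) * len(orders) * factor


two = [97, 43]                    # orders from the log
print(product_order(two))         # 4171
print(estimated_order(two))       # 282
print(estimated_order(two, 2))    # 564

three = [100, 100, 100]           # hypothetical three-station case
print(product_order(three))       # 1000000
print(estimated_order(three))     # 903
```

If the graph is held as a dense square adjacency matrix, as the traceback suggests, memory grows with the square of the order, so an order of 1000000 would mean 10^12 matrix entries, against fewer than a million for an order of 903.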
@alexmaul @mraspaud I did a rerun under similar conditions to yesterday's crontab run and it failed as expected.
I implemented the suggested fix and this time it worked fine. It's worth noting that the original newgraph order was 282, now doubled to 564, while the space actually needed was 308.
So let's hope this will turn out to be sufficient.
Output from run with new fix:
[INFO: 2017-04-20 13:51:48 : trollsched] Generating coordinated schedules ...
[DEBUG: 2017-04-20 13:51:48 : trollsched] station: oslo-x, order: 97
[DEBUG: 2017-04-20 13:51:48 : trollsched] station: oslo-l, order: 43
[DEBUG: 2017-04-20 13:51:48 : trollsched] newgraph order: 564
[DEBUG: 2017-04-20 13:51:55 : trollsched] newpasses length: 308
[DEBUG: 2017-04-20 13:51:55 : trollsched] Distance: -8
[DEBUG: 2017-04-20 13:51:55 : trollsched] Path through newpasses: [308, 306, 303, 300, 295, 292, 290, 288, 282, 274, 271, 265, 250, 241, 236, 233, 221, 207, 199, 196, 178, 160, 156, 150, 144, 147, 138, 104, 97, 93, 91, 90, 85, 71, 68, 67, 65, 64, 60, 48, 45, 44, 43, 41, 38, 34, 31, 29, 23, 17, 13, 11, 9, 8, 6, 5, 3, 2, 1, 0]
Trygve Aspenes
@TAlonglong @mraspaud I did a few test runs with your configuration and start time (although with a slightly different area of interest), and found that the required graph dimension is about 1.5 times my original estimate. Doubling it should be sufficient.
In all my tests the number of permutations of the single-station schedules was never this large, so it really seems to be a problem when the antennas are close to each other.
Nevertheless, next week I'll create some situations with 3 or more stations, to see if it still works out ...
Alex
@TAlonglong BTW Trygve, could you please send me your area definition string for "ears_high_res"? Alex
@alexmaul
sure:
REGION: ears_high_res {
        NAME:          Norway - EARS area - 2km
        PCS_ID:        ps60n
        PCS_DEF:       proj=stere,lat_0=90,lon_0=0,lat_ts=60,ellps=WGS84
        XSIZE:         4213
        YSIZE:         4147
        AREA_EXTENT:   (-3555026.13, -5805676.35, 4871540.88, 2489256.10)
};
Any progress on this ?
Ah, the fix suggested by @alexmaul (increasing the graph dimension) fixed it.
The schedule combine (develop branch) sometimes fails with
if None in gn:
Traceback (most recent call last):
  File "/home/polar/pytroll/bin/schedule", line 9, in <module>
    load_entry_point('pytroll-schedule==0.3.1', 'console_scripts', 'schedule')()
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/schedule.py", line 1003, in run
    combined_stations(opts, pattern, station_list, graph, allpasses, start_time, start, forward, center_id)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/schedule.py", line 731, in combined_stations
    stats, schedule, (newgraph, newpasses) = get_combined_sched(graph, passes)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/combine.py", line 297, in get_combined_sched
    statlst, newgraph, newpasses = add_graphs(allgraphs, allpasses, delay)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/combine.py", line 132, in add_graphs
    newgraph.add_arc(newpasses.index(parnode) + 1, newpasses.index(newnode) + 1, w)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/graph.py", line 56, in add_arc
    self.adj_matrix[v1, v2] = True
IndexError: index 282 is out of bounds for axis 1 with size 282
As a result, no combined schedule is produced.
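The traceback can be read as a plain sizing problem: add_arc is called with vertex numbers newpasses.index(...) + 1, so the largest vertex number equals the number of combined passes, while a matrix of size n only accepts indices 0 to n-1. A minimal sketch of that check follows; the helper names are hypothetical and this is not part of trollsched.

```python
def min_order_for(n_passes):
    """Smallest square-matrix size that can hold vertex numbers 1..n_passes."""
    # Vertex numbers are list index + 1, so the largest one equals n_passes;
    # a matrix of size n accepts indices 0..n-1 only.
    return n_passes + 1


def fits(order, n_passes):
    """True if a graph of the given order can address every combined pass."""
    return order >= min_order_for(n_passes)


print(fits(282, 281))  # True: vertex 281 is the last valid index
print(fits(282, 282))  # False: vertex 282 triggers the IndexError above
print(fits(564, 308))  # True: the doubled estimate covers 308 passes
```

With an order of 282 the graph can address at most 281 combined passes, so the 282nd pass is enough to raise the IndexError shown in the traceback.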
The scheduler is started by a cronjob at 21:35 UTC each night like this:
PYTHONPATH=/home/polar/pytroll/lib/python2.7/site-packages/ /home/polar/metno-software/bin/python /home/polar/pytroll/bin/schedule -c /home/polar/pytroll/etc/oslo-polar-orbit-schedule.cfg --multiproc --metno-xml -o /data/pytroll/schedule/ --tle /data/pytroll/tle/tle-latest.txt --log /data/pytroll/log/schedule.log -v -p >> /data/pytroll/log/schedule-errors.log 2>&1
The --metno-xml option generates XML files in a metno format. It should not influence the combine calculation.
The Python version is 2.7.6.
tle-20170419.txt
oslo-polar-orbit-schedule.txt wotis.pslwashi.2017109.191145.txt
For the schedule config file and the aqua dump file I had to change the file extensions to be able to upload them...
Trygve Aspenes