pytroll / pytroll-schedule

Reception scheduling of polar weather satellites
http://pytroll-schedule.readthedocs.org/
GNU General Public License v3.0

Schedule combine (develop branch) sometimes fails. #6

Closed TAlonglong closed 5 years ago

TAlonglong commented 7 years ago

The schedule combine (develop branch) sometimes fails with

if None in gn:
Traceback (most recent call last):
  File "/home/polar/pytroll/bin/schedule", line 9, in
    load_entry_point('pytroll-schedule==0.3.1', 'console_scripts', 'schedule')()
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/schedule.py", line 1003, in run
    combined_stations(opts, pattern, station_list, graph, allpasses, start_time, start, forward, center_id)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/schedule.py", line 731, in combined_stations
    stats, schedule, (newgraph, newpasses) = get_combined_sched(graph, passes)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/combine.py", line 297, in get_combined_sched
    statlst, newgraph, newpasses = add_graphs(allgraphs, allpasses, delay)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/combine.py", line 132, in add_graphs
    newgraph.add_arc(newpasses.index(parnode) + 1, newpasses.index(newnode) + 1, w)
  File "/home/polar/pytroll/lib/python2.7/site-packages/pytroll_schedule-0.3.1-py2.7.egg/trollsched/graph.py", line 56, in add_arc
    self.adj_matrix[v1, v2] = True
IndexError: index 282 is out of bounds for axis 1 with size 282

As a result, no combined schedule is produced.

The scheduler is started by a cron job at 21:35 UTC each night like this:

PYTHONPATH=/home/polar/pytroll/lib/python2.7/site-packages/ /home/polar/metno-software/bin/python /home/polar/pytroll/bin/schedule -c /home/polar/pytroll/etc/oslo-polar-orbit-schedule.cfg --multiproc --metno-xml -o /data/pytroll/schedule/ --tle /data/pytroll/tle/tle-latest.txt --log /data/pytroll/log/schedule.log -v -p >> /data/pytroll/log/schedule-errors.log 2>&1

The --metno-xml option generates XML files in a metno format. It should not influence the combine calculation.

The python version is 2.7.6

tle-20170419.txt

oslo-polar-orbit-schedule.txt wotis.pslwashi.2017109.191145.txt

For the schedule config file and the aqua dump file, I had to change the file extension to be able to upload them.

Trygve Aspenes

mraspaud commented 7 years ago

@TAlonglong thanks for the bug report. @alexmaul have you ever seen this?

alexmaul commented 7 years ago

@TAlonglong @mraspaud I had this error once, but could not reproduce it back then. I'll look into it and try to find the reason ... might be next week though.

TAlonglong commented 7 years ago

@mraspaud and @alexmaul Just a few more details.

Just before the crash the log says:

[INFO: 2017-04-19 21:53:08 : trollsched] Generating coordinated schedules ...
[DEBUG: 2017-04-19 21:53:08 : trollsched] station: oslo-x, order: 97
[DEBUG: 2017-04-19 21:53:08 : trollsched] station: oslo-l, order: 43
[DEBUG: 2017-04-19 21:53:08 : trollsched] newgraph order: 282

This order is calculated in add_graphs in combine.py:

# Rough estimate for the size of the combined passes' graph.
n_vertices = 1
for g in grl:
    n_vertices += g.order
n_vertices *= len(statlst)
newgraph = Graph(n_vertices=n_vertices)

logger.debug("newgraph order: %d", newgraph.order)

I don't follow the numbers here, but obviously(?) the estimate is too low?
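
For reference, the logged numbers do follow from the quoted snippet: with orders 97 and 43 and two stations, the estimate is (1 + 97 + 43) * 2 = 282. The sketch below reproduces that arithmetic. It uses a hypothetical stand-in for the adjacency matrix (a plain list of lists, not the real trollsched Graph class), and it assumes one graph per station.

```python
def estimate_vertices(orders, n_stations):
    """Mirror the size estimate quoted from add_graphs."""
    n_vertices = 1
    for order in orders:
        n_vertices += order
    return n_vertices * n_stations


# Orders taken from the debug log: oslo-x = 97, oslo-l = 43.
size = estimate_vertices([97, 43], n_stations=2)
print(size)  # 282

# Hypothetical stand-in for the graph's adjacency matrix (not trollsched code).
adj_matrix = [[False] * size for _ in range(size)]

# Valid indices run from 0 to size - 1, so index 282 is one past the end.
try:
    adj_matrix[0][size] = True
except IndexError as err:
    print("IndexError:", err)
```

The add_arc call in the traceback is passed newpasses.index(...) + 1, so the matrix needs at least one more row and column than there are combined passes.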

Trygve Aspenes

alexmaul commented 7 years ago

@TAlonglong Quick suggestion before I leave for a meeting ...

Could you change the graph dimension to n_vertices *= len(statlst) * 2, doubling the size?

"By the book" one would multiply the dimensions of all single-pass graphs, but with three or more stations the dimension of the combined graph would become astronomical, so I saved a lot of unused memory by "estimating" ... a bit too tight, I'd say ... although in all my tests that estimate was sufficient.

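To put the trade-off into numbers, here is a rough sketch comparing the "by the book" product with the sum-based estimate. It assumes that "multiplying the dimensions" means the product of the single-station graph orders and that there is one graph per station. The helper names and the third station's order (60) are made up for illustration.

```python
from math import prod


def product_size(orders):
    """'By the book': multiply the orders of all single-station graphs."""
    return prod(orders)


def estimated_size(orders, factor=1):
    """Sum-based estimate from add_graphs, optionally scaled by `factor`."""
    return (1 + sum(orders)) * len(orders) * factor


# Two stations use the orders from the log; the third order (60) is made up.
for orders in ([97, 43], [97, 43, 60]):
    full = product_size(orders)
    est = estimated_size(orders)
    doubled = estimated_size(orders, factor=2)
    # An adjacency matrix needs n * n cells for n vertices.
    print(len(orders), "stations:", full, est, doubled, full ** 2, doubled ** 2)
```

Because the adjacency matrix grows with the square of the order, the difference in memory is much larger than the difference in vertex counts suggests.
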
TAlonglong commented 7 years ago

@alexmaul @mraspaud I did a rerun under conditions similar to yesterday's crontab run, and it failed as expected.

I implemented the suggested fix and this time it worked fine. It's worth noting that the original newgraph order was 282, now doubled to 564, but the needed space was 308.

So let's hope this turns out to be sufficient.

Output from run with new fix:

[INFO: 2017-04-20 13:51:48 : trollsched] Generating coordinated schedules ...
[DEBUG: 2017-04-20 13:51:48 : trollsched] station: oslo-x, order: 97
[DEBUG: 2017-04-20 13:51:48 : trollsched] station: oslo-l, order: 43
[DEBUG: 2017-04-20 13:51:48 : trollsched] newgraph order: 564
[DEBUG: 2017-04-20 13:51:55 : trollsched] newpasses length: 308
[DEBUG: 2017-04-20 13:51:55 : trollsched] Distance: -8
[DEBUG: 2017-04-20 13:51:55 : trollsched] Path through newpasses: [308, 306, 303, 300, 295, 292, 290, 288, 282, 274, 271, 265, 250, 241, 236, 233, 221, 207, 199, 196, 178, 160, 156, 150, 144, 147, 138, 104, 97, 93, 91, 90, 85, 71, 68, 67, 65, 64, 60, 48, 45, 44, 43, 41, 38, 34, 31, 29, 23, 17, 13, 11, 9, 8, 6, 5, 3, 2, 1, 0]
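
To see how much headroom the doubled estimate leaves, the two relevant debug lines can be pulled out of the log. This is a minimal sketch: the headroom helper is hypothetical and not part of trollsched. It assumes that the highest index touched equals the number of combined passes, based on the newpasses.index(...) + 1 call in the traceback.

```python
import re

# Debug lines copied from the run with the fix.
LOG = """\
[DEBUG: 2017-04-20 13:51:48 : trollsched] station: oslo-x, order: 97
[DEBUG: 2017-04-20 13:51:48 : trollsched] station: oslo-l, order: 43
[DEBUG: 2017-04-20 13:51:48 : trollsched] newgraph order: 564
[DEBUG: 2017-04-20 13:51:55 : trollsched] newpasses length: 308
"""


def headroom(log_text):
    """Return (allocated order, highest index used, spare vertices)."""
    order = int(re.search(r"newgraph order: (\d+)", log_text).group(1))
    n_passes = int(re.search(r"newpasses length: (\d+)", log_text).group(1))
    # Assumption: add_arc is called with newpasses.index(...) + 1, so the
    # highest index it can touch equals the number of combined passes.
    highest_index = n_passes
    return order, highest_index, order - 1 - highest_index


order, highest, spare = headroom(LOG)
print(order, highest, spare)
```

With the original order of 282 the same calculation gives a negative spare count, which matches the failed run.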

Trygve Aspenes

alexmaul commented 7 years ago

@TAlonglong @mraspaud I did a few test runs with your configuration and start time (although with a slightly different area of interest), and found that the required graph dimension is about 1.5 times my original estimate. Doubling it should be sufficient.

In all my tests the number of permutations of the single-station schedules wasn't this big -- it really seems to be a problem if the antennas are close to each other.

Nevertheless, next week I'll create some situations with 3 or more stations, to see if it still works out ...

Alex

alexmaul commented 7 years ago

@TAlonglong BTW Trygve, could you please send me your area definition string for "ears_high_res"? Alex

TAlonglong commented 7 years ago

@alexmaul

sure:

REGION: ears_high_res {
        NAME:           Norway - EARS area - 2km
        PCS_ID:         ps60n
        PCS_DEF:        proj=stere,lat_0=90,lon_0=0,lat_ts=60,ellps=WGS84
        XSIZE:          4213
        YSIZE:          4147
        AREA_EXTENT:    (-3555026.13, -5805676.35, 4871540.88, 2489256.10)
};
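
As a quick sanity check when copying the definition, the "2km" in the area name can be compared with the pixel size implied by the extent and grid size. This is a plain arithmetic sketch that uses no pyresample calls; the variable names are illustrative only.

```python
# Values copied from the ears_high_res definition.
xsize, ysize = 4213, 4147
x_min, y_min, x_max, y_max = (-3555026.13, -5805676.35, 4871540.88, 2489256.10)

# Pixel size in projection units (metres for this stereographic projection).
res_x = (x_max - x_min) / xsize
res_y = (y_max - y_min) / ysize
print(round(res_x, 1), round(res_y, 1))
```

Both values come out very close to 2000 m, consistent with the name.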

mraspaud commented 5 years ago

Any progress on this?

TAlonglong commented 5 years ago

Ah, the fix suggested by @alexmaul (increasing the graph dimension) fixed it.