thu-pacman / GeminiGraph

A computation-centric distributed graph processing system.
Apache License 2.0

Application crash with big input file #11

Closed fhoushmand closed 5 years ago

fhoushmand commented 5 years ago

Hello,

I'm trying to run the bfs program on the twitter2010 input file. I've already converted the file to binary format. Here is the command that I use to run it on 8 nodes of a cluster:

mpirun -n 8 -ppn 32 ./bfs binary-input 41652230 1

But the program keeps crashing with the following crash report (assertion false when getting the partition id):

thread support level provided by MPI: MPI_THREAD_MULTIPLE
|V| = 41652230, |E| = 1468365182
|V'_0| = 2871296 |E^dense_0| = 314202346
|V'_1_0| = 782336 |E^dense_1_0| = 68485950
|V'_1_1| = 417792 |E^dense_1_1| = 95272979
|V'_2_0| = 552960 |E^dense_2_0| = 87563884
|V'_1| = 2260992 |E^dense_1| = 348123055
|V'_1_2| = 282624 |E^dense_1_2| = 105132766
|V'_1_3| = 778240 |E^dense_1_3| = 79231360
|V'_2_1| = 745472 |E^dense_2_1| = 76757359
|V'_3_0| = 1236992 |E^dense_3_0| = 49314395
|V'_2| = 3674112 |E^dense_2| = 269440472
|V'_2_2| = 1200128 |E^dense_2_2| = 51701577
|V'_4_0| = 1421312 |E^dense_4_0| = 38844553
|V'_5_0| = 1556480 |E^dense_5_0| = 31613706
|V'_2_3| = 1175552 |E^dense_2_3| = 53417652
|V'_3_1| = 1318912 |E^dense_3_1| = 44771383
|V'_6_0| = 1814528 |E^dense_6_0| = 17051275
|V'_7_0| = 1921024 |E^dense_7_0| = 11060460
|V'_4_1| = 1380352 |E^dense_4_1| = 41268721
|V'_5_1| = 1642496 |E^dense_5_1| = 26588527
|V'_3_2| = 1290240 |E^dense_3_2| = 46229979
|V'_3| = 5206016 |E^dense_3| = 183329976
|V'_3_3| = 1359872 |E^dense_3_3| = 43014219
|V'_6_1| = 1835008 |E^dense_6_1| = 16113070
|V'_7_1| = 1912832 |E^dense_7_1| = 11805510
|V'_4_2| = 1507328 |E^dense_4_2| = 34311730
|V'_5_2| = 1740800 |E^dense_5_2| = 21402872
|V'_5_3| = 1757184 |E^dense_5_3| = 20574020
|V'_4_3| = 1552384 |E^dense_4_3| = 32206979
|V'_7_2| = 1908736 |E^dense_7_2| = 12035589
|V'_6_2| = 1843200 |E^dense_6_2| = 15618443
|V'_4| = 5861376 |E^dense_4| = 146631983
|V'_7_3| = 1953798 |E^dense_7_3| = 9622540
|V'_6_3| = 1892352 |E^dense_6_3| = 13151337
|V'_5| = 6696960 |E^dense_5| = 100179125
|V'_6| = 7385088 |E^dense_6| = 61934125
|V'_7| = 7696390 |E^dense_7| = 44524099
|V'_0_0| = 290816 |E^dense_0_0| = 102384936
|V'_0_1| = 749568 |E^dense_0_1| = 73477386
|V'_0_2| = 778240 |E^dense_0_2| = 74985272
|V'_0_3| = 1052672 |E^dense_0_3| = 63354752
machine(0) got 282171224 sparse mode edges
machine(1) got 366424013 sparse mode edges
machine(2) got 283360301 sparse mode edges
machine(7) got 77415901 sparse mode edges
machine(4) got 151005424 sparse mode edges
machine(6) got 42193162 sparse mode edges
machine(3) got 199526445 sparse mode edges
machine(5) got 66268712 sparse mode edges
part(6) E_0 has 9226041 sparse mode edges
part(0) E_0 has 70879848 sparse mode edges
part(1) E_0 has 66134242 sparse mode edges
part(7) E_0 has 20213634 sparse mode edges
part(5) E_0 has 22681242 sparse mode edges
part(2) E_0 has 99424748 sparse mode edges
part(3) E_0 has 56731545 sparse mode edges
part(4) E_0 has 45628992 sparse mode edges
part(6) E_1 has 8678768 sparse mode edges
part(0) E_1 has 84465081 sparse mode edges
part(1) E_1 has 97583787 sparse mode edges
part(7) E_1 has 29199875 sparse mode edges
part(5) E_1 has 17777695 sparse mode edges
part(3) E_1 has 41391694 sparse mode edges
part(2) E_1 has 62480568 sparse mode edges
part(4) E_1 has 37207897 sparse mode edges
part(6) E_2 has 9294606 sparse mode edges
part(0) E_2 has 59707870 sparse mode edges
part(7) E_2 has 18791129 sparse mode edges
part(1) E_2 has 116830284 sparse mode edges
part(5) E_2 has 14059701 sparse mode edges
part(3) E_2 has 52855250 sparse mode edges
part(2) E_2 has 61966627 sparse mode edges
part(4) E_2 has 38301424 sparse mode edges
part(7) E_3 has 9211263 sparse mode edges
part(6) E_3 has 14993747 sparse mode edges
part(0) E_3 has 67118425 sparse mode edges
part(1) E_3 has 85875700 sparse mode edges
part(5) E_3 has 11750074 sparse mode edges
part(2) E_3 has 59488358 sparse mode edges
part(4) E_3 has 29867110 sparse mode edges
part(3) E_3 has 48547956 sparse mode edges
number of partitions: 8, vertex_id: 41652230
bfs: /home/GeminiGraph/core/graph.hpp:339: int Graph::get_partition_id(VertexId) [with EdgeData = Empty; VertexId = unsigned int]: Assertion `false' failed.

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 29421 RUNNING AT node01
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Note: The program runs correctly for a smaller input (LiveJournal).

Any help would be greatly appreciated. Thanks

coolerzxw commented 5 years ago

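Hi, this error usually happens when the maximum vertex ID (vid) in the input is greater than or equal to (>=) the number-of-vertices parameter given on the command line. Your log reports vertex_id: 41652230, which equals the vertex count you passed (41652230), so you may want to check whether this is your case as well.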
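
One way to check this is to scan the binary file for its largest vertex ID. The following is only a minimal sketch, not part of GeminiGraph: it assumes the binary edge list is a flat sequence of (src, dst) pairs of 4-byte unsigned integers in the machine's native byte order with no per-edge data (consistent with `VertexId = unsigned int` and `EdgeData = Empty` in the log above). The function names `max_vertex_id` and `required_vertex_count` are hypothetical.

```python
import array


def max_vertex_id(path, chunk_edges=1 << 20):
    """Return the largest vertex ID found in a binary edge list.

    Assumption: every edge is stored as two native-endian 4-byte unsigned
    integers (src, dst) with no edge data in between.
    Returns -1 for an empty file.
    """
    if array.array("I").itemsize != 4:
        raise RuntimeError("array type 'I' is not 4 bytes on this platform")
    largest = -1
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_edges * 8)  # 8 bytes per (src, dst) edge
            if not data:
                break
            if len(data) % 8 != 0:
                raise ValueError("file size is not a multiple of 8 bytes")
            chunk = array.array("I")
            chunk.frombytes(data)
            largest = max(largest, max(chunk))
    return largest


def required_vertex_count(path):
    """Smallest vertex-count argument that is strictly greater than every ID."""
    return max_vertex_id(path) + 1
```

If `max_vertex_id("binary-input")` returned 41652230 for this file, then passing 41652231 as the vertex count would satisfy the condition described above.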

fhoushmand commented 5 years ago

That was indeed the cause of the error. Thanks for the help.