limbo018 / DREAMPlace

Deep learning toolkit-enabled VLSI placement
BSD 3-Clause "New" or "Revised" License
717 stars, 207 forks

fail to launch rc tree construction #192

Closed. JeffreyzzZ0 closed this issue 3 weeks ago.

JeffreyzzZ0 commented 1 month ago

Dear Prof. Lin @limbo018, sorry to bother you. When I use DREAMPlace to do timing-driven global placement, it fails to launch RC tree construction, which leads to a segmentation fault. I have set the configuration for timing-driven placement, and I can't figure out what actually happened. Here are my configuration and log. Can you give me some suggestions on how to deal with it? Thank you! [image] [image]

JeffreyzzZ0 commented 1 month ago

The error seems to come from here, in DREAMPlace/dreamplace/ops/timing/src/timing_cpp.cpp. What should I change? [image]

enzoleo commented 1 month ago

The segmentation fault comes from FLUTE. In my experience, this typically happens because some input files cannot be found or are in an invalid format. First check whether the LUT paths are correct. If they are correct and FLUTE succeeds in reading the LUTs, the error occurs inside FLUTE itself, and you had better use a debugging tool to find exactly where it comes from.

By the way, may I know whether you are working on ICCAD 2015 benchmarks? If not, does this error occur on ICCAD 2015 benchmarks?
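
A quick way to rule out the first cause is to check the LUT files before launching a run. The sketch below is only an illustration: the file names `POWV9.dat` and `POST9.dat` are assumed to be the FLUTE tables in use, and `check_lut_files` is a hypothetical helper, not part of DREAMPlace.

```python
import os


def check_lut_files(lut_dir, names=("POWV9.dat", "POST9.dat")):
    """Return a list of problems found with the LUT files in lut_dir.

    The default file names are an assumption; pass the names that your
    own configuration actually points to.
    """
    problems = []
    for name in names:
        path = os.path.join(lut_dir, name)
        if not os.path.isfile(path):
            problems.append(f"missing: {path}")
        elif os.path.getsize(path) == 0:
            problems.append(f"empty: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"not readable: {path}")
    return problems
```

An empty list only means the files exist, are non-empty, and are readable. It does not prove that their contents are valid.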

JeffreyzzZ0 commented 1 month ago

Thank you for your reply. I have changed the LUT paths to absolute paths and rebuilt DREAMPlace, but the issue still persists. The LUT files are included in the DREAMPlace source code, and I have not made any changes to them. I am not working on ICCAD 2015 benchmarks; I am simply trying to use a simple case for timing-driven placement. Is there any possibility that somewhere else could be wrong?

enzoleo commented 1 month ago

It is very likely that some of your input files are invalid. It looks like the run goes well without timing optimization, so you may need to check your .lib and .sdc files. Make sure they have the same format as the ICCAD 2015 examples.

JeffreyzzZ0 commented 1 month ago

Thank you very much, I will give it a try.

JeffreyzzZ0 commented 1 month ago

I tried running the ICCAD 2015 benchmarks, and the segmentation fault persists.

enzoleo commented 1 month ago

You mean you cannot even make a complete run on ICCAD 2015 benchmarks? That is really weird. Are your benchmarks downloaded from the link provided in benchmarks/iccad2015.ot.md?

JeffreyzzZ0 commented 1 month ago

Exactly. I think timing_cpp.cpython-38-x86_64-linux-gnu.so was compiled successfully, so maybe I need to debug the Python file and the .so file together.

enzoleo commented 1 month ago

My own experience suggests that the cause is either FLUTE failing to read the files or OpenTimer failing to parse the timing inputs. Since it is too hard for me to reproduce the error, I am afraid I cannot offer much help. If possible, you can use tools like gdb and valgrind, or simply print some debugging messages, to find out what is going on inside that function.
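
Before reaching for gdb or valgrind, Python's standard `faulthandler` module can at least show which Python call was active when the native code crashed. The sketch below is a minimal illustration; `run_with_faulthandler` is a hypothetical helper, not something DREAMPlace provides.

```python
import subprocess
import sys


def run_with_faulthandler(script_args, timeout=None):
    """Run a Python script in a child process with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code -N
    means the child was killed by signal N (11 is SIGSEGV).
    """
    # "-X faulthandler" makes the interpreter dump the Python traceback
    # to stderr when a fatal signal such as SIGSEGV arrives.
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    proc = subprocess.run(
        cmd, stderr=subprocess.PIPE, text=True, timeout=timeout
    )
    return proc.returncode, proc.stderr
```

For example, `run_with_faulthandler(["dreamplace/Placer.py", "path/to/config.json"])`, where both paths are placeholders for your own setup. The traceback only identifies the Python-side caller. Locating the fault inside the C++ extension still needs gdb or valgrind.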

JeffreyzzZ0 commented 3 weeks ago

I'm sorry for the delay; other matters came up and I forgot to reply. I am now back on this problem and have located it: FLUTE reads the LUT files successfully, but it stops inside its for loops at d=7, k=5585. I analysed the core dump, which says:

Program terminated with signal 11, Segmentation fault.
#0 0x00007f975eadf9c5 in _int_malloc () from /usr/lib64/libc.so.6

Do you have any idea about it? Thank you.

[image]

enzoleo commented 3 weeks ago

I don't have a clue. Is it possible for you to share your cases with me?

enzoleo commented 3 weeks ago

I just noticed that this error occurred even when you were running the program on the ICCAD 2015 benchmarks. I really don't have a clue why it does not work if you did not make any changes 😢. One option, as I said, is to share the new case with me so that I can see whether I can run it successfully.

JeffreyzzZ0 commented 3 weeks ago

Thank you for your help. I just cleaned the environment variables, re-cloned the repo, and rebuilt it. It can now run the ICCAD benchmarks successfully. It is quite weird, but it finally works!
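
Since cleaning the environment and rebuilding resolved it, a stale build or a redirected library path is one plausible explanation, although the thread does not confirm the root cause. The sketch below shows one way to inspect this; `describe_load_environment` is a hypothetical helper, and the list of environment variables is an assumption about what is worth checking.

```python
import importlib.util
import os


def describe_load_environment(
    module_name,
    env_vars=("PYTHONPATH", "LD_LIBRARY_PATH", "LD_PRELOAD"),
):
    """Report where module_name would be imported from, together with
    environment variables that commonly redirect library loading."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        spec = None
    origin = spec.origin if spec is not None else None
    env = {name: os.environ.get(name) for name in env_vars}
    return {"module": module_name, "origin": origin, "env": env}
```

If `origin` points at an older install directory rather than the fresh build, the interpreter is picking up a stale extension. Note that `find_spec` on a dotted name imports the parent packages as a side effect.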

enzoleo commented 3 weeks ago

@JeffreyzzZ0 Glad to hear that.

By the way, from the first screenshot you provided in the issue description, I found that your own case converged very fast. The overflow had already reached 0.068 at iteration 509, so it might be too late to optimize timing by then. You can try modifying the code to start timing optimization earlier. It should benefit your results.
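
One way to picture "starting earlier" is to trigger timing optimization from the overflow value instead of a fixed late iteration. The sketch below is purely illustrative: the function, the threshold, and the trace are hypothetical and do not reflect how DREAMPlace actually schedules timing optimization.

```python
def first_trigger_iteration(trace, overflow_threshold):
    """Return the first iteration whose overflow is at or below the
    threshold, or None if the threshold is never reached.

    trace is a sequence of (iteration, overflow) pairs in run order.
    """
    for iteration, overflow in trace:
        if overflow <= overflow_threshold:
            return iteration
    return None


# Synthetic trace for illustration only; not taken from a real run.
synthetic_trace = [(0, 0.95), (100, 0.80), (300, 0.40), (509, 0.068)]
early_start = first_trigger_iteration(synthetic_trace, overflow_threshold=0.5)
```

With a looser threshold the trigger fires while cells are still spread enough to move, which is the point of the suggestion above.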