lac-dcc / Angha

Angha is a framework for constructing compilable synthetic C benchmarks out of publicly available C repositories.
GNU Lesser General Public License v2.1

Expected execution times of Angha's function extractor #1

Closed jordiae closed 3 years ago

jordiae commented 3 years ago

Hi! Thanks for your work. I was taking a look at your function extractor and was wondering whether you had any rough estimates on its speed. Currently, I'm using pycparser (https://github.com/eliben/pycparser) for a similar purpose and it is considerably slow for large files. Thanks!

pronesto commented 3 years ago

Hi Jordi,

We have some idea of the time that it takes to build a compilable function. That includes the time to parse the C program, but the bulk of the time is spent on type reconstruction. See this report:

http://lac.dcc.ufmg.br/pubs/TechReports/LaC_TechReport012020.pdf

"To reach the threshold of 1,000,000 programs, our framework collected code from 148 repositories. The exact number of compilable programs generated was 1,044,023, and the entire process took approximately 145 hours to terminate."

Leandro ran a similar experiment to test psyche-C. Take a look at Section 7.2 of this paper:

https://homepages.dcc.ufmg.br/~fernando/publications/papers/LeandroTOPLAS20.pdf

Kind regards,

Fernando


jordiae commented 3 years ago

Thanks! Do you have the specs of the machine that needed 145 hours?

pronesto commented 3 years ago

I need to confirm with Breno (cc'ed), but I think it was an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz.

@Breno: could you confirm that you ran that experiment to measure Angha's throughput on Hokusai? Did you use multiple cores, or was it a single-core experiment?

Regards,

Fernando

brenocfg commented 3 years ago

Yes, that's correct. It was a multi-core experiment, so in general all 8 cores were in use for the entire experiment.

Note that the experiment's runtime includes the time for the constraint solver to perform type reconstruction, which is a significant portion of the total execution time. If we were to consider only function extraction, I suspect the runtime would be considerably shorter!
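
As a rough back-of-the-envelope check, the figures quoted in this thread (1,044,023 programs, about 145 hours, 8 cores in use) can be turned into per-program estimates. The sketch below is only illustrative: the function name `throughput` is made up for this example, it takes the 8 cores at face value, and it assumes the load was spread evenly across them, which the thread does not confirm.

```python
# Figures quoted in this thread (technical report excerpt and comments above).
PROGRAMS = 1_044_023   # compilable programs generated
WALL_HOURS = 145       # approximate wall-clock duration of the run
CORES = 8              # cores reported as busy for the whole experiment


def throughput(programs: int, wall_hours: float, cores: int) -> dict:
    """Return rough throughput estimates.

    Assumption (not from the thread): work is spread evenly across cores,
    so core time is simply wall time multiplied by the core count.
    """
    wall_seconds = wall_hours * 3600
    return {
        "programs_per_hour": programs / wall_hours,
        "wall_seconds_per_program": wall_seconds / programs,
        "core_seconds_per_program": wall_seconds * cores / programs,
    }


if __name__ == "__main__":
    for name, value in throughput(PROGRAMS, WALL_HOURS, CORES).items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions this works out to roughly 7,200 programs per hour, about half a second of wall-clock time per program, and about 4 core-seconds per program. These numbers cover the whole pipeline, including type reconstruction, so they are an upper bound on the cost of function extraction alone.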

jordiae commented 3 years ago

Understood, thanks!