picca closed this issue 3 years ago.
I've started using lists instead of plain variadics; this is why compilation got slower. I'm now trying to optimise all of this machinery, including the evaluator itself.
Well, I've done some optimisation. Is it better now, @picca? (Download the latest Metalang99 and Datatype99 commits.)
Instead of 2.5 GB of memory per file, I now use 1.5 GB, so it is an improvement. Still, I would say that this is quite a lot for only one datatype (I removed the other one).
BTW, how do you measure memory consumption?
For now, I just watch htop while the file is compiling.
Do you use precompiled headers?
Here it is with /usr/bin/time, compiling only one file:
Command being timed: "make"
User time (seconds): 5.78
System time (seconds): 0.49
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.26
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1328556
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 365222
Voluntary context switches: 235
Involuntary context switches: 86
Swaps: 0
File system inputs: 0
File system outputs: 6240
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The interesting part is Maximum resident set size (kbytes): 1328556,
so 1.3 GB.
I was close ;)
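For reference, /usr/bin/time takes the "Maximum resident set size" figure from the resource-usage accounting the kernel keeps for a finished child process. A minimal C sketch of the same idea, assuming Linux (where ru_maxrss is reported in kilobytes); peak_rss_kb_of is a hypothetical helper, not part of Metalang99 or Datatype99:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of Metalang99/Datatype99): run argv as a
 * child process, wait for it, and return the peak resident set size the
 * kernel recorded for terminated, waited-for children.
 *
 * On Linux, ru_maxrss is in kilobytes (on macOS it is in bytes).
 * For RUSAGE_CHILDREN it is the RSS of the LARGEST waited-for child,
 * not a sum, and it accumulates across calls: measure one command per
 * process.
 * Returns -1 if the command cannot be run or exits unsuccessfully. */
long peak_rss_kb_of(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    struct rusage usage;
    if (getrusage(RUSAGE_CHILDREN, &usage) != 0)
        return -1;
    return usage.ru_maxrss;
}
```

Calling it with {"make", NULL} roughly mirrors /usr/bin/time make: since make waits for the compiler processes it spawns, the reported figure is that of the largest one (most likely GCC's cc1), not the total of the whole build.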
Previously:
Command being timed: "make"
User time (seconds): 9.47
System time (seconds): 0.80
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.48
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2312980
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 65
Minor (reclaiming a frame) page faults: 608795
Voluntary context switches: 889
Involuntary context switches: 132
Swaps: 0
File system inputs: 83816
File system outputs: 6272
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
So Maximum resident set size (kbytes): 2312980,
i.e. 2.3 GB.
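Instead of reading the figure off by hand, it can be pulled out of the report programmatically. A small sketch with hypothetical helper names: it looks for the "Maximum resident set size (kbytes):" line in the text of a /usr/bin/time -v report, and converts kilobytes using decimal units (1 GB = 10^6 kB), which matches the rounding used in this thread (2312980 kB is about 2.3 GB).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value on the
 * "Maximum resident set size (kbytes):" line of a GNU time -v report,
 * or -1 if that line is missing or has no number after it. */
long max_rss_kb_from_report(const char *report)
{
    static const char key[] = "Maximum resident set size (kbytes):";
    const char *p = strstr(report, key);
    long kb;

    /* sizeof key - 1 is the key's length without the terminating NUL;
     * %ld skips the whitespace before the number. */
    if (p == NULL || sscanf(p + (sizeof key - 1), "%ld", &kb) != 1)
        return -1;
    return kb;
}

/* Kilobytes to gigabytes, decimal units (1 GB = 1,000,000 kB). */
double kb_to_gb(long kb)
{
    return (double)kb / 1e6;
}
```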
Now the version I was happy with :))
Command being timed: "make"
User time (seconds): 2.70
System time (seconds): 0.33
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.02
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 537624
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 156510
Voluntary context switches: 230
Involuntary context switches: 89
Swaps: 0
File system inputs: 0
File system outputs: 4952
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Only 0.5 GB.
I do not use precompiled headers.
Consider using precompiled headers because they will cache the results of datatype(...). Also, if you're compiling with GCC, consider -ftrack-macro-expansion=0.
With -ftrack-macro-expansion=0, the memory used drops a lot (this was with the favorable version):
Command being timed: "make"
User time (seconds): 2.67
System time (seconds): 0.14
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.80
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 92668
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 50882
Voluntary context switches: 229
Involuntary context switches: 72
Swaps: 0
File system inputs: 0
File system outputs: 5048
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The same thing with the latest Datatype99 and Metalang99:
Command being timed: "make"
User time (seconds): 4.56
System time (seconds): 0.16
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.73
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 202744
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 78937
Voluntary context switches: 229
Involuntary context switches: 245
Swaps: 0
File system inputs: 0
File system outputs: 5144
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Only 200 MB.
So it used roughly double the resources (time and memory).
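Working the ratio out from the two -ftrack-macro-expansion=0 reports above (elapsed wall-clock time and peak RSS), "double" is a fair approximation: about 1.7x the time and 2.2x the memory. The arithmetic as a small C sketch; the struct and function names are illustrative only:

```c
/* One `make` run measured with /usr/bin/time -v. */
struct run {
    const char *label;
    double elapsed_s; /* "Elapsed (wall clock) time", in seconds */
    long max_rss_kb;  /* "Maximum resident set size (kbytes)" */
};

/* How many times more time and memory run b used than run a. */
void compare_runs(struct run a, struct run b,
                  double *time_ratio, double *rss_ratio)
{
    *time_ratio = b.elapsed_s / a.elapsed_s;
    *rss_ratio = (double)b.max_rss_kb / (double)a.max_rss_kb;
}
```

With the favorable version as a = {"favorable", 2.80, 92668} and the latest commits as b = {"latest", 4.73, 202744}, this gives a time ratio of about 1.69 and an RSS ratio of about 2.19.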
Well, again I've done some optimisations for lists. Now it's only 0m0,030s slower than the v0.2.0 version.
Here is my result:
Command being timed: "make"
User time (seconds): 3.63
System time (seconds): 0.18
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.81
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 166232
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 69181
Voluntary context switches: 231
Involuntary context switches: 134
Swaps: 0
File system inputs: 0
File system outputs: 5048
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Only 166 MB.
For your DetectorType, it now prints:
\time -f "%M" gcc playground.c -Imetalang99/include -I. -ftrack-macro-expansion=0 -E
58372
It keeps getting better :)
Command being timed: "make"
User time (seconds): 3.09
System time (seconds): 0.16
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.25
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 146452
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 65155
Voluntary context switches: 229
Involuntary context switches: 60
Swaps: 0
File system inputs: 0
File system outputs: 5208
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
146 MB
I've optimised pattern matching a bit. At this point I see nothing more to optimise, so I'm going to release v0.3.0 now.
Here are my numbers :)
Command being timed: "make"
User time (seconds): 3.25
System time (seconds): 0.18
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.42
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 146604
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 66311
Voluntary context switches: 230
Involuntary context switches: 30
Swaps: 0
File system inputs: 0
File system outputs: 4952
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
How does it perform now?
Command being timed: "make"
User time (seconds): 2.93
System time (seconds): 0.21
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.12
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 133944
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 63174
Voluntary context switches: 230
Involuntary context switches: 60
Swaps: 0
File system inputs: 0
File system outputs: 4952
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Ping @picca
A lot better :))
Command being timed: "make"
User time (seconds): 2.54
System time (seconds): 0.22
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.75
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 94696
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 51775
Voluntary context switches: 232
Involuntary context switches: 38
Swaps: 0
File system inputs: 144
File system outputs: 5144
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
So it's almost like https://github.com/Hirrolot/datatype99/issues/5#issuecomment-785189950 (before I started to use lists)?
Yes, only 2 MB remaining.
@picca, try again, please.
It is a lot better :))
Command being timed: "make"
User time (seconds): 2.60
System time (seconds): 0.15
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.74
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 66956
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 48917
Voluntary context switches: 231
Involuntary context switches: 99
Swaps: 0
File system inputs: 0
File system outputs: 5208
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Nice, even better than the initial version.
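To put the whole thread in perspective, the sketch below lists the peak RSS values from the successive reports, assuming they were all produced by the same build (make with -ftrack-macro-expansion=0), and expresses each as a percentage of the pre-lists baseline of 92668 kB. The latest run comes out at roughly 72% of the baseline, i.e. about 28% less memory. The array, macro and function names are hypothetical:

```c
#include <stddef.h>

/* Peak RSS (kbytes) from the successive /usr/bin/time -v reports in this
 * thread, oldest first. Assumes every run used the same flags
 * (-ftrack-macro-expansion=0). Index 0 is the baseline before lists. */
static const long peak_rss_kb[] = {
    92668,  /* favorable version, before lists */
    202744, /* latest commits at the time, with lists */
    166232,
    146452,
    146604,
    133944,
    94696,
    66956,  /* final run in the thread */
};

#define RUN_COUNT (sizeof peak_rss_kb / sizeof peak_rss_kb[0])

/* Percentage of the baseline used by run i (100.0 means same as the
 * baseline). i must be less than RUN_COUNT. */
double percent_of_baseline(size_t i)
{
    return 100.0 * (double)peak_rss_kb[i] / (double)peak_rss_kb[0];
}
```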
I work on an old computer with only 4 GB of memory.
I am only using two datatypes.
Since my last update, the amount of memory used during compilation has exploded. I am wondering whether the unroll optimisation is the culprit. It was OK with the code from the 14th of February.
Cheers
Fred