wilzbach / tools-test


OPTLINK crash with large fixed-size array #84

Closed wilzbach closed 7 years ago

wilzbach commented 12 years ago

Note: this issue was automatically migrated from https://issues.dlang.org

Original bug ID: BZ#8536 From: bearophile_hugs@eml.cc Reported version: D2 CC: bugzilla@digitalmars.com

Duplicates: BZ#6678

wilzbach commented 12 years ago

Comment author: bearophile_hugs@eml.cc

This program:

uint[1 << 24] a;
void main() {}

Gives this error: test.d(2): Error: index 16777216 overflow for static array

While this program:

struct Foo { uint x; }
Foo[1 << 24] a;
void main() {}

Causes an OPTLINK crash.

I sometimes translate C programs to D that, for performance reasons, use large global 2D arrays. In D, a global __gshared dynamic array of dynamic arrays is an option, but it prevents some optimizations the compiler can perform when it knows the 2D matrix sizes at compile time. In my opinion, asking for 50-100 MB static 2D arrays is not much for a PC with 2+ GB of RAM.
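For scale, the arrays in the two programs above hold 1 << 24 = 16777216 elements of 4 bytes each (D's uint is 32 bits wide), which is 64 MiB, inside the 50-100 MB range mentioned here. A minimal C sketch of that arithmetic follows; the helper names `static_array_bytes` and `bytes_to_mib` are hypothetical and appear nowhere in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total size in bytes of an array of `elems`
   elements, each `elem_size` bytes wide. */
static size_t static_array_bytes(size_t elems, size_t elem_size) {
    assert(elem_size > 0);
    assert(elems <= SIZE_MAX / elem_size);  /* guard against overflow */
    return elems * elem_size;
}

/* Hypothetical helper: convert a byte count to whole mebibytes. */
static size_t bytes_to_mib(size_t bytes) {
    return bytes >> 20;  /* 1 MiB = 2^20 bytes */
}
```

Calling `static_array_bytes((size_t)1 << 24, sizeof(uint32_t))` gives 67108864 bytes, and `bytes_to_mib` of that value gives 64.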

wilzbach commented 12 years ago

Comment author: Walter Bright <bugzilla@digitalmars.com>

This is a well known Optlink bug, though I don't have the bugzilla number handy.

You're wrong about it impeding optimizations compared with dynamically allocating it, for a couple reasons:

  1. Static data is often accessed indirectly through a register anyway, either in explicit code generated by the compiler, or implicitly in how the CPU implements virtual memory, or because there is no way to address it other than by offsetting the program counter register.

  2. There is no performance penalty for offsetting a base address register versus an addressing mode with just an address.

D knows the static compile time sizes of arrays if you use static arrays. That's what they're for.

wilzbach commented 12 years ago

Comment author: bearophile_hugs@eml.cc

Created attachment 1138 Three C programs that show one effect of static 2D arrays

Attached file: tests.zip (application/octet-stream, 3006 bytes) Description: Three C programs that show one effect of static 2D arrays

wilzbach commented 12 years ago

Comment author: bearophile_hugs@eml.cc

> This is a well known Optlink bug, though I don't have the bugzilla number handy.

OK.

> You're wrong about it impeding optimizations compared with dynamically allocating it, for a couple reasons:

This is a discussion better fit for the D newsgroup.

Attached are three nearly identical C programs that use a 2D global cache matrix to perform a certain simple (but not stupid) computation.

test0 uses a dynamically allocated "array" of pointers to "arrays". test1 uses a static array of dynamically allocated rows, and test2 uses a fully static 2D matrix. Compiled with GCC 4.7.1 using "-std=c99 -Ofast -flto -s", the run-times are 6.52, 6.07 and 4.95 seconds respectively. The more GCC knows statically about the arrays, the more efficient the binary it produces.
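The attached programs are not reproduced in this thread, so the following is only a rough sketch of the three layouts as described above, not the actual test code. The sizes `NR` and `NC` and the names `cache0`, `cache1`, `cache2`, `init_caches` and `fill_and_sum` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { NR = 4, NC = 8 };

/* test0-style: dynamically allocated array of pointers to
   dynamically allocated rows. */
static int **cache0;

/* test1-style: static array of pointers, rows allocated at run time. */
static int *cache1[NR];

/* test2-style: fully static 2D matrix, both dimensions are
   compile-time constants. */
static int cache2[NR][NC];

/* Allocate the dynamic parts. Returns 0 on success, -1 on failure
   (cleanup of partial allocations is omitted in this sketch). */
static int init_caches(void) {
    cache0 = malloc(NR * sizeof *cache0);
    if (!cache0) return -1;
    for (int r = 0; r < NR; r++) {
        cache0[r] = calloc(NC, sizeof **cache0);
        cache1[r] = calloc(NC, sizeof *cache1[r]);
        if (!cache0[r] || !cache1[r]) return -1;
    }
    return 0;
}

/* Write r * NC + c into every cell of all three layouts and return
   the sum over the static matrix. */
static long fill_and_sum(void) {
    long sum = 0;
    for (int r = 0; r < NR; r++) {
        for (int c = 0; c < NC; c++) {
            int v = r * NC + c;
            cache0[r][c] = v;  /* loads cache0, then the row pointer */
            cache1[r][c] = v;  /* loads the row pointer only */
            cache2[r][c] = v;  /* address computed from constants */
            assert(cache0[r][c] == cache2[r][c]);
            assert(cache1[r][c] == cache2[r][c]);
            sum += cache2[r][c];
        }
    }
    return sum;
}
```

The access syntax is identical in all three cases; what differs is how many pointers must be loaded before the element itself can be reached, and how much of the address computation is known at compile time.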

wilzbach commented 12 years ago

Comment author: Walter Bright <bugzilla@digitalmars.com>

Your test is incorrectly written.

Use one array, not an array of arrays, and use a macro to compute the r*row+c index.
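A minimal C sketch of this suggestion, assuming a single heap-allocated block indexed in row-major order. The names `cache`, `cache_nc` and `CACHE` match the ones quoted later in this thread; the `cache_init` helper is hypothetical and not taken from the attachments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int *cache;       /* one contiguous block holding all rows */
static size_t cache_nc;  /* number of columns, set at run time */

/* Row-major index: element (r, c) lives at offset r * cache_nc + c. */
#define CACHE(r, c) (cache[(r) * cache_nc + (c)])

/* Hypothetical helper: allocate a zeroed nr x nc cache.
   Returns 0 on success, -1 on overflow or allocation failure. */
static int cache_init(size_t nr, size_t nc) {
    assert(nr > 0 && nc > 0);
    if (nr > SIZE_MAX / nc) return -1;  /* nr * nc would overflow */
    cache = calloc(nr * nc, sizeof *cache);
    if (!cache) return -1;
    cache_nc = nc;
    return 0;
}
```

After `cache_init(3, 5)`, the expression `CACHE(2, 4)` refers to `cache[14]`, so each access costs one multiply-add and a single memory load rather than two dependent loads.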

wilzbach commented 12 years ago

Comment author: bearophile_hugs@eml.cc

This issue has been marked as a duplicate of issue 6678

wilzbach commented 12 years ago

Comment author: bearophile_hugs@eml.cc

Created attachment 1139 Version 4 of the C program

Attached file: test3.zip (application/octet-stream, 1075 bytes) Description: Version 4 of the C program

wilzbach commented 12 years ago

Comment author: bearophile_hugs@eml.cc

> Your test is incorrectly written.
>
> Use one array, not an array of arrays, and use a macro to compute the r*row+c index.

Using your suggestions, the attached test3.c runs in 4.84 seconds.

In D there are no macros, so I think you have to replace:

size_t cache_nc;

#define CACHE(r, c) (cache[(r)*cache_nc + (c)])

With something like:

__gshared size_t cache_nc;
ref CACHE(in size_t r, in size_t c) nothrow {
    return cache[r * cache_nc + c];
}

Or maybe use a custom matrix type with overloaded [] and avoid global variables. I would still keep cache_nc global, possibly as an enum, so that loop unrolling remains possible: many static compilers don't unroll a loop unless they know the loop count statically. JIT compilers such as Oracle's Java one are able to unroll on dynamic values too.