Closed GoogleCodeExporter closed 9 years ago
I do not see the issue in gcc 4.8.1; I see another problem that I missed before
because I did not verify wchar mode:
pugixml.cpp: In function ‘pugi::impl::(anonymous
namespace)::convert_buffer(wchar_t*, unsigned char*, unsigned short*, unsigned
int*, wchar_t const*, unsigned long, pugi::xml_encoding)’:
cc1plus: error: assuming pointer wraparound does not occur when comparing P +-
C1 with P +- C2 [-Werror=strict-overflow]
(note the absence of a line number)
Your warning message looks dangerous though. It seems to suggest that gcc
rewrites the code from
(unsigned)(lead - 0xd800) (cmp) 0x400
to
lead (cmp) 0x400 + 0xD800
which can't be correct. Maybe the generated code is actually correct and it's
just the warning message that's weird. Can you post disassembly for
decode_utf16_block in this build?
Additionally, unless I'm misreading the error message, the signed overflow
can't really occur here (that was true for the previous issue as well).
Is this the only warning?
Can you check if changing the type of 'lead' here:
uint16_t lead = opt_swap::value ? endian_swap(*data) : *data;
To 'unsigned int' fixes the problem?
Original comment by arseny.k...@gmail.com
on 3 Dec 2013 at 7:34
Hey, I won't be able to check until tomorrow (no compiler on this machine), but
I think what might be happening is that "lead" is being promoted to int due to
the subtraction of 0xd800 (which is signed!). You only cast it back to unsigned
after the subtraction, at which point the 'overflow' could have potentially
already occurred (can't check the surrounding code, because again I'm not on
the machine with my code/compilers, but based on the error message that is what
I would assume...).
I'm thinking that perhaps changing the 0xd800 to 0xd800U might solve it too,
but I'll let you know tomorrow.
Thanks for the quick response.
Original comment by raptorfactor@raptorfactor.com
on 3 Dec 2013 at 11:32
Yes, both arguments are promoted to int; however signed overflow can't possibly
occur here, and I expected the compiler to know that since I thought it
performs some form of range tracking for values. Maybe it's not utilized in
this case.
Original comment by arseny.k...@gmail.com
on 3 Dec 2013 at 3:45
Changing 'lead' to 'unsigned int' does indeed fix the problem (didn't try
anything else due to time constraints).
I've pasted below the full disassembly for the function in Release x86. Haven't
had time yet to look at it and figure out if it's correct or not. Probably
won't have time until tonight or tomorrow. I expect you'll beat me to it. :)
Thanks again.
(gdb) disas 0x407f60
Dump of assembler code for function pugi::impl::utf_decoder<pugi::impl::utf16_wr
iter, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned
int, unsigned short*):
0x00407f60 <+0>: push %ebp
0x00407f61 <+1>: push %edi
0x00407f62 <+2>: push %esi
0x00407f63 <+3>: push %ebx
0x00407f64 <+4>: mov 0x14(%esp),%edx
0x00407f68 <+8>: mov 0x18(%esp),%ecx
0x00407f6c <+12>: mov 0x1c(%esp),%eax
0x00407f70 <+16>: lea (%edx,%ecx,2),%esi
0x00407f73 <+19>: cmp %esi,%edx
0x00407f75 <+21>: jae 0x407fe6 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+134>
0x00407f77 <+23>: mov %esi,%esi
0x00407f79 <+25>: lea 0x0(%edi,%eiz,1),%edi
0x00407f80 <+32>: movzwl (%edx),%ecx
0x00407f83 <+35>: cmp $0xd7ff,%cx
0x00407f88 <+40>: jbe 0x407ff0 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+144>
0x00407f8a <+42>: movzwl %cx,%ebx
0x00407f8d <+45>: sub $0xe000,%ebx
0x00407f93 <+51>: cmp $0x1fff,%ebx
0x00407f99 <+57>: jbe 0x407ff0 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+144>
0x00407f9b <+59>: cmp $0xdbff,%cx
0x00407fa0 <+64>: ja 0x408002 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+162>
0x00407fa2 <+66>: lea 0x2(%edx),%ebx
0x00407fa5 <+69>: cmp %ebx,%esi
0x00407fa7 <+71>: jbe 0x408010 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+176>
0x00407fa9 <+73>: movzwl 0x2(%edx),%edi
0x00407fad <+77>: movzwl %di,%ebp
0x00407fb0 <+80>: sub $0xdc00,%ebp
0x00407fb6 <+86>: cmp $0x3ff,%ebp
0x00407fbc <+92>: ja 0x408014 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+180>
0x00407fbe <+94>: and $0x3ff,%ecx
0x00407fc4 <+100>: and $0x3ff,%edi
0x00407fca <+106>: add $0x4,%edx
0x00407fcd <+109>: sub $0x2800,%cx
0x00407fd2 <+114>: sub $0x2400,%di
0x00407fd7 <+119>: add $0x4,%eax
0x00407fda <+122>: mov %cx,-0x4(%eax)
0x00407fde <+126>: mov %di,-0x2(%eax)
0x00407fe2 <+130>: cmp %edx,%esi
0x00407fe4 <+132>: ja 0x407f80 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+32>
0x00407fe6 <+134>: pop %ebx
0x00407fe7 <+135>: pop %esi
0x00407fe8 <+136>: pop %edi
0x00407fe9 <+137>: pop %ebp
0x00407fea <+138>: ret
0x00407feb <+139>: nop
0x00407fec <+140>: lea 0x0(%esi,%eiz,1),%esi
0x00407ff0 <+144>: mov %cx,(%eax)
0x00407ff3 <+147>: add $0x2,%edx
0x00407ff6 <+150>: add $0x2,%eax
0x00407ff9 <+153>: cmp %edx,%esi
0x00407ffb <+155>: ja 0x407f80 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+32>
0x00407ffd <+157>: pop %ebx
0x00407ffe <+158>: pop %esi
0x00407fff <+159>: pop %edi
0x00408000 <+160>: pop %ebp
0x00408001 <+161>: ret
0x00408002 <+162>: add $0x2,%edx
0x00408005 <+165>: jmp 0x407ff9 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+153>
0x00408007 <+167>: mov %esi,%esi
0x00408009 <+169>: lea 0x0(%edi,%eiz,1),%edi
0x00408010 <+176>: mov %ebx,%edx
0x00408012 <+178>: jmp 0x407ff9 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+153>
0x00408014 <+180>: mov %edi,%ecx
0x00408016 <+182>: mov %ebx,%edx
0x00408018 <+184>: jmp 0x407f83 <pugi::impl::utf_decoder<pugi::impl::utf
16_writer, pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, uns
igned int, unsigned short*)+35>
End of assembler dump.
(gdb)
Original comment by raptorfactor@raptorfactor.com
on 3 Dec 2013 at 11:04
I've dumped the disassembly for both the original and the 'fixed' variants of
the function. Sorry, I'm not sure which syntax you prefer so I've used the
Intel syntax this time.
Haven't had a look at it yet, but at least it should be easy to compare now.
Original:
========== START ORIGINAL DUMP ==========
(gdb)disassemble 0x407f60
Dump of assembler code for function
pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) :
0x00407f60 <+0> : push ebp
0x00407f61 <+1> : push edi
0x00407f62 <+2> : push esi
0x00407f63 <+3> : push ebx
0x00407f64 <+4> : mov edx, DWORD PTR[esp + 0x14]
0x00407f68 <+8> : mov ecx, DWORD PTR[esp + 0x18]
0x00407f6c <+12> : mov eax, DWORD PTR[esp + 0x1c]
0x00407f70 <+16> : lea esi, [edx + ecx * 2]
0x00407f73 <+19> : cmp edx, esi
0x00407f75 <+21> : jae 0x407fe6
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 134>
0x00407f77 <+23> : mov esi, esi
0x00407f79 <+25> : lea edi, [edi + eiz * 1 + 0x0]
0x00407f80 <+32> : movzx ecx, WORD PTR[edx]
0x00407f83 <+35> : cmp cx, 0xd7ff
0x00407f88 <+40> : jbe 0x407ff0
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 144>
0x00407f8a <+42> : movzx ebx, cx
0x00407f8d <+45> : sub ebx, 0xe000
0x00407f93 <+51> : cmp ebx, 0x1fff
0x00407f99 <+57> : jbe 0x407ff0
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 144>
0x00407f9b <+59> : cmp cx, 0xdbff
0x00407fa0 <+64> : ja 0x408002
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 162>
0x00407fa2 <+66> : lea ebx, [edx + 0x2]
0x00407fa5 <+69> : cmp esi, ebx
0x00407fa7 <+71> : jbe 0x408010
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 176>
0x00407fa9 <+73> : movzx edi, WORD PTR[edx + 0x2]
0x00407fad <+77> : movzx ebp, di
0x00407fb0 <+80> : sub ebp, 0xdc00
0x00407fb6 <+86> : cmp ebp, 0x3ff
0x00407fbc <+92> : ja 0x408014
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 180>
0x00407fbe <+94> : and ecx, 0x3ff
0x00407fc4 <+100> : and edi, 0x3ff
0x00407fca <+106> : add edx, 0x4
0x00407fcd <+109> : sub cx, 0x2800
0x00407fd2 <+114> : sub di, 0x2400
0x00407fd7 <+119> : add eax, 0x4
0x00407fda <+122> : mov WORD PTR[eax - 0x4], cx
0x00407fde <+126> : mov WORD PTR[eax - 0x2], di
0x00407fe2 <+130> : cmp esi, edx
0x00407fe4 <+132> : ja 0x407f80
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 32>
0x00407fe6 <+134> : pop ebx
0x00407fe7 <+135> : pop esi
0x00407fe8 <+136> : pop edi
0x00407fe9 <+137> : pop ebp
0x00407fea <+138> : ret
0x00407feb <+139> : nop
0x00407fec <+140> : lea esi, [esi + eiz * 1 + 0x0]
0x00407ff0 <+144> : mov WORD PTR[eax], cx
0x00407ff3 <+147> : add edx, 0x2
0x00407ff6 <+150> : add eax, 0x2
0x00407ff9 <+153> : cmp esi, edx
0x00407ffb <+155> : ja 0x407f80
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 32>
0x00407ffd <+157> : pop ebx
0x00407ffe <+158> : pop esi
0x00407fff <+159> : pop edi
0x00408000 <+160> : pop ebp
0x00408001 <+161> : ret
0x00408002 <+162> : add edx, 0x2
0x00408005 <+165> : jmp 0x407ff9
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 153>
0x00408007 <+167> : mov esi, esi
0x00408009 <+169> : lea edi, [edi + eiz * 1 + 0x0]
0x00408010 <+176> : mov edx, ebx
0x00408012 <+178> : jmp 0x407ff9
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 153>
0x00408014 <+180> : mov ecx, edi
0x00408016 <+182> : mov edx, ebx
0x00408018 <+184> : jmp 0x407f83
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 35>
End of assembler dump.
(gdb)
========== STOP ORIGINAL DUMP ==========
========== START FIXED DUMP ==========
(gdb)disas 0x407f60
Dump of assembler code for function
pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsignedint,
unsigned short*) :
0x00407f60 <+0> : push ebp
0x00407f61 <+1> : push edi
0x00407f62 <+2> : push esi
0x00407f63 <+3> : push ebx
0x00407f64 <+4> : mov edx, DWORD PTR[esp + 0x14]
0x00407f68 <+8> : mov ecx, DWORD PTR[esp + 0x18]
0x00407f6c <+12> : mov eax, DWORD PTR[esp + 0x1c]
0x00407f70 <+16> : lea esi, [edx + ecx * 2]
0x00407f73 <+19> : cmp edx, esi
0x00407f75 <+21> : jae 0x407fff
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 159>
0x00407f7b <+27> : nop
0x00407f7c <+28> : lea esi, [esi + eiz * 1 + 0x0]
0x00407f80 <+32> : movzx ebx, WORD PTR[edx]
0x00407f83 <+35> : movzx ecx, bx
0x00407f86 <+38> : cmp ecx, 0xd7ff
0x00407f8c <+44> : jbe 0x408004
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 164>
0x00407f8e <+46> : lea edi, [ecx - 0xe000]
0x00407f94 <+52> : cmp edi, 0x1fff
0x00407f9a <+58> : jbe 0x408004
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 164>
0x00407f9c <+60> : sub ecx, 0xd800
0x00407fa2 <+66> : cmp ecx, 0x3ff
0x00407fa8 <+72> : ja 0x408020
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 192>
0x00407faa <+74> : lea ecx, [edx + 0x2]
0x00407fad <+77> : cmp esi, ecx
0x00407faf <+79> : jbe 0x408025
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 197>
0x00407fb1 <+81> : movzx edi, WORD PTR[edx + 0x2]
0x00407fb5 <+85> : movzx ebp, di
0x00407fb8 <+88> : sub ebp, 0xdc00
0x00407fbe <+94> : cmp ebp, 0x3ff
0x00407fc4 <+100> : ja 0x408030
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 208>
0x00407fc6 <+102> : and ebx, 0x3ff
0x00407fcc <+108> : and edi, 0x3ff
0x00407fd2 <+114> : add edx, 0x4
0x00407fd5 <+117> : shl ebx, 0xa
0x00407fd8 <+120> : add eax, 0x4
0x00407fdb <+123> : lea ecx, [edi + ebx * 1]
0x00407fde <+126> : mov ebx, ecx
0x00407fe0 <+128> : and ecx, 0x3ff
0x00407fe6 <+134> : shr ebx, 0xa
0x00407fe9 <+137> : sub cx, 0x2400
0x00407fee <+142> : sub bx, 0x2800
0x00407ff3 <+147> : mov WORD PTR[eax - 0x2], cx
0x00407ff7 <+151> : mov WORD PTR[eax - 0x4], bx
0x00407ffb <+155> : cmp esi, edx
0x00407ffd <+157> : ja 0x407f80
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 32>
0x00407fff <+159> : pop ebx
0x00408000 <+160> : pop esi
0x00408001 <+161> : pop edi
0x00408002 <+162> : pop ebp
0x00408003 <+163> : ret
0x00408004 <+164> : mov WORD PTR[eax], bx
0x00408007 <+167> : add edx, 0x2
0x0040800a <+170> : add eax, 0x2
0x0040800d <+173> : cmp esi, edx
0x0040800f <+175> : ja 0x407f80
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 32>
0x00408015 <+181> : pop ebx
0x00408016 <+182> : pop esi
0x00408017 <+183> : pop edi
0x00408018 <+184> : pop ebp
0x00408019 <+185> : ret
0x0040801a <+186> : lea esi, [esi + 0x0]
0x00408020 <+192> : add edx, 0x2
0x00408023 <+195> : jmp 0x40800d
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 173>
0x00408025 <+197> : mov edx, ecx
0x00408027 <+199> : jmp 0x40800d
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 173>
0x00408029 <+201> : lea esi, [esi + eiz * 1 + 0x0]
0x00408030 <+208> : mov ebx, edi
0x00408032 <+210> : mov edx, ecx
0x00408034 <+212> : jmp 0x407f83
<pugi::impl::utf_decoder<pugi::impl::utf16_writer,
pugi::impl::opt_false>::decode_utf16_block(unsigned short const*, unsigned int,
unsigned short*) + 35>
End of assembler dump.
(gdb)
========== STOP FIXED DUMP ==========
Original comment by raptorfactor@raptorfactor.com
on 4 Dec 2013 at 3:07
Disassembly looks the same (modulo instruction reordering/extra nop-s) and
correct, so I guess it's just a badly worded warning.
It's weird though that the warning only appears in one function despite the
fact that a number of places use the same pattern. I'll try to build gcc 4.9 to
test this myself just to make sure that all configurations build cleanly with
-Wstrict-overflow.
Original comment by arseny.k...@gmail.com
on 4 Dec 2013 at 3:28
Finally built gcc 4.9 and verified the change. Fixed in r957.
Original comment by arseny.k...@gmail.com
on 20 Dec 2013 at 8:27
Thanks!
Original comment by raptorfactor@raptorfactor.com
on 20 Dec 2013 at 9:05
Original issue reported on code.google.com by
raptorfactor@raptorfactor.com
on 3 Dec 2013 at 5:32