sethhall / cld2

Automatically exported from code.google.com/p/cld2
0 stars 0 forks source link

SIGBUS on ARM32 in utf8statetable.cc:517 #11

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I'm trying to get CLD2 working on ARM32 inside of Chromium, cross-compiling 
from a linux x64 host to arm32. The library loads properly, but the following 
crash occurs when calling DetectLanguageSummary:

Program received signal SIGBUS, Bus error.
#0  CLD2::UTF8GenericScan (st=0x61a82104, str=<optimized out>, 
bytes_consumed=0x5f00f88c)
    at ../../third_party/cld_2/src/internal/utf8statetable.cc:518

I'll attach the full trace as a file. Well, minus the Chromium bits. Anyhow, 
the problem appears to be with this snippet of code in utf8statetable.cc:

  // Do fast for groups of 8 identity bytes.
  // This covers a lot of 7-bit ASCII ~8x faster than the 1-byte loop,
  // including slowing slightly on cr/lf/ht
  //----------------------------
  const uint8* Tbl2 = &st->fast_state[0];
  uint32 losub = st->losub;
  uint32 hiadd = st->hiadd;
  while (src < srclimit8) {
    uint32 s0123 = (reinterpret_cast<const uint32 *>(src))[0];
    uint32 s4567 = (reinterpret_cast<const uint32 *>(src))[1];
    src += 8;

Inspecting the pointers in the debugger during the crash, and looking at the 
"src" variable, seems to reveal the problem:
(gdb) p src
$32 = (
    const CLD2::uint8 *) 0x58de4bee "\n\n\n百度一下\n地图贴吧视频图片hao123\n新闻应用音乐文库更多\n小说游戏下载\n把百度放到桌面上,
搜索最方便\n触屏版极速版\nBaidu 京ICP证030173号"

Specifically, src is located at 0x58de4bee. Since this isn't a 4-byte (32-bit) 
aligned address, the SIGBUS presumably comes from trying to read it as a 
uint32*. Many thanks to aberent@chromium.org and anton@chromium.org for the 
help in diagnosing this, I was a bit lost in the weeds looking at my dynamic 
data changes, which turn out to be completely unrelated (this happens with and 
without dynamic data mode).

The suggested workaround for this case is to %4 the address and do a one-off 
scan of the first 0-3 bytes (as necessary), and then descend into the fast 
loop; the concern is that there may be other places in CLD2 that have similar 
behavior and might be time bombs. It might be a good idea to add some memory 
churning code to the unit tests, and then start running the unit tests 
themselves on ARM to further diagnose other problems like this that might arise.

Original issue reported on code.google.com by andrewha...@chromium.org on 20 Mar 2014 at 3:11

Attachments:

GoogleCodeExporter commented 9 years ago
Setting priority to high, as Chromium can't use CLD2 on Android or (presumably) 
iOS until this is resolved.

Original comment by andrewha...@google.com on 20 Mar 2014 at 3:12

GoogleCodeExporter commented 9 years ago
PS, we should fix this for 64-bit while we're here, so replace "%4" in my 
previous comment with "%8".

Original comment by andrewha...@google.com on 20 Mar 2014 at 3:12

GoogleCodeExporter commented 9 years ago
For posterity:
$ cat /proc/cpuinfo | grep ARM
Processor       : ARMv7 Processor rev 10 (v7l)

I'm somewhat surprised by this, because the comments in port.h seem to imply 
that ARMv7+ can do unaligned reads but they're just slow. Not the case 
apparently, as the SIGBUS came from an ARM7 device. Anyhow, I guess we're not 
using port.h here, and if we were it would (irritatingly) probably be broken as 
well. But theoretically we could do UNALIGNED_LOAD32 for this as a short-term 
fix while we get the right thing done. WDYT?

Original comment by andrewha...@google.com on 20 Mar 2014 at 3:25

GoogleCodeExporter commented 9 years ago
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491c/CIHCGCFD.ht
ml
"To enable unaligned support, you must:
Clear the A bit, bit 1, of CP15 register 1 in your initialization code.
Set the U bit, bit 22, of CP15 register 1 in your initialization code."

Original comment by an...@chromium.org on 20 Mar 2014 at 3:33

GoogleCodeExporter commented 9 years ago
$ grep -R -5 "reinterpret_cast" . | grep -5 int32 > 
~/Desktop/unaligned_landmines.txt

Original comment by andrewha...@google.com on 20 Mar 2014 at 3:34

Attachments:

GoogleCodeExporter commented 9 years ago
."/cldutil_shared.cc-// OVERSHOOTS up to 3 bytes"

What could possibly go wrong?

Original comment by an...@chromium.org on 20 Mar 2014 at 3:38

GoogleCodeExporter commented 9 years ago
It looks to me like we need to make repairs in (at least) these spots:
1. UTF8GenericScan in utf8statetable.cc
2. UTF8GenericScanFastAscii in utf8statetable.cc
3. BiHashV2 in cldutil_shared.cc
4. QuadHashV2Mix in cldutil_shared.cc
5. OctaHash40Mix in cldutil_shared.cc

The data loader has the same problem, but it should be immune because the 
tables that are being loaded are already 64-bit aligned by the data extractor. 
Hooray, forward thinking (thanks, Dick!). Wouldn't hurt to drop a note in 
CLD2DynamicDataLoader::loadDataInternal to this effect, though. Dick, do you 
see any other spots that need repairs?

Original comment by andrewha...@google.com on 20 Mar 2014 at 3:41

GoogleCodeExporter commented 9 years ago
Here's a proposed patch that fixes #1. I've tested this on ARM, and it works. 
The unit tests pass, and I threw in some logging (which I took out before 
generating the diff) that shows that the unit tests as-is very thoroughly 
execute the range of values 0-7 as initial offsets. Hooray! This seems 
reasonable to me, but Dick, PTAL.

Original comment by andrewha...@google.com on 21 Mar 2014 at 6:09

Attachments:

GoogleCodeExporter commented 9 years ago
Checking on the remaining functions:
1. UTF8GenericScan in utf8statetable.cc
   Fixed by the patch posted here.

2. UTF8GenericScanFastAscii in utf8statetable.cc
   Not called in any code that I can see. Can we just delete this?

3. BiHashV2 in cldutil_shared.cc
   Uses the UNALIGNED_LOAD macro, and should therefore not have a problem.

4. QuadHashV2Mix in cldutil_shared.cc
   Also uses the UNALIGNED_LOAD macro, and should therefore not have a problem.

5. OctaHash40Mix in cldutil_shared.cc
   Also uses the UNALIGNED_LOAD macro, and should therefore not have a problem.

So I think all we actually need is the patch I've posted here, or a better 
equivalent. Dick, does this sound good?

Original comment by andrewha...@google.com on 21 Mar 2014 at 7:36

GoogleCodeExporter commented 9 years ago
I had a long discussion offline with several subject matter experts about ARM7+ 
support for unaligned access, and the takeaway was that under certain 
circumstances unaligned access on ARM will cause a SIGBUS or other issues even 
if unaligned access is supported and enabled. Their recommendation was that NO 
code should rely upon unaligned access if it's going to run on ARM.

The rest of the code here uses the UNALIGNED_LOAD but it's my impression from 
the comments in utf8statetable.cc that we need to be fast, so I see no obvious 
solution to fix this other than the proposed patch.

Original comment by andrewha...@google.com on 27 Mar 2014 at 10:57

GoogleCodeExporter commented 9 years ago
Dick made a fix to use UNALIGNED_LOAD:
https://code.google.com/p/cld2/source/detail?r=158

Sadly this doens't work. I believe the reason is that port.h incorrectly 
assumes that we don't need aligned loads on ARM7:
https://code.google.com/p/cld2/source/browse/trunk/internal/port.h#57

The crash still occurs in exactly the same place with r158. We either need to 
fix port.h or apply my patch to this method (and the others using 
UNALIGNED_LOAD).

Original comment by andrewha...@google.com on 2 Apr 2014 at 3:43

GoogleCodeExporter commented 9 years ago
Here's a patch that fixes the problem for ARM. I'm dubious that we should be 
doing this at all; we should probably just turn off unaligned access for all of 
ARM and be done with it. Subsequently we can investigate the possibility of 
incrementally killing off the use of UNALIGNED_LOAD in any critical-path code.

Original comment by andrewha...@google.com on 2 Apr 2014 at 3:58

Attachments:

GoogleCodeExporter commented 9 years ago
port.h patch submitted as r159 after consultation with Dick.

Original comment by andrewha...@google.com on 3 Apr 2014 at 7:17