xiongxinbobo / libnfc

Automatically exported from code.google.com/p/libnfc

SCL3711 (pn533_usb) stopped working after some commands #114

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Put a DESFire on the PN533 device
2. download/build libfreefare
3. "make check" to run the test suite

What is the expected output?
......OOOOOOOOOOOOOOOOO..................
[...]
24 test(s), 984 assertion(s), 0 failure(s), 0 error(s), 0 pending(s), 17 
omission(s), 0 notification(s)
100% passed

What do you see instead?
......OOOOOOOOOOOOOOOOO..........cutter: ERROR: Unexpected PN53x reply!
OOO.....
[...]
21 test(s), 607 assertion(s), 0 failure(s), 0 error(s), 0 pending(s), 20 
omission(s), 0 notification(s)
100% passed

Note: after the ERROR, device stopped working and the next O (Omissions) are 
explained by:

18) Omission: test_mifare_desfire
nfc_connect() failed
./mifare_desfire_fixture.c:42: cut_setup(): cut_omit("nfc_connect() failed")

Original issue reported on code.google.com by romu...@libnfc.org on 27 Sep 2010 at 1:59

GoogleCodeExporter commented 9 years ago
I can't reproduce this with FreeBSD's libusb.

Tested with a Mifare Classic 4K + Ultralight on a touchatag and DESFire 4k on 
PN533 USB.

Might be OS / libusb implementation specific.

romain@marvin ...eefare/work/libfreefare-0.2.0 % make check                     

Making check in contrib
Making check in libutil
Making check in libfreefare
Making check in test
make  check-TESTS
.........................................

Finished in 33,601763 seconds (total: 22,948430 seconds)

41 test(s), 1206 assertion(s), 0 failure(s), 0 error(s), 0 pending(s), 0 
omission(s), 0 notification(s)
100% passed
PASS: run-test.sh
=============
1 test passed
=============
Making check in examples
romain@marvin ...eefare/work/libfreefare-0.2.0 % lsnfc                          

device = ACS ACR 38U-CCID 01 00 / ACR122U102 - PN532 v1.4 (0x07)
  ISO14443A: Unknown ISO14443A tag type: ATQA (SENS_RES): 0000, UID (NFCID1): bce01047, SAK (SEL_RES): 18
  ISO14443A: NXP MIFARE UltraLight (UID=04999c996d0280)
2 tag(s) on device.

device = PN533_USB - PN533 v2.7 (0x07)
  ISO14443A: NXP MIFARE DESFire (UID=041e2771d21b80)
1 tag(s) on device.

Total: 3 tag(s) on 2 device(s).
romain@marvin ...eefare/work/libfreefare-0.2.0 %                                

romain@marvin ...eefare/work/libfreefare-0.2.0 % nfc-list                       

nfc-list use libnfc 1.3.9 (r629M)
Connected to NFC reader: ACS ACR 38U-CCID 01 00 / ACR122U102 - PN532 v1.4 (0x07)
2 ISO14443A passive target(s) was found:
    ATQA (SENS_RES): 00  00  
       UID (NFCID1): bc  e0  10  47  
      SAK (SEL_RES): 18  

    ATQA (SENS_RES): 00  44  
       UID (NFCID1): 04  99  9c  99  6d  02  80  
      SAK (SEL_RES): 00  

0 Felica (212 kbps) passive target(s) was found.

0 Felica (424 kbps) passive target(s) was found.

0 ISO14443B passive target(s) was found.

Connected to NFC reader: PN533_USB - PN533 v2.7 (0x07)
1 ISO14443A passive target(s) was found:
    ATQA (SENS_RES): 03  44  
       UID (NFCID1): 04  1e  27  71  d2  1b  80  
      SAK (SEL_RES): 20  
          ATS (ATR): 75  77  81  02  80  
     Compliant with: ISO/IEC 14443-4 

0 Felica (212 kbps) passive target(s) was found.

0 Felica (424 kbps) passive target(s) was found.

0 ISO14443B passive target(s) was found.

Original comment by romain.t...@gmail.com on 27 Sep 2010 at 9:18

GoogleCodeExporter commented 9 years ago
I can reproduce this bug with:
 - Debian Squeeze (amd64)
 - Kubuntu Maverick (amd64)
 - Kubuntu Lucid (i386)

Always with a DESFire tag on the device.

Original comment by romu...@libnfc.org on 28 Sep 2010 at 9:16

GoogleCodeExporter commented 9 years ago
I can reproduce this on Debian i686 5.0.5 (linux-2.6.26-2) too.  With the same 
NFC device on the same machine under FreeBSD, all is smooth.  This may be a bug 
in libusb or in the kernel's USB stack. Trying with e.g. NetBSD would be 
interesting, since it does not ship with its own libusb and shares the 
implementation used by Linux while relying on a different USB stack at the 
kernel level: if it works there, the bug should be in the Linux USB stack.

FYI, I tried adding a delay at the beginning of the DESFIRE_TRANSCEIVE macro 
(who knows) and still got the problem.

Original comment by romain.t...@gmail.com on 28 Sep 2010 at 9:31

GoogleCodeExporter commented 9 years ago
The error is caused by the PN533 not sending its ACK as usual but sending its 
response directly.
The cause is currently unknown (XRAM corruption, as with the USB descriptors 
bug?) and it happens after heavy exchanges (such as the first DESFire test). 
Once it happens, it seems to happen every time on the first command received 
after a USB connect.
Quick hack to test a workaround:

@@ -281,6 +353,9 @@ pn53x_usb_transceive (nfc_device_t * pnd, const byte_t * pbtTx, const size_t szT
   PRINT_HEX ("RX", abtRx, ret);
 #endif

+// When PN533 fails it doesn't send ACK frames anymore...
+// HACK
+if (ret == 6) {
   if (!pn53x_transceive_check_ack_frame_callback (pnd, abtRx, ret))
     return false;

@@ -293,6 +368,9 @@ pn53x_usb_transceive (nfc_device_t * pnd, const byte_t * pbtTx, const size_t szT
 #ifdef DEBUG
   PRINT_HEX ("RX", abtRx, ret);
 #endif
+} else {
+  printf("WARNING missing ACK!\n");
+}

 #ifdef DEBUG
   PRINT_HEX ("TX", ack_frame, 6);

Here is an example of USB frames showing the issue on the first command 
received (doesn't matter which command):

connect:
ffff8800631c63c0 631447718 S Ci:6:079:0 s 80 06 0300 0000 00ff 255 <
ffff8800631c63c0 631449788 C Ci:6:079:0 0 4 = 04030904
ffff8800454a8600 631449827 S Ci:6:079:0 s 80 06 0301 0409 00ff 255 <
ffff8800454a8600 631452792 C Ci:6:079:0 0 20 = 14035300 43004d00 20004d00 69006300 72006f00
ffff8800454a8600 631452825 S Ci:6:079:0 s 80 06 0300 0000 00ff 255 <
ffff8800454a8600 631454784 C Ci:6:079:0 0 4 = 04030904
ffff8800454a8600 631454804 S Ci:6:079:0 s 80 06 0302 0409 00ff 255 <
ffff8800454a8600 631457785 C Ci:6:079:0 0 30 = 1e035300 43004c00 33003700 31003100 2d004e00 46004300 26005200 5700
ffff8800454a8000 631457923 S Co:6:079:0 s 00 09 0001 0000 0000 0
ffff8800454a8000 631458764 C Co:6:079:0 0 0

Host sends ACK (just to be sure)
ffff8800631c6f00 631458865 S Bo:6:079:4 -115 6 = 0000ff00 ff00
ffff8800631c6f00 631459771 C Bo:6:079:4 0 6 >

Host sends a first command
ffff8800454a8000 631459810 S Bo:6:079:4 -115 17 = 0000ff0a f6d40600 19001a00 1b001cbc 00
ffff8800454a8000 631460768 C Bo:6:079:4 0 17 >

PN533 sends its answer directly, with no ACK
ffff8800454a8000 631460799 S Bi:6:079:4 -115 256 <
ffff8800454a8000 631462771 C Bi:6:079:4 0 14 = 0000ff07 f9d50700 09022000 f900

Host acks
ffff8800454a8000 631463824 S Bo:6:079:4 -115 6 = 0000ff00 ff00
ffff8800454a8000 631464770 C Bo:6:079:4 0 6 >

Host sends a second command
ffff8800454a8000 631464813 S Bo:6:079:4 -115 17 = 0000ff0a f6d40600 1d001e00 1f0020ac 00
ffff8800454a8000 631465768 C Bo:6:079:4 0 17 >

PN533 acks & replies
ffff8800454a8000 631465810 S Bi:6:079:4 -115 256 <
ffff8800454a8000 631466786 C Bi:6:079:4 0 6 = 0000ff00 ff00
ffff8800454a8000 631466831 S Bi:6:079:4 -115 256 <
ffff8800454a8000 631467768 C Bi:6:079:4 0 14 = 0000ff07 f9d50700 01010080 a200
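
The two kinds of frame the host has to tell apart in the trace above can be checked mechanically. Below is a minimal sketch in C, not libnfc code: `classify_frame` and `frame_kind` are hypothetical names, and the normal-frame layout (00 00 ff LEN LCS TFI data DCS 00, with d5 as the TFI of PN533-to-host frames) is inferred from the frames shown in the trace. Extended frames are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ACK frame as it appears in the trace: 00 00 ff 00 ff 00 */
static const uint8_t pn53x_ack[6] = { 0x00, 0x00, 0xff, 0x00, 0xff, 0x00 };

typedef enum { FRAME_ACK, FRAME_RESPONSE, FRAME_INVALID } frame_kind;

/* Hypothetical helper (not part of libnfc): decide what the device sent. */
static frame_kind classify_frame(const uint8_t *buf, size_t len)
{
    if (len == sizeof pn53x_ack && memcmp(buf, pn53x_ack, sizeof pn53x_ack) == 0)
        return FRAME_ACK;

    /* Normal frame: 00 00 ff LEN LCS TFI data... DCS 00 (at least 8 bytes) */
    if (len < 8 || buf[0] != 0x00 || buf[1] != 0x00 || buf[2] != 0xff)
        return FRAME_INVALID;

    uint8_t data_len = buf[3];               /* counts TFI + data bytes */
    if ((uint8_t)(data_len + buf[4]) != 0)   /* LEN + LCS must be 0 mod 256 */
        return FRAME_INVALID;
    if (len < (size_t)data_len + 7)          /* frame must be complete */
        return FRAME_INVALID;

    uint8_t sum = 0;
    for (size_t i = 0; i <= data_len; i++)   /* TFI + data + DCS */
        sum = (uint8_t)(sum + buf[5 + i]);
    if (sum != 0)                            /* checksum must be 0 mod 256 */
        return FRAME_INVALID;

    /* Frames from the PN533 to the host carry TFI 0xd5. */
    return buf[5] == 0xd5 ? FRAME_RESPONSE : FRAME_INVALID;
}
```

In the failing case, the first bulk-in read after the command would classify as `FRAME_RESPONSE` instead of `FRAME_ACK`, which is the condition the `ret == 6` test in the hack approximates.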

Original comment by yob...@gmail.com on 28 Sep 2010 at 12:25

GoogleCodeExporter commented 9 years ago
The hack works as expected, we now have to implement it cleanly. Thanks

Original comment by romu...@libnfc.org on 28 Sep 2010 at 1:23

GoogleCodeExporter commented 9 years ago
The hack is probably not the right way to go.
It is actually an issue between the Linux kernel and the PN533 at the USB level.
After sending a set_configuration to the device, the kernel expects the toggle 
bits to be reset, but the device does not reset its bit, so the first packet 
from the PN533 after a set_configuration is ignored by the kernel.

cf also 
http://libusb.6.n5.nabble.com/Why-does-Linux-forget-the-USB-toggle-bit-of-my-device-td6829.html

From the discussion one can see it's not easy to tell who is to blame, and this 
could be fixed in a future kernel revision. In any case, as Ludovic said about 
his own device, we have to cope with it in libnfc.

Possible ways:
* As the first thing when we connect to the PN533, send a dummy GetFirmware for 
which we know we could miss the ACK. All further packets will then be OK. The 
advantage over the current hack is that the workaround is limited to device 
initialization rather than applied in every transceive call.
* Find a way via libusb to resync the kernel/PN533 toggle bits. The worst case 
is a reset of the USB port, but that probably takes too long.
* What else? Hope that a clear_halt will resync the device?
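
The first option could look roughly like the sketch below. This is only an illustration under stated assumptions, not the patch that was committed: `first_command_tolerant`, `fake_device`, `fake_read` and `fake_write` are hypothetical names, the read/write callbacks stand in for the USB bulk endpoints, and the fake device exists only so the logic can be exercised without hardware. The ACK bytes come from the trace earlier in this thread; the GetFirmware frame wraps the PN53x command D4 02.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint8_t ACK_FRAME[6] = { 0x00, 0x00, 0xff, 0x00, 0xff, 0x00 };
/* GetFirmwareVersion (D4 02) wrapped in a normal PN53x frame. */
static const uint8_t GET_FIRMWARE[9] =
    { 0x00, 0x00, 0xff, 0x02, 0xfe, 0xd4, 0x02, 0x2a, 0x00 };

/* Hypothetical transport hooks standing in for the bulk-out/bulk-in pipes. */
typedef int (*bulk_write_fn)(void *ctx, const uint8_t *buf, size_t len);
typedef int (*bulk_read_fn)(void *ctx, uint8_t *buf, size_t cap);

/* Send the first command after connect and accept a reply with or without
 * a preceding ACK. Returns the reply length, or -1 on transport error.
 * *ack_missed reports whether the ACK was lost. */
static int first_command_tolerant(void *ctx, bulk_write_fn wr, bulk_read_fn rd,
                                  const uint8_t *cmd, size_t cmd_len,
                                  uint8_t *rx, size_t rx_cap, bool *ack_missed)
{
    if (wr(ctx, cmd, cmd_len) < 0)
        return -1;
    int n = rd(ctx, rx, rx_cap);
    if (n < 0)
        return -1;
    *ack_missed = !(n == 6 && memcmp(rx, ACK_FRAME, 6) == 0);
    if (!*ack_missed) {
        /* Normal case: the ACK arrived, so the reply is the next read. */
        n = rd(ctx, rx, rx_cap);
        if (n < 0)
            return -1;
    }
    return n; /* rx now holds the reply frame */
}

/* Scripted fake device, for illustration only. */
typedef struct {
    const uint8_t *frames[4];
    size_t lens[4];
    size_t count, next;
} fake_device;

static int fake_write(void *ctx, const uint8_t *buf, size_t len)
{
    (void)ctx; (void)buf;
    return (int)len;
}

static int fake_read(void *ctx, uint8_t *buf, size_t cap)
{
    fake_device *d = ctx;
    if (d->next >= d->count || d->lens[d->next] > cap)
        return -1;
    size_t n = d->lens[d->next];
    memcpy(buf, d->frames[d->next], n);
    d->next++;
    return (int)n;
}
```

Doing this once in the connect path keeps the tolerance for a missing ACK out of the regular transceive path, where a missing ACK should still be treated as an error.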

Original comment by yob...@gmail.com on 28 Sep 2010 at 3:00

GoogleCodeExporter commented 9 years ago
Oh, of course I know this hack doesn't fix the problem; it only allows us to 
recover from it.

Original comment by romu...@libnfc.org on 28 Sep 2010 at 3:18

GoogleCodeExporter commented 9 years ago
Hi, here is a proposed patch to replace both the abort/ACK hack and the hack 
above for the toggle bit issue.
Currently both hacks would go into pn53x_usb_transceive and be evaluated every 
time, with an awful hack based on the fact that the first call to 
pn53x_usb_transceive would be a GetFirmware command.
I moved them into pn53x_usb_connect() as they're needed only once at that 
precise moment.
See attachment.
Note that it's very possible that other drivers dealing directly with ACK 
frames would benefit from the abort/ACK too in their connect() (arygon & 
pn532_uart??)

This doesn't change anything regarding issue 115: it would be nice to have a 
way to send abort/ACK messages from the API (but 115 is then just an 
enhancement, not a defect as it's written now)

Original comment by yob...@gmail.com on 28 Sep 2010 at 10:32

Attachments:

GoogleCodeExporter commented 9 years ago
Nice!

I am just wondering if we can make this conditional.  Basically, it's a 
workaround for a Linux bug so a conditional like this would be cool:

#if defined(__linux__) /* && __linux_version < XXX complete here once fixed upstream */
...
#endif

Note that I am not a Linux user and am not aware of any __linux_version macro, 
but I strongly suppose such a thing exists. The idea is that this version is 
bumped on each "big change" and hopefully at some point we would have a 
__linux_version = 206470123 which fixes the issue.

Original comment by romain.t...@gmail.com on 28 Sep 2010 at 11:04

GoogleCodeExporter commented 9 years ago
Well, it's not a Linux bug; it's actually a PN533 bug (and probably a PN531 one 
too) triggered by the way the Linux kernel handles it.
After a set_configuration, the device is supposed to reset its toggle bit 
status, but it doesn't.
Searching for this bug, which is common to other USB devices, suggests that it 
also occurs on e.g. Mac OS X, and if the USB stack of Windows or FreeBSD 
changes tomorrow it could happen there too.
BTW, a quicker way to trigger the bug than running the full libfreefare test 
suite is to force the PN533 to send an odd number of packets (it's usually 
even: ACK + data), e.g. by expecting between 65 and 127 bytes of answer. The 
data is then sent in 2 packets (64 bytes, then the rest), so with the ACK that 
makes 3 packets and the toggle bit is now inverted => the next time there is a 
set_configuration, the device will send its bulk data with the wrong toggle 
bit. That's what happened in the libfreefare test suite: one of the answers was 
69 bytes long.
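
The packet arithmetic above can be written down as a small sketch. The helper names are hypothetical, the 64-byte packet size is the one quoted in this comment, and the model deliberately ignores replies that are an exact multiple of 64 bytes (where a zero-length packet may or may not be involved).

```c
#include <stdbool.h>
#include <stddef.h>

enum { BULK_MAX_PACKET = 64 }; /* bulk packet size quoted in the comment */

/* Number of bulk packets needed to carry `bytes` bytes (rounded up). */
static size_t packets_for(size_t bytes)
{
    return (bytes + BULK_MAX_PACKET - 1) / BULK_MAX_PACKET;
}

/* One exchange = the 6-byte ACK (one packet) + the reply frame. */
static size_t packets_per_exchange(size_t reply_len)
{
    return 1 + packets_for(reply_len);
}

/* An odd packet count leaves the device's data toggle inverted
 * relative to where it started. */
static bool exchange_flips_toggle(size_t reply_len)
{
    return packets_per_exchange(reply_len) % 2 == 1;
}
```

Under these assumptions a 14-byte reply gives 2 packets (toggle unchanged), while a 69-byte reply gives 3 packets (toggle flipped), matching the case described above.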

Original comment by yob...@gmail.com on 29 Sep 2010 at 5:56

GoogleCodeExporter commented 9 years ago
So actually, if there is a conditional test to add, it should rather be on the 
PN533 firmware version... but you'll know the version only once you've managed 
to talk to the device and send GetFirmware(): a kind of chicken-and-egg 
problem...

Original comment by yob...@gmail.com on 29 Sep 2010 at 6:00

GoogleCodeExporter commented 9 years ago
This issue was updated by revision r641.

This workaround allows using PN533 USB devices (like the SCL3711) without the 
toggle bit issue (on OSes that care about this toggle bit, e.g. GNU/Linux, 
Mac OS).
The libfreefare test suite now works as expected, enjoy!

Original comment by romu...@libnfc.org on 29 Sep 2010 at 9:58