psi46 / pxar

Life is too short for perfection
16 stars 46 forks source link

DTB Firmware DAQ delivers corrupt data #64

Closed simonspa closed 10 years ago

simonspa commented 10 years ago

When running a Pixel Alive test using the API function getEfficiencyMap from the pxar GUI currently we get a critical warning about missing events not present in the DTB buffer:

[20:40:02.680]     INFO: PixTestAlive::doTest() ntrig = 16
[20:40:05.172] CRITICAL: <hal.cc/SingleRocAllPixelsCalibrate:L659> Incomplete DAQ data readout! Missing 85 Events.
[20:40:05.175] CRITICAL: <api.cc/condenseTriggers:L1336> Data size does not correspond to 16 triggers! Aborting data processing!
# enabled rocs = 1 result vector length = 0
 rocIds[0] = 0
[20:40:05.257]     INFO: PixTestAlive::doTest() done

However, when running exactly the same test from testpxar (even 10 times one after another) I don't manage to get that error even once:

[20:49:16.755] DEBUGHAL: <hal.cc/daqStart:L1107> Enabling Deserializer160 for data acquisition. Phase: 4
[20:49:17.158] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L653> Loop finished, 124800 words in buffer, loop took 402ms.
[20:49:18.390] DEBUGHAL: <hal.cc/daqAllEvents:L1186> Finished readout.
[20:49:18.390] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L656> Readout size: 41600 Events, loop+readout took 1634ms.
[20:49:18.390] DEBUGHAL: <hal.cc/daqStop:L1295> Stopped DAQ session.
[20:49:18.390] DEBUGHAL: <hal.cc/daqClear:L1319> Closing DAQ session, deleting data buffers.

This needs to be understood and solved.

simonspa commented 10 years ago

Some observations:

when increasing the delay between triggers in the NIOS loop function (e.g. from 5us to 50us) the two calls behave differently. In case of testpxar it is just the loop time which increases:

[11:20:58.076] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L647> Called SingleRocAllPixelsCalibrate with flags 0, running 10 triggers on I2C 0.
[11:20:58.076] DEBUGHAL: <hal.cc/daqStart:L1066> Starting new DAQ session.
[11:20:58.077] DEBUGHAL: <hal.cc/daqStart:L1072> Allocated buffer size, Channel 0: 50000000
[11:20:58.077] DEBUGHAL: <hal.cc/daqStart:L1109> Enabling Deserializer160 for data acquisition. Phase: 4
[11:21:00.161] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L655> Loop finished, 124800 words in buffer, loop took 2083ms.

the number of words is still exactly the same while for the pxar GUI case the buffer just keeps filling up it seems - increasing the loop time just increases the buffer size. Remember, this is the same API test call, loop time on the FPGA is also the same...

[11:17:03.269] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L647> Called SingleRocAllPixelsCalibrate with flags 0, running 10 triggers on I2C 0.
[11:17:03.269] DEBUGHAL: <hal.cc/daqStart:L1066> Starting new DAQ session.
[11:17:03.269] DEBUGHAL: <hal.cc/daqStart:L1072> Allocated buffer size, Channel 0: 50000000
[11:17:03.270] DEBUGHAL: <hal.cc/daqStart:L1109> Enabling Deserializer160 for data acquisition. Phase: 4
[11:17:05.353] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L655> Loop finished, 469038 words in buffer, loop took 2083ms.

It does not seem to be related to the constant polling of IA and ID - even with that removed the readout is not fine after all.

simonspa commented 10 years ago

Looking at the raw data and the "decoded" version of it returned from the test running from the GUI reveals a lot of garbage readout, so probably this is merely a settings problem we need to catch.

This is one event:

[11:39:44.575] DEBUGPIPES: -------------------------
[11:39:44.575] DEBUGPIPES: 07f8 0040 0001 0100 000b 0200 000d 0240 0009 0340 0007 0400 0002 0540 0005 06c0 0001 0840 0001 
[11:39:44.575] DEBUGPIPES: ====== 7f8 ====== ROC 0 [2,80,1] ROC 0 [8,80,11] ROC 0 [12,80,13] ROC 0 [14,80,9] ROC 0 [22,80,7] ROC 0 [24,80,2] ROC 0 [34,80,5] ROC 0 [42,80,1] ROC 0 [50,80,1] 

And this is one event when read out using testpxar:

[11:42:56.466] DEBUGPIPES: 07f8 0284 06c3 
[11:42:56.466] DEBUGPIPES: ====== 7f8 ====== ROC 0 [17,67,99] 
simonspa commented 10 years ago

This gets more and more mysterious. From testpxar I can swamp the DTB with trigger requests, and all readout works perfectly fine, e.g. a PixelAlive map with 200 triggers per pixel:

[16:31:45.510] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L649> Called SingleRocAllPixelsCalibrate with flags 0, running 200 triggers on I2C 0.
[16:31:45.510] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L650> Expecting 832000 events.
[16:31:45.510] DEBUGHAL: <hal.cc/daqStart:L1072> Starting new DAQ session.
[16:31:45.511] DEBUGHAL: <hal.cc/daqStart:L1078> Allocated buffer size, Channel 0: 50000000
[16:31:45.512] DEBUGHAL: <hal.cc/daqStart:L1115> Enabling Deserializer160 for data acquisition. Phase: 4
[16:43:43.281] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L658> Loop finished, 2496008 words in buffer, loop took 4010ms.
[16:43:48.190] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L661> Readout size: 832000 Events, loop+readout took 8919ms.

Or, if you need another example: Doing a CalDel scan for all 4160 pixels in one go:

[16:32:31.691] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L785> Called SingleRocAllPixelsDacScan with flags 0, running 15 triggers.
[16:32:31.691] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L786> Scanning DAC 26 from 50 to 100
[16:32:31.691] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L789> Expecting 3182400 events.
[16:32:31.691] DEBUGHAL: <hal.cc/daqStart:L1072> Starting new DAQ session.
[16:32:31.691] DEBUGHAL: <hal.cc/daqStart:L1078> Allocated buffer size, Channel 0: 50000000
[16:32:31.691] DEBUGHAL: <hal.cc/daqStart:L1115> Enabling Deserializer160 for data acquisition. Phase: 4
[16:33:06.137] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L797> Loop finished, 4730528 words in buffer, loop took 34446ms.
[16:33:15.401] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L800> Readout size: 3182400 Events, loop+readout took 43710ms.

...and also all results look fine:

CalDel scan for ROC 0, pixel 11,20

15 |                                 oooooooooooooooooo
   |                                 ..................
   |                                o..................
   |                                ...................
   |                                ...................
10 |                                ...................
   |                                ...................
   |                                ...................
   |                                ...................
   |                                ...................
 5 |                               o...................
   |                               ....................
   |                               ....................
   |                               ....................
   |                               ....................
 0 |ooooooooooooooooooooooooooooooo....................
   |___________________________________________________
    '         '         '         '         '         '
    50        60        70        80        90        100

However, the GUI doesn't manage to run a simple PixelAlive map with 10 triggers, the RAM on the DTB is immediately filled with garbled data which contains invalid pixels, lots of zeros and, in the end, too few triggers.

simonspa commented 10 years ago

And just for the sake of documentation: with testpxar we can also do this in yet another dimension, doing a Full-ROC DacDac (CalDel vs ThrComp) Scan in less than 70sec:

[16:49:17.775] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L941> Called SingleRocAllPixelsDacDacScan with flags 0, running 5 triggers.
[16:49:17.775] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L943> Scanning DAC 26 from 130 to 145 vs. DAC 12 from 10 to 30
[16:49:17.775] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L949> Expecting 6988800 events.
[16:49:17.775] DEBUGHAL: <hal.cc/daqStart:L1075> Starting new DAQ session.
[16:49:17.776] DEBUGHAL: <hal.cc/daqStart:L1081> Allocated buffer size, Channel 0: 50000000
[16:49:17.776] DEBUGHAL: <hal.cc/daqStart:L1118> Enabling Deserializer160 for data acquisition. Phase: 4
[16:50:00.758] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L957> Loop finished, 13544444 words in buffer, loop took 42982ms.
[16:50:26.423] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L960> Readout size: 6988800 Events, loop+readout took 68647ms.
ursl commented 10 years ago

Hi Simon,

Looking at the raw data and the "decoded" version of it returned from the test running from the GUI reveals a lot of garbage readout, so probably this is merely a settings problem we need to catch.

did you run with identical settings?

Cheers, --U.

simonspa commented 10 years ago

yes I did - and that's the puzzling thing. otherwise the source of this would be clear and easy to fix with some more checks on the settings' validity.

are you able to reproduce the problem as described here?

ursl commented 10 years ago

I am puzzled at the moment as well.

I continued a bit with the "setup" test (using the daq length as indicator), and initially ran into problems immediately. These problems all went away after setting the chips into a well-defined state (in particular masking all un-tested pixels). Unfortunately, that is not really true for the PixelAlive, where that is ensured but the r/o screws up for ntrig large enough.

Cheers, --U.

On Tue, Feb 18, 2014 at 9:52 AM, simonspa notifications@github.com wrote:

yes I did - and that's the puzzling thing. otherwise the source of this would be clear and easy to fix with some more checks on the settings' validity.

are you able to reproduce the problem as described here?

Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35363753 .

hperrey commented 10 years ago

Just to add to this issue: I can reproduce this from testpxar by reducing the test API calls to:

[...]
 _api->HVon(); // not touching anything above

 _api->_dut->testAllPixels(true);
 _api->_dut->maskAllPixels(false);

 std::vector<pxar::pixel> results; 
 results = _api->getEfficiencyMap(0, 100);
 std::cout << " Number of pixels returned : " << results.size() << std::endl;
 asciimap(results, 100, 0);

 _api->HVoff(); // not touching anything below
[...]

This currently fails more often than not with my single-chip module (psi46dig) with errors similar to:

[.....]
[16:36:44.496] DEBUGRPC: "Daq_Read" called.
[16:36:44.540] DEBUGRPC: "Daq_Read" called.
[16:36:44.623] DEBUGRPC: "Daq_Read" called.
[16:36:44.629] DEBUGRPC: "Daq_Read" called.
[16:36:44.630] DEBUGHAL: <hal.cc/daqAllEvents:L1199> Finished readout.
[16:36:44.630] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L663> Readout size: 415999 Events, loop+readout took 4529ms.
[16:36:44.630] CRITICAL: <hal.cc/SingleRocAllPixelsCalibrate:L666> Incomplete DAQ data readout! Missing 1 Events.
[16:36:44.630] DEBUGHAL: <hal.cc/daqStop:L1308> Stopped DAQ session.
[16:36:44.630] DEBUGRPC: "Pg_Stop" called.
[16:36:44.630] DEBUGRPC: "GetRpcCallId" called.
[16:36:44.631] DEBUGRPC: "Daq_Stop" called.
[16:36:44.631] DEBUGRPC: "GetRpcCallId" called.
[16:36:44.631] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "uDelay" called.
[16:36:44.632] DEBUGHAL: <hal.cc/daqClear:L1332> Closing DAQ session, deleting data buffers.
[16:36:44.632] DEBUGRPC: "Daq_Close" called.
[16:36:44.632] DEBUGRPC: "GetRpcCallId" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.640] DEBUGAPI: <api.cc/MaskAndTrim:L1646> ROC 0 features 0 masked pixels.
[16:36:44.640] DEBUGAPI: <api.cc/MaskAndTrim:L1655> Masking ROC 0 in one go.
[16:36:44.640] DEBUGRPC: "roc_I2cAddr" called.
[16:36:44.640] DEBUGHAL: <hal.cc/RocSetMask:L506> Masking ROC 0
[16:36:44.640] DEBUGRPC: "roc_Chip_Mask" called.
[16:36:44.640] DEBUGAPI: <api.cc/expandLoop:L1326> Test took 4566ms.
[16:36:44.642] DEBUGAPI: <api.cc/repackMapData:L1384> Simple Map Repack of 415999 data blocks, returning efficiency.
[16:36:44.643] CRITICAL: <api.cc/condenseTriggers:L1337> Data size does not correspond to 100 triggers! Aborting data processing!
[16:36:44.643] DEBUGAPI: <api.cc/repackMapData:L1403> Correctly repacked Map data for delivery.
[16:36:44.643] DEBUGAPI: <api.cc/repackMapData:L1404> Repacking took 1ms.
 Number of pixels returned : 0

Another observation: with commit 79c78cbb59e0f987248f1f2ee9fde77a56f9998a I added debug printout if the pixel decoding fails; this happens with certain patterns:

[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe29: ROC 0 [0,80,25]
[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe40: ROC 0 [0,80,32]
[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe28: ROC 0 [0,80,24]
[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe2d: ROC 0 [0,80,29]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 4bfe47: ROC 0 [28,80,39]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe40: ROC 0 [0,80,32]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 83fe2f: ROC 0 [48,80,31]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe29: ROC 0 [0,80,25]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe41: ROC 0 [0,80,33]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe40: ROC 0 [0,80,32]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe2b: ROC 0 [0,80,27]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]

i.e. the row is typically "80".

Is anybody else seeing something similar?

hperrey commented 10 years ago

Some more observations:

hperrey commented 10 years ago

Yet more observations:

[.....]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46940: ROC 0 [63,206,160]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 54691b: ROC 0 [35,206,139]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46924: ROC 0 [63,206,148]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 44691b: ROC 0 [27,206,139]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a4691c: ROC 0 [63,206,140]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46829: ROC 0 [63,206,25]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 44691f: ROC 0 [27,206,143]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 64691f: ROC 0 [39,206,143]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a4691f: ROC 0 [63,206,143]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46928: ROC 0 [63,206,152]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46941: ROC 0 [63,206,161]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46cc4: ROC 0 [63,207,100]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a474cc: ROC 0 [63,208,108]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 506917: ROC 0 [33,206,135]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 9235ae: ROC 0 [57,12,222]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46902: ROC 0 [63,206,130]
[17:11:01.331] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 54691b: ROC 0 [35,206,139]
[17:11:01.331] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46824: ROC 0 [63,206,20]
[17:11:01.331] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 246944: ROC 0 [15,206,164]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46989: ROC 0 [63,206,201]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 54691b: ROC 0 [35,206,139]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 24691f: ROC 0 [15,206,143]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46940: ROC 0 [63,206,160]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46925: ROC 0 [63,206,149]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a464c6: ROC 0 [63,205,102]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a4691e: ROC 0 [63,206,142]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 246941: ROC 0 [15,206,161]
[17:11:01.333] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 9235ac: ROC 0 [57,12,220]
[...many more....]
ursl commented 10 years ago

Are the following two observations from a DacDacScan aspects of the same problem?

1) there is often a "noise band" (also visible in Philipp's presentation, but I don't know what fw/nios they used)

2) often there is single-DAC band at the low edge as well

On Tue, Feb 18, 2014 at 5:02 PM, Hanno Perrey notifications@github.comwrote:

Some more observations:

  • the invalid pixels are usually (always?) found in the first Daq_Read() calls
  • even when the readout size is correct, there might be invalid pixels in there; in the resulting ASCII map, only the first few lines are affected

Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35399086 .

philippeller commented 10 years ago

Hi Urs et al.

1) I found out today that the bands disappear when the other pixels are all masked.

2) In pyXar we cannot see this. First line is 'in sync' with all others

cheers, Philipp On 18.Feb 2014, at 5:39 PM, ursl wrote:

Are the following two observations from a DacDacScan aspects of the same problem?

1) there is often a "noise band" (also visible in Philipp's presentation, but I don't know what fw/nios they used)

2) often there is single-DAC band at the low edge as well

On Tue, Feb 18, 2014 at 5:02 PM, Hanno Perrey notifications@github.comwrote:

Some more observations:

  • the invalid pixels are usually (always?) found in the first Daq_Read() calls
  • even when the readout size is correct, there might be invalid pixels in there; in the resulting ASCII map, only the first few lines are affected

Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35399086 .

— Reply to this email directly or view it on GitHub.

roberts42 commented 10 years ago

Hi everyone,

My latest observations using the FPIX module are slightly different now. First, the data pipes have started working on OS X. I don’t know why. But I now get invalid pixels during the parsing.

It does appear to be wrong in the data from the Daq_Read:

12:17:52.507] DEBUGPIPES: 0080 0091 00a0 00b1 001b 002f 003c 0043 005f 006c 0013 002f 003c 0043 005f 006c 00cf 00df 00ef 00ff [12:17:52.507] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of bfc3fc: ROC 0 [75,198,252] [12:17:52.507] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fc3fc: ROC 0 [27,198,252] [12:17:52.507] DEBUGPIPES: ====== 101 ====== ROC 7 [75,198,252] ROC 7 [27,198,252] [

There are several things wrong with this event.

  1. I see no ROC headers. The pixel data comes directly after the TBM HDR/info.
  2. Since this was the PixelAlive there should be 1 pixel per ROC or 8 pixels per event. I always see two pixels, which are then invalid.
  3. This should be ROC 8. But there was no initial ROC header.

When I use psi46test to read out the module the data looks ok, with all appropriate headers:

TBM H1(6) H2(4) H3(0) H4(1) EV(100) STF(0) PKR(0) STKCNT( 0)

  1. ROC-HD(9):
  2. ROC-HD(9): R1(3) R2(1) R3(4) R4(9) R5(2) R6(A) Raw: 31492A Pixel [14244] 20/30: 154 R1(3) R2(1) R3(4) R4(5) R5(2) R6(A) Raw: 31452A Pixel [14242] 20/31: 154 R1(3) R2(1) R3(4) R4(1) R5(4) R6(0) Raw: 314140 Pixel [14240] 20/32: 160 R1(3) R2(1) R3(3) R4(9) R5(4) R6(2) Raw: 313942 Pixel [14234] 20/33: 162 R1(3) R2(1) R3(3) R4(5) R5(4) R6(2) Raw: 313542 Pixel [14232] 20/34: 162 R1(3) R2(1) R3(3) R4(1) R5(4) R6(8) Raw: 313148 Pixel [14230] 20/35: 168 R1(3) R2(1) R3(2) R4(9) R5(4) R6(6) Raw: 312946 Pixel [14224] 20/36: 166 R1(3) R2(1) R3(2) R4(5) R5(4) R6(2) Raw: 312542 Pixel [14222] 20/37: 162 R1(3) R2(1) R3(2) R4(1) R5(2) R6(E) Raw: 31212E Pixel [14220] 20/38: 158 R1(3) R2(1) R3(1) R4(9) R5(4) R6(0) Raw: 311940 Pixel [14214] 20/39: 160 R1(3) R2(1) R3(1) R4(B) R5(4) R6(1) Raw: 311B41 Pixel [14215] 21/39: 161 R1(3) R2(1) R3(2) R4(3) R5(4) R6(0) Raw: 312340 Pixel [14221] 21/38: 160 R1(3) R2(1) R3(2) R4(7) R5(2) R6(E) Raw: 31272E Pixel [14223] 21/37: 158 R1(3) R2(1) R3(2) R4(B) R5(2) R6(E) Raw: 312B2E Pixel [14225] 21/36: 158 R1(3) R2(1) R3(3) R4(3) R5(4) R6(2) Raw: 313342 Pixel [14231] 21/35: 162 R1(3) R2(1) R3(3) R4(7) R5(4) R6(A) Raw: 31374A Pixel [14233] 21/34: 170 R1(3) R2(1) R3(3) R4(B) R5(4) R6(E) Raw: 313B4E Pixel [14235] 21/33: 174 R1(3) R2(1) R3(4) R4(3) R5(4) R6(0) Raw: 314340 Pixel [14241] 21/32: 160 R1(3) R2(1) R3(4) R4(7) R5(4) R6(3) Raw: 314743 Pixel [14243] 21/31: 163 R1(3) R2(1) R3(4) R4(B) R5(2) R6(A) Raw: 314B2A Pixel [14245] 21/30: 154 R1(3) R2(5) R3(4) R4(9) R5(4) R6(A) Raw: 35494A Pixel [15244] 22/30: 170 R1(3) R2(5) R3(4) R4(5) R5(4) R6(2) Raw: 354542 Pixel [15242] 22/31: 162 R1(3) R2(5) R3(4) R4(1) R5(4) R6(2) Raw: 354142 Pixel [15240] 22/32: 162 R1(3) R2(5) R3(3) R4(9) R5(4) R6(F) Raw: 35394F Pixel [15234] 22/33: 175 R1(3) R2(5) R3(3) R4(5) R5(4) R6(6) Raw: 353546 Pixel [15232] 22/34: 166 R1(3) R2(5) R3(3) R4(1) R5(4) R6(3) Raw: 353143 Pixel [15230] 22/35: 163 R1(3) R2(5) R3(2) R4(9) R5(4) R6(8) Raw: 352948 Pixel [15224] 22/36: 168 R1(3) R2(5) R3(2) R4(5) R5(4) R6(4) Raw: 352544 Pixel [15222] 22/37: 164 R1(3) R2(5) R3(2) R4(1) R5(4) R6(A) Raw: 35214A Pixel [15220] 22/38: 170 R1(3) R2(5) R3(1) R4(9) R5(2) R6(E) Raw: 35192E Pixel [15214] 22/39: 158 R1(3) R2(5) R3(1) R4(B) R5(4) R6(C) Raw: 351B4C Pixel [15215] 23/39: 172 R1(3) R2(5) R3(2) R4(3) R5(4) R6(6) Raw: 352346 Pixel [15221] 23/38: 166 R1(3) R2(5) R3(2) R4(7) R5(4) R6(A) Raw: 35274A Pixel [15223] 23/37: 170 R1(3) R2(5) R3(2) R4(B) R5(4) R6(2) Raw: 352B42 Pixel [15225] 23/36: 162 R1(3) R2(5) R3(3) R4(3) R5(4) R6(A) Raw: 35334A Pixel [15231] 23/35: 170 R1(3) R2(5) R3(3) R4(7) R5(4) R6(A) Raw: 35374A Pixel [15233] 23/34: 170 R1(3) R2(5) R3(3) R4(B) R5(4) R6(6) Raw: 353B46 Pixel [15235] 23/33: 166 R1(3) R2(5) R3(4) R4(3) R5(4) R6(6) Raw: 354346 Pixel [15241] 23/32: 166 R1(3) R2(5) R3(4) R4(7) R5(4) R6(8) Raw: 354748 Pixel [15243] 23/31: 168 R1(3) R2(5) R3(4) R4(B) R5(4) R6(6) Raw: 354B46 Pixel [15245] 23/30: 166 R1(4) R2(1) R3(4) R4(9) R5(4) R6(2) Raw: 414942 Pixel [20244] 24/30: 162 R1(4) R2(1) R3(4) R4(5) R5(4) R6(2) Raw: 414542 Pixel [20242] 24/31: 162 R1(4) R2(1) R3(4) R4(1) R5(4) R6(0) Raw: 414140 Pixel [20240] 24/32: 160 R1(4) R2(1) R3(3) R4(9) R5(6) R6(0) Raw: 413960 Pixel [20234] 24/33: 176 R1(4) R2(1) R3(3) R4(5) R5(4) R6(8) Raw: 413548 Pixel [20232] 24/34: 168 R1(4) R2(1) R3(3) R4(1) R5(4) R6(6) Raw: 413146 Pixel [20230] 24/35: 166 R1(4) R2(1) R3(2) R4(9) R5(4) R6(0) Raw: 412940 Pixel [20224] 24/36: 160 R1(4) R2(1) R3(2) R4(5) R5(4) R6(6) Raw: 412546 Pixel [20222] 24/37: 166 R1(4) R2(1) R3(2) R4(1) R5(4) R6(0) Raw: 412140 Pixel [20220] 24/38: 160 R1(4) R2(1) R3(1) R4(9) R5(2) R6(F) Raw: 41192F Pixel [20214] 24/39: 159
  3. ROC-HD(9):
  4. ROC-HD(9):
  5. ROC-HD(9):
  6. ROC-HD(9):
  7. ROC-HD(9):
  8. ROC-HD(9): TBM T1(2) T2(2) T3(C) T4(0) NTP(0) RST(0) RSR(1) SYE(0) SYT(0) CTC(0) CAL(1) SF(0) D3( 0)

Robert

On Feb 18, 2014, at 10:43 AM, Philipp Eller notifications@github.com wrote:

Hi Urs et al.

On 18.Feb 2014, at 5:39 PM, ursl wrote:

Are the following two observations from a DacDacScan aspects of the same problem?

1) there is often a "noise band" (also visible in Philipp's presentation, but I don't know what fw/nios they used) I found out today that the bands disappear when the other pixels are all masked.

2) often there is single-DAC band at the low edge as well In pyXar we cannot see this. First line is 'in sync' with all others

cheers, Philipp

On Tue, Feb 18, 2014 at 5:02 PM, Hanno Perrey notifications@github.comwrote:

Some more observations:

  • the invalid pixels are usually (always?) found in the first Daq_Read() calls
  • even when the readout size is correct, there might be invalid pixels in there; in the resulting ASCII map, only the first few lines are affected

Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35399086 .

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHub.

simonspa commented 10 years ago

Okay, I reworked parts of the NIOS code to follow Urs' suggestion of really masking the whole chip and just trimming the pixel in question (other than the approach we took until now: trim all necessary pixels and only send Calibrate to the ones to be measured).

At first sight this looked promising and working, but now that I reworked part of the NIOS code to allow storage of trim bits in the DTB RAM I encounter the same problems as before, but with all different software options.

It is slightly different though, it looks like only the first 10-20 events are garbled and merged, and the rest of the readout looks more or less fine (as far as I can see by staring at the raw readout).

Conclusion: this does not seem to be the Holy Grail yet

ursl commented 10 years ago

I was sobered today as well. My observation of yesterday with a 'better' setup ROC did not work for threshold measurements. They still worked well for the efficiencyMap determination.

Cheers, --U.

Am Freitag, 21. Februar 2014 schrieb simonspa :

Okay, I reworked parts of the NIOS code to follow Urs' suggestion of really masking the whole chip and just trimming the pixel in question (other than the approach we took until now: trim all necessary pixels and only send Calibrate to the ones to be measured).

At first sight this looked promising and working, but now that I reworked part of the NIOS code to allow storage of trim bits in the DTB RAM I encounter the same problems as before, but with all different software options.

It is slightly different though, it looks like only the first 10-20 events are garbled and merged, and the rest of the readout looks more or less fine (as far as I can see by staring at the raw readout).

Conclusion: this does not seem to be the Holy Grail yet

Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35749083 .

simonspa commented 10 years ago

Some more updates and investigations on this.

Some remarks first:

And now the results:

EPCS: ok
#regions: 1
region 0
  block size:       65536
  number of blocks: 32
  offset:           0
  size:             2097152
Welcome!
daq_mem_base address: 4b800
Daq_Read first bytes: c 0
Daq_Read first bytes: 20b 46c7
Daq_Read first bytes: 4aab 87f8
Daq_Read first bytes: 87f8 681
daq_mem_base address: 4b800
Daq_Read first bytes: 87f8 22
Daq_Read first bytes: 20b 46c5
Daq_Read first bytes: 4aab 87f8
Daq_Read first bytes: 87f8 681
[16:34:26.366] DEBUGPIPES: 000c 0000 000c 0000 ffff ffff 6144 5f71 704f 6e65 4924 4349 2400 0076 07e9 0000 4376 0000 07e1 0000 0024 0000 0024 0000 ffff ffff 6f42 7261 2064 6469 203a 2020 44ad 87f8 0022 44ab 87f8 00
Before:     0x4b800
After:        0x4ad90

Not sure how to proceed.

ursl commented 10 years ago

I see a strong dependence of the error rate for PixelAlive (i.e. calling api::getEfficiencyMap(...) on the DAC parameters VthrComp and CalDel. Specifically: Moving from the current pretest "recommendation" into a more stable regime (shift CalDel to higher values and reduce VthrComp) improves the situation for PixelAlive very much. The DAC parameter settings do not help (in my case) for threshold scans. (I have no access to the disable-all-pixels-except-the-one-tested feature).

Is this a feature of my ROC/setup or do you see the same behavior?

On Mon, Feb 24, 2014 at 4:51 PM, simonspa notifications@github.com wrote:

Some more updates and investigations on this.

Some remarks first:

  • All the below information refers to the DESER160 using a single ROC assembly with a PSI46digV2 chip and no sensor.
  • The chip is fully working and efficient, no dead pixels or anything. This has been tested for weeks now.
  • The test patterns I use is a simple Pixel Alive Test using 10 triggers per pixel. This allows me to check which triggers are missing by just looking at the decoded pixels.

And now the results:

  • With the "mask all but one pixel throughout the loop" the problematic patterns somehow changed but did not vanish. What I see are both missing events in the beginning (always) and sometimes somewhere in the middle of a block, too.
  • The patterns in the beginning of the readout (broken events) are always the same.
  • When deleting (!) code from the NIOS softcore and recompiling a new firmware the patterns in the beginning are sometimes shorter, sometimes longer, apparently depending on the compiled NIOS code somehow.
  • I think I ruled out the possibility of it being a problem anywhere on the PC side by adding debug statements to the NIOS code and looking at the data read from the RAM directly on the DTB. This reveals:

EPCS: ok

regions: 1

region 0 block size: 65536 number of blocks: 32 offset: 0 size: 2097152 Welcome! daq_mem_base address: 4b800 Daq_Read first bytes: c 0 Daq_Read first bytes: 20b 46c7 Daq_Read first bytes: 4aab 87f8 Daq_Read first bytes: 87f8 681 daq_mem_base address: 4b800 Daq_Read first bytes: 87f8 22 Daq_Read first bytes: 20b 46c5 Daq_Read first bytes: 4aab 87f8 Daq_Read first bytes: 87f8 681

  • where daq_mem_base is the pointer address of the DAQ RAM block. It's printed in the Daq_Open() function, which means I'm running the pixel alive test twice here. The other printout is issued every time I call the Daq_Read() function after the trigger loop finished. As one can see, the first readout of the first test has clearly garbled data in it, just as I see it on the PC side, showing the c 0 as the first two 16-bit words:

[16:34:26.366] DEBUGPIPES: 000c 0000 000c 0000 ffff ffff 6144 5f71 704f 6e65 4924 4349 2400 0076 07e9 0000 4376 0000 07e1 0000 0024 0000 0024 0000 ffff ffff 6f42 7261 2064 6469 203a 2020 44ad 87f8 0022 44ab 87f8 00

  • This means, USB data transfer and all is just fine and we get what was there.
  • The last events read from the RAM always look good, and I always see pixel [51,79] appearing, with its 10 triggers (events). This means it's not the end of the data stream we are chopping of by closing the DAQ channel too fast.
  • It makes no difference whether we first call Daq_Stop() and then read out the data or the other way around (as to be expected after the cross-check above)
  • What is interesting here to note is the fact, that the corrupt readout is less of a problem the second time we run the test - but still is there. Running it a third time or 20 times doesn't change anything here.
  • The allocated DAQ memory always starts at the same position daq_mem_base. This changes after a code change (in this case: deleted the content of some unused test functions) and re-compilation. The starting position is earlier in the RAM (lower register), so probably the NIOS code eats up less memory and more is available for allocation:

Before: 0x4b800 After: 0x4ad90

  • In some cases, mostly when we have little problematic readout in the first run anyways, the further runs have no problem and run through without issues.
  • Timing within the actual trigger loop does not seem to affect this behavior at all. Even waiting uDelay(5000) microsecs after the roc_Pix_Cal command, or some hundert us after Pg_Single() do not change the data, only (as expected) the run time of the trigger loop. (Fun fact: regular single ROC pixel alive w/ 10 triggers takes 550ms in total)

Not sure how to proceed.

Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35899805 .

simonspa commented 10 years ago

Hi Urs,

all changes are now in the two branches https://github.com/psi46/pxar/tree/nios-trim-storage (for the pxar part) and https://github.com/psi46/pixel-dtb-firmware/tree/trim-on-nios (for the firmware part).

Cheers Simon

simonspa commented 10 years ago

Update on this issue after somedays/weeks of testing with the new Firmware 2.x:

simonspa commented 10 years ago

Updates from, using the daqtest datagenerator:

PSI46V2 ROC/Wafer Tester V1.5 (28.03.2014)
DTB DTB_WS8ZAW opened
--- DTB info-------------------------------------
Board id:    85
HW version:  DTB1.2
FW version:  2.1
SW version:  2.10
USB id:      DTB_WS8ZAW
MAC address: 40D855118055
Hostname:    pixelDTB085
Comment:    
-------------------------------------------------

+-- control commands ------------------------------------------+
| h                  display this text                         |
| exit               exit commander                            |
+-- chip test -------------------------------------------------+
| test <chip id>     run chip test                             |
+--------------------------------------------------------------+
> daqtest
   0: created 100000 -> #samples=100000; #errors=0
   1: created 100000 -> #samples=100000; #errors=0
   2: created 100000 -> #samples=100000; #errors=0
   3: created 100000 -> #samples=100000; #errors=0
   4: created 100000 -> #samples=100000; #errors=4 <----
   5: created 100000 -> #samples=100000; #errors=0
   6: created 100000 -> #samples=100000; #errors=0

[...]

  19: created 100000 -> #samples=100000; #errors=0
#block errors 1
> daqtest
   0: created 100000 -> #samples=100000; #errors=0

[...]

 531: created 100000 -> #samples=100000; #errors=0
#block errors 0
PSI46V2 ROC/Wafer Tester V1.5 (28.03.2014)

DTB DTB_WS8ZAW opened
--- DTB info-------------------------------------
Board id:    85
HW version:  DTB1.2
FW version:  2.1
SW version:  2.10
USB id:      DTB_WS8ZAW
MAC address: 40D855118055
Hostname:    pixelDTB085
Comment:    
-------------------------------------------------

+-- control commands ------------------------------------------+
| h                  display this text                         |
| exit               exit commander                            |
+-- chip test -------------------------------------------------+
| test <chip id>     run chip test                             |
+--------------------------------------------------------------+
> daqtest
   0: created 100000 -> #samples=100000; #errors=0

[...]

1083: created 100000 -> #samples=100000; #errors=0
#block errors 0

However, @werdmann reports more and different problems using this mechanism.

xshi commented 10 years ago

Just a reminder that with the firmware v2.14 and latest pxar, I still get the error with Pretest:

[16:39:17.595] INFO: PixTestPretest::setVana() done, Module Ia 382.7 mA = 23.9187 mA/ROC [16:41:58.281] CRITICAL: <USBInterface.libftd2xx.cc/FillBuffer:L258> Requested to read 3599b, but read 3597b - 2b missing! terminate called after throwing an instance of 'pxar::UsbConnectionTimeout' what(): Read from USB timed out. Aborted (core dumped)

roberts42 commented 10 years ago

Hi Xin,

I am able to run the Pretest without problems. Can you check if the USB port that the DTB is connected is USB2.0? Sometimes not all the ports on the machine are 2.0. On my linux system I only have 1 USB2.0 port and the others are 1.0.

Robert

On May 7, 2014, at 3:48 PM, Xin Shi notifications@github.com wrote:

Just a reminder that with the firmware v2.14 and latest pxar, I still get the error with Pretest:

[16:39:17.595] INFO: PixTestPretest::setVana() done, Module Ia 382.7 mA = 23.9187 mA/ROC [16:41:58.281] CRITICAL: Requested to read 3599b, but read 3597b - 2b missing! terminate called after throwing an instance of 'pxar::UsbConnectionTimeout' what(): Read from USB timed out. Aborted (core dumped)

— Reply to this email directly or view it on GitHub.

xshi commented 10 years ago

Hi Robert,

Thanks for your suggestion. I just checked :

Device: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC

Device Descriptor:

bLength 18

bDescriptorType 1

bcdUSB 1.10

It looks I connected it to 1.10 version. I will try the 2.0 version tomorrow.

Xin

On Wed, May 7, 2014 at 4:52 PM, Robert Stringer notifications@github.comwrote:

Hi Xin,

I am able to run the Pretest without problems. Can you check if the USB port that the DTB is connected is USB2.0? Sometimes not all the ports on the machine are 2.0. On my linux system I only have 1 USB2.0 port and the others are 1.0.

Robert

On May 7, 2014, at 3:48 PM, Xin Shi notifications@github.com wrote:

Just a reminder that with the firmware v2.14 and latest pxar, I still get the error with Pretest:

[16:39:17.595] INFO: PixTestPretest::setVana() done, Module Ia 382.7 mA = 23.9187 mA/ROC [16:41:58.281] CRITICAL: Requested to read 3599b, but read 3597b - 2b missing! terminate called after throwing an instance of 'pxar::UsbConnectionTimeout' what(): Read from USB timed out. Aborted (core dumped)

— Reply to this email directly or view it on GitHub.

— Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-42481627 .

simonspa commented 10 years ago

I guess this issue is more or less obsolete since it more and more looks like this (hopefully last) issue we are currently fighting with concerning data transmission is related to USB.

Please discuss further in #121

xshi commented 10 years ago

A final comment: the USB I plugged in was indeed 2.0. After I reboot, it seems working now. I will post further progress in other thread. Close this one now :)