Closed simonspa closed 10 years ago
Some observations:
when increasing the delay between triggers in the NIOS loop function (e.g. from 5us to 50us) the two calls behave differently. In case of testpxar it is just the loop time which increases:
[11:20:58.076] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L647> Called SingleRocAllPixelsCalibrate with flags 0, running 10 triggers on I2C 0.
[11:20:58.076] DEBUGHAL: <hal.cc/daqStart:L1066> Starting new DAQ session.
[11:20:58.077] DEBUGHAL: <hal.cc/daqStart:L1072> Allocated buffer size, Channel 0: 50000000
[11:20:58.077] DEBUGHAL: <hal.cc/daqStart:L1109> Enabling Deserializer160 for data acquisition. Phase: 4
[11:21:00.161] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L655> Loop finished, 124800 words in buffer, loop took 2083ms.
the number of words is still exactly the same while for the pxar GUI case the buffer just keeps filling up it seems - increasing the loop time just increases the buffer size. Remember, this is the same API test call, loop time on the FPGA is also the same...
[11:17:03.269] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L647> Called SingleRocAllPixelsCalibrate with flags 0, running 10 triggers on I2C 0.
[11:17:03.269] DEBUGHAL: <hal.cc/daqStart:L1066> Starting new DAQ session.
[11:17:03.269] DEBUGHAL: <hal.cc/daqStart:L1072> Allocated buffer size, Channel 0: 50000000
[11:17:03.270] DEBUGHAL: <hal.cc/daqStart:L1109> Enabling Deserializer160 for data acquisition. Phase: 4
[11:17:05.353] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L655> Loop finished, 469038 words in buffer, loop took 2083ms.
It does not seem to be related to the constant polling of IA and ID - even with that removed the readout is not fine after all.
Looking at the raw data and the "decoded" version of it returned from the test running from the GUI reveals a lot of garbage readout, so probably this is merely a settings problem we need to catch.
This is one event:
[11:39:44.575] DEBUGPIPES: -------------------------
[11:39:44.575] DEBUGPIPES: 07f8 0040 0001 0100 000b 0200 000d 0240 0009 0340 0007 0400 0002 0540 0005 06c0 0001 0840 0001
[11:39:44.575] DEBUGPIPES: ====== 7f8 ====== ROC 0 [2,80,1] ROC 0 [8,80,11] ROC 0 [12,80,13] ROC 0 [14,80,9] ROC 0 [22,80,7] ROC 0 [24,80,2] ROC 0 [34,80,5] ROC 0 [42,80,1] ROC 0 [50,80,1]
And this is one event when read out using testpxar:
[11:42:56.466] DEBUGPIPES: 07f8 0284 06c3
[11:42:56.466] DEBUGPIPES: ====== 7f8 ====== ROC 0 [17,67,99]
This gets more and more mysterious. From testpxar
I can swamp the DTB with trigger requests, and all readout works perfectly fine, e.g. a PixelAlive map with 200 triggers per pixel:
[16:31:45.510] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L649> Called SingleRocAllPixelsCalibrate with flags 0, running 200 triggers on I2C 0.
[16:31:45.510] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L650> Expecting 832000 events.
[16:31:45.510] DEBUGHAL: <hal.cc/daqStart:L1072> Starting new DAQ session.
[16:31:45.511] DEBUGHAL: <hal.cc/daqStart:L1078> Allocated buffer size, Channel 0: 50000000
[16:31:45.512] DEBUGHAL: <hal.cc/daqStart:L1115> Enabling Deserializer160 for data acquisition. Phase: 4
[16:43:43.281] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L658> Loop finished, 2496008 words in buffer, loop took 4010ms.
[16:43:48.190] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L661> Readout size: 832000 Events, loop+readout took 8919ms.
Or, if you need another example: Doing a CalDel scan for all 4160 pixels in one go:
[16:32:31.691] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L785> Called SingleRocAllPixelsDacScan with flags 0, running 15 triggers.
[16:32:31.691] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L786> Scanning DAC 26 from 50 to 100
[16:32:31.691] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L789> Expecting 3182400 events.
[16:32:31.691] DEBUGHAL: <hal.cc/daqStart:L1072> Starting new DAQ session.
[16:32:31.691] DEBUGHAL: <hal.cc/daqStart:L1078> Allocated buffer size, Channel 0: 50000000
[16:32:31.691] DEBUGHAL: <hal.cc/daqStart:L1115> Enabling Deserializer160 for data acquisition. Phase: 4
[16:33:06.137] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L797> Loop finished, 4730528 words in buffer, loop took 34446ms.
[16:33:15.401] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacScan:L800> Readout size: 3182400 Events, loop+readout took 43710ms.
...and also all results look fine:
CalDel scan for ROC 0, pixel 11,20
15 | oooooooooooooooooo
| ..................
| o..................
| ...................
| ...................
10 | ...................
| ...................
| ...................
| ...................
| ...................
5 | o...................
| ....................
| ....................
| ....................
| ....................
0 |ooooooooooooooooooooooooooooooo....................
|___________________________________________________
' ' ' ' ' '
50 60 70 80 90 100
However, the GUI doesn't manage to run a simple PixelAlive map with 10 triggers, the RAM on the DTB is immediately filled with garbled data which contains invalid pixels, lots of zeros and, in the end, too few triggers.
And just for the sake of documentation: with testpxar
we can also do this in yet another dimension, doing a Full-ROC DacDac (CalDel vs ThrComp) Scan in less than 70sec:
[16:49:17.775] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L941> Called SingleRocAllPixelsDacDacScan with flags 0, running 5 triggers.
[16:49:17.775] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L943> Scanning DAC 26 from 130 to 145 vs. DAC 12 from 10 to 30
[16:49:17.775] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L949> Expecting 6988800 events.
[16:49:17.775] DEBUGHAL: <hal.cc/daqStart:L1075> Starting new DAQ session.
[16:49:17.776] DEBUGHAL: <hal.cc/daqStart:L1081> Allocated buffer size, Channel 0: 50000000
[16:49:17.776] DEBUGHAL: <hal.cc/daqStart:L1118> Enabling Deserializer160 for data acquisition. Phase: 4
[16:50:00.758] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L957> Loop finished, 13544444 words in buffer, loop took 42982ms.
[16:50:26.423] DEBUGHAL: <hal.cc/SingleRocAllPixelsDacDacScan:L960> Readout size: 6988800 Events, loop+readout took 68647ms.
Hi Simon,
Looking at the raw data and the "decoded" version of it returned from the test running from the GUI reveals a lot of garbage readout, so probably this is merely a settings problem we need to catch.
did you run with identical settings?
Cheers, --U.
yes I did - and that's the puzzling thing. otherwise the source of this would be clear and easy to fix with some more checks on the settings' validity.
are you able to reproduce the problem as described here?
I am puzzled at the moment as well.
I continued a bit with the "setup" test (using the daq length as indicator), and initially ran into problems immediately. These problems all went away after setting the chips into a well-defined state (in particular masking all un-tested pixels). Unfortunately, that is not really true for the PixelAlive, where that is ensured but the r/o screws up for ntrig large enough.
Cheers, --U.
On Tue, Feb 18, 2014 at 9:52 AM, simonspa notifications@github.com wrote:
yes I did - and that's the puzzling thing. otherwise the source of this would be clear and easy to fix with some more checks on the settings' validity.
are you able to reproduce the problem as described here?
Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35363753 .
Just to add to this issue: I can reproduce this from testpxar by reducing the test API calls to:
[...]
_api->HVon(); // not touching anything above
_api->_dut->testAllPixels(true);
_api->_dut->maskAllPixels(false);
std::vector<pxar::pixel> results;
results = _api->getEfficiencyMap(0, 100);
std::cout << " Number of pixels returned : " << results.size() << std::endl;
asciimap(results, 100, 0);
_api->HVoff(); // not touching anything below
[...]
This currently fails more often than not with my single-chip module (psi46dig) with errors similar to:
[.....]
[16:36:44.496] DEBUGRPC: "Daq_Read" called.
[16:36:44.540] DEBUGRPC: "Daq_Read" called.
[16:36:44.623] DEBUGRPC: "Daq_Read" called.
[16:36:44.629] DEBUGRPC: "Daq_Read" called.
[16:36:44.630] DEBUGHAL: <hal.cc/daqAllEvents:L1199> Finished readout.
[16:36:44.630] DEBUGHAL: <hal.cc/SingleRocAllPixelsCalibrate:L663> Readout size: 415999 Events, loop+readout took 4529ms.
[16:36:44.630] CRITICAL: <hal.cc/SingleRocAllPixelsCalibrate:L666> Incomplete DAQ data readout! Missing 1 Events.
[16:36:44.630] DEBUGHAL: <hal.cc/daqStop:L1308> Stopped DAQ session.
[16:36:44.630] DEBUGRPC: "Pg_Stop" called.
[16:36:44.630] DEBUGRPC: "GetRpcCallId" called.
[16:36:44.631] DEBUGRPC: "Daq_Stop" called.
[16:36:44.631] DEBUGRPC: "GetRpcCallId" called.
[16:36:44.631] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "Daq_Stop" called.
[16:36:44.632] DEBUGRPC: "uDelay" called.
[16:36:44.632] DEBUGHAL: <hal.cc/daqClear:L1332> Closing DAQ session, deleting data buffers.
[16:36:44.632] DEBUGRPC: "Daq_Close" called.
[16:36:44.632] DEBUGRPC: "GetRpcCallId" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.633] DEBUGRPC: "Daq_Close" called.
[16:36:44.640] DEBUGAPI: <api.cc/MaskAndTrim:L1646> ROC 0 features 0 masked pixels.
[16:36:44.640] DEBUGAPI: <api.cc/MaskAndTrim:L1655> Masking ROC 0 in one go.
[16:36:44.640] DEBUGRPC: "roc_I2cAddr" called.
[16:36:44.640] DEBUGHAL: <hal.cc/RocSetMask:L506> Masking ROC 0
[16:36:44.640] DEBUGRPC: "roc_Chip_Mask" called.
[16:36:44.640] DEBUGAPI: <api.cc/expandLoop:L1326> Test took 4566ms.
[16:36:44.642] DEBUGAPI: <api.cc/repackMapData:L1384> Simple Map Repack of 415999 data blocks, returning efficiency.
[16:36:44.643] CRITICAL: <api.cc/condenseTriggers:L1337> Data size does not correspond to 100 triggers! Aborting data processing!
[16:36:44.643] DEBUGAPI: <api.cc/repackMapData:L1403> Correctly repacked Map data for delivery.
[16:36:44.643] DEBUGAPI: <api.cc/repackMapData:L1404> Repacking took 1ms.
Number of pixels returned : 0
Another observation: with commit 79c78cbb59e0f987248f1f2ee9fde77a56f9998a I added debug printout if the pixel decoding fails; this happens with certain patterns:
[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe29: ROC 0 [0,80,25]
[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe40: ROC 0 [0,80,32]
[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe28: ROC 0 [0,80,24]
[16:48:14.679] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe2d: ROC 0 [0,80,29]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 4bfe47: ROC 0 [28,80,39]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe40: ROC 0 [0,80,32]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 83fe2f: ROC 0 [48,80,31]
[16:48:14.680] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe29: ROC 0 [0,80,25]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe41: ROC 0 [0,80,33]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe40: ROC 0 [0,80,32]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fe2b: ROC 0 [0,80,27]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.681] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
[16:48:14.682] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of ffffff: ROC 0 [98,80,255]
i.e. the row is typically "80".
Is anybody else seeing something similar?
Some more observations:
Yet more observations:
[.....]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46940: ROC 0 [63,206,160]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 54691b: ROC 0 [35,206,139]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46924: ROC 0 [63,206,148]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 44691b: ROC 0 [27,206,139]
[17:11:01.248] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a4691c: ROC 0 [63,206,140]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46829: ROC 0 [63,206,25]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 44691f: ROC 0 [27,206,143]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 64691f: ROC 0 [39,206,143]
[17:11:01.249] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a4691f: ROC 0 [63,206,143]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46928: ROC 0 [63,206,152]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46941: ROC 0 [63,206,161]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46cc4: ROC 0 [63,207,100]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a474cc: ROC 0 [63,208,108]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 506917: ROC 0 [33,206,135]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 9235ae: ROC 0 [57,12,222]
[17:11:01.329] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46902: ROC 0 [63,206,130]
[17:11:01.331] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 54691b: ROC 0 [35,206,139]
[17:11:01.331] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46824: ROC 0 [63,206,20]
[17:11:01.331] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 246944: ROC 0 [15,206,164]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46989: ROC 0 [63,206,201]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 54691b: ROC 0 [35,206,139]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 24691f: ROC 0 [15,206,143]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46940: ROC 0 [63,206,160]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a46925: ROC 0 [63,206,149]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a464c6: ROC 0 [63,205,102]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of a4691e: ROC 0 [63,206,142]
[17:11:01.332] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 246941: ROC 0 [15,206,161]
[17:11:01.333] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 9235ac: ROC 0 [57,12,220]
[...many more....]
ffffff
, and a row address of 80 most of the time, getEfficiencyVsDACDAC() has a more varying pattern with typical values similar to a4691f
Are the following two observations from a DacDacScan aspects of the same problem?
1) there is often a "noise band" (also visible in Philipp's presentation, but I don't know what fw/nios they used)
2) often there is single-DAC band at the low edge as well
On Tue, Feb 18, 2014 at 5:02 PM, Hanno Perrey notifications@github.comwrote:
Some more observations:
- the invalid pixels are usually (always?) found in the first Daq_Read() calls
- even when the readout size is correct, there might be invalid pixels in there; in the resulting ASCII map, only the first few lines are affected
Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35399086 .
Hi Urs et al.
1) I found out today that the bands disappear when the other pixels are all masked.
2) In pyXar we cannot see this. First line is 'in sync' with all others
cheers, Philipp On 18.Feb 2014, at 5:39 PM, ursl wrote:
Are the following two observations from a DacDacScan aspects of the same problem?
1) there is often a "noise band" (also visible in Philipp's presentation, but I don't know what fw/nios they used)
2) often there is single-DAC band at the low edge as well
On Tue, Feb 18, 2014 at 5:02 PM, Hanno Perrey notifications@github.comwrote:
Some more observations:
- the invalid pixels are usually (always?) found in the first Daq_Read() calls
- even when the readout size is correct, there might be invalid pixels in there; in the resulting ASCII map, only the first few lines are affected
Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35399086 .
— Reply to this email directly or view it on GitHub.
Hi everyone,
My latest observations using the FPIX module are slightly different now. First, the data pipes have started working on OS X. I don’t know why. But I now get invalid pixels during the parsing.
It does appear to be wrong in the data from the Daq_Read:
12:17:52.507] DEBUGPIPES: 0080 0091 00a0 00b1 001b 002f 003c 0043 005f 006c 0013 002f 003c 0043 005f 006c 00cf 00df 00ef 00ff [12:17:52.507] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of bfc3fc: ROC 0 [75,198,252] [12:17:52.507] CRITICAL: <datatypes.cc/decodeRaw:L25> invalid pixel from raw value of 3fc3fc: ROC 0 [27,198,252] [12:17:52.507] DEBUGPIPES: ====== 101 ====== ROC 7 [75,198,252] ROC 7 [27,198,252] [
There are several things wrong with this event.
When I use psi46test to read out the module the data looks ok, with all appropriate headers:
TBM H1(6) H2(4) H3(0) H4(1) EV(100) STF(0) PKR(0) STKCNT( 0)
Robert
On Feb 18, 2014, at 10:43 AM, Philipp Eller notifications@github.com wrote:
Hi Urs et al.
On 18.Feb 2014, at 5:39 PM, ursl wrote:
Are the following two observations from a DacDacScan aspects of the same problem?
1) there is often a "noise band" (also visible in Philipp's presentation, but I don't know what fw/nios they used) I found out today that the bands disappear when the other pixels are all masked.
2) often there is single-DAC band at the low edge as well In pyXar we cannot see this. First line is 'in sync' with all others
cheers, Philipp
On Tue, Feb 18, 2014 at 5:02 PM, Hanno Perrey notifications@github.comwrote:
Some more observations:
- the invalid pixels are usually (always?) found in the first Daq_Read() calls
- even when the readout size is correct, there might be invalid pixels in there; in the resulting ASCII map, only the first few lines are affected
Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35399086 .
— Reply to this email directly or view it on GitHub.
— Reply to this email directly or view it on GitHub.
Okay, I reworked parts of the NIOS code to follow Urs' suggestion of really masking the whole chip and just trimming the pixel in question (other than the approach we took until now: trim all necessary pixels and only send Calibrate to the ones to be measured).
At first sight this looked promising and working, but now that I reworked part of the NIOS code to allow storage of trim bits in the DTB RAM I encounter the same problems as before, but with all different software options.
It is slightly different though, it looks like only the first 10-20 events are garbled and merged, and the rest of the readout looks more or less fine (as far as I can see by staring at the raw readout).
Conclusion: this does not seem to be the Holy Grail yet
I was sobered today as well. My observation of yesterday with a 'better' setup ROC did not work for threshold measurements. They still worked well for the efficiencyMap determination.
Cheers, --U.
Am Freitag, 21. Februar 2014 schrieb simonspa :
Okay, I reworked parts of the NIOS code to follow Urs' suggestion of really masking the whole chip and just trimming the pixel in question (other than the approach we took until now: trim all necessary pixels and only send Calibrate to the ones to be measured).
At first sight this looked promising and working, but now that I reworked part of the NIOS code to allow storage of trim bits in the DTB RAM I encounter the same problems as before, but with all different software options.
It is slightly different though, it looks like only the first 10-20 events are garbled and merged, and the rest of the readout looks more or less fine (as far as I can see by staring at the raw readout).
Conclusion: this does not seem to be the Holy Grail yet
Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35749083 .
Some more updates and investigations on this.
Some remarks first:
And now the results:
EPCS: ok
#regions: 1
region 0
block size: 65536
number of blocks: 32
offset: 0
size: 2097152
Welcome!
daq_mem_base address: 4b800
Daq_Read first bytes: c 0
Daq_Read first bytes: 20b 46c7
Daq_Read first bytes: 4aab 87f8
Daq_Read first bytes: 87f8 681
daq_mem_base address: 4b800
Daq_Read first bytes: 87f8 22
Daq_Read first bytes: 20b 46c5
Daq_Read first bytes: 4aab 87f8
Daq_Read first bytes: 87f8 681
daq_mem_base
is the pointer address of the DAQ RAM block. It's printed in the Daq_Open()
function, which means I'm running the pixel alive test twice here. The other printout is issued every time I call the Daq_Read()
function after the trigger loop finished. As one can see, the first readout of the first test has clearly garbled data in it, just as I see it on the PC side, showing the c 0
as the first two 16-bit words:[16:34:26.366] DEBUGPIPES: 000c 0000 000c 0000 ffff ffff 6144 5f71 704f 6e65 4924 4349 2400 0076 07e9 0000 4376 0000 07e1 0000 0024 0000 0024 0000 ffff ffff 6f42 7261 2064 6469 203a 2020 44ad 87f8 0022 44ab 87f8 00
[51,79]
appearing, with its 10 triggers (events). This means it's not the end of the data stream we are chopping of by closing the DAQ channel too fast.Daq_Stop()
and then read out the data or the other way around (as to be expected after the cross-check above)daq_mem_base
. This changes after a code change (in this case: deleted the content of some unused test functions) and re-compilation. The starting position is earlier in the RAM (lower register), so probably the NIOS code eats up less memory and more is available for allocation:Before: 0x4b800
After: 0x4ad90
uDelay(5000)
microsecs after the roc_Pix_Cal command, or some hundert us after Pg_Single()
do not change the data, only (as expected) the run time of the trigger loop. (Fun fact: regular single ROC pixel alive w/ 10 triggers takes 550ms in total)Not sure how to proceed.
I see a strong dependence of the error rate for PixelAlive (i.e. calling api::getEfficiencyMap(...) on the DAC parameters VthrComp and CalDel. Specifically: Moving from the current pretest "recommendation" into a more stable regime (shift CalDel to higher values and reduce VthrComp) improves the situation for PixelAlive very much. The DAC parameter settings do not help (in my case) for threshold scans. (I have no access to the disable-all-pixels-except-the-one-tested feature).
Is this a feature of my ROC/setup or do you see the same behavior?
On Mon, Feb 24, 2014 at 4:51 PM, simonspa notifications@github.com wrote:
Some more updates and investigations on this.
Some remarks first:
- All the below information refers to the DESER160 using a single ROC assembly with a PSI46digV2 chip and no sensor.
- The chip is fully working and efficient, no dead pixels or anything. This has been tested for weeks now.
- The test patterns I use is a simple Pixel Alive Test using 10 triggers per pixel. This allows me to check which triggers are missing by just looking at the decoded pixels.
And now the results:
- With the "mask all but one pixel throughout the loop" the problematic patterns somehow changed but did not vanish. What I see are both missing events in the beginning (always) and sometimes somewhere in the middle of a block, too.
- The patterns in the beginning of the readout (broken events) are always the same.
- When deleting (!) code from the NIOS softcore and recompiling a new firmware the patterns in the beginning are sometimes shorter, sometimes longer, apparently depending on the compiled NIOS code somehow.
- I think I ruled out the possibility of it being a problem anywhere on the PC side by adding debug statements to the NIOS code and looking at the data read from the RAM directly on the DTB. This reveals:
EPCS: ok
regions: 1
region 0 block size: 65536 number of blocks: 32 offset: 0 size: 2097152 Welcome! daq_mem_base address: 4b800 Daq_Read first bytes: c 0 Daq_Read first bytes: 20b 46c7 Daq_Read first bytes: 4aab 87f8 Daq_Read first bytes: 87f8 681 daq_mem_base address: 4b800 Daq_Read first bytes: 87f8 22 Daq_Read first bytes: 20b 46c5 Daq_Read first bytes: 4aab 87f8 Daq_Read first bytes: 87f8 681
- where daq_mem_base is the pointer address of the DAQ RAM block. It's printed in the Daq_Open() function, which means I'm running the pixel alive test twice here. The other printout is issued every time I call the Daq_Read() function after the trigger loop finished. As one can see, the first readout of the first test has clearly garbled data in it, just as I see it on the PC side, showing the c 0 as the first two 16-bit words:
[16:34:26.366] DEBUGPIPES: 000c 0000 000c 0000 ffff ffff 6144 5f71 704f 6e65 4924 4349 2400 0076 07e9 0000 4376 0000 07e1 0000 0024 0000 0024 0000 ffff ffff 6f42 7261 2064 6469 203a 2020 44ad 87f8 0022 44ab 87f8 00
- This means, USB data transfer and all is just fine and we get what was there.
- The last events read from the RAM always look good, and I always see pixel [51,79] appearing, with its 10 triggers (events). This means it's not the end of the data stream we are chopping of by closing the DAQ channel too fast.
- It makes no difference whether we first call Daq_Stop() and then read out the data or the other way around (as to be expected after the cross-check above)
- What is interesting here to note is the fact, that the corrupt readout is less of a problem the second time we run the test - but still is there. Running it a third time or 20 times doesn't change anything here.
- The allocated DAQ memory always starts at the same position daq_mem_base. This changes after a code change (in this case: deleted the content of some unused test functions) and re-compilation. The starting position is earlier in the RAM (lower register), so probably the NIOS code eats up less memory and more is available for allocation:
Before: 0x4b800 After: 0x4ad90
- In some cases, mostly when we have little problematic readout in the first run anyways, the further runs have no problem and run through without issues.
- Timing within the actual trigger loop does not seem to affect this behavior at all. Even waiting uDelay(5000) microsecs after the roc_Pix_Cal command, or some hundert us after Pg_Single() do not change the data, only (as expected) the run time of the trigger loop. (Fun fact: regular single ROC pixel alive w/ 10 triggers takes 550ms in total)
Not sure how to proceed.
Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-35899805 .
Hi Urs,
all changes are now in the two branches https://github.com/psi46/pxar/tree/nios-trim-storage (for the pxar part) and https://github.com/psi46/pixel-dtb-firmware/tree/trim-on-nios (for the firmware part).
Cheers Simon
Update on this issue after somedays/weeks of testing with the new Firmware 2.x:
Updates from, using the daqtest
datagenerator:
PSI46V2 ROC/Wafer Tester V1.5 (28.03.2014)
DTB DTB_WS8ZAW opened
--- DTB info-------------------------------------
Board id: 85
HW version: DTB1.2
FW version: 2.1
SW version: 2.10
USB id: DTB_WS8ZAW
MAC address: 40D855118055
Hostname: pixelDTB085
Comment:
-------------------------------------------------
+-- control commands ------------------------------------------+
| h display this text |
| exit exit commander |
+-- chip test -------------------------------------------------+
| test <chip id> run chip test |
+--------------------------------------------------------------+
> daqtest
0: created 100000 -> #samples=100000; #errors=0
1: created 100000 -> #samples=100000; #errors=0
2: created 100000 -> #samples=100000; #errors=0
3: created 100000 -> #samples=100000; #errors=0
4: created 100000 -> #samples=100000; #errors=4 <----
5: created 100000 -> #samples=100000; #errors=0
6: created 100000 -> #samples=100000; #errors=0
[...]
19: created 100000 -> #samples=100000; #errors=0
#block errors 1
> daqtest
0: created 100000 -> #samples=100000; #errors=0
[...]
531: created 100000 -> #samples=100000; #errors=0
#block errors 0
Daq_Stop()
) I don't see the startup behaviour:PSI46V2 ROC/Wafer Tester V1.5 (28.03.2014)
DTB DTB_WS8ZAW opened
--- DTB info-------------------------------------
Board id: 85
HW version: DTB1.2
FW version: 2.1
SW version: 2.10
USB id: DTB_WS8ZAW
MAC address: 40D855118055
Hostname: pixelDTB085
Comment:
-------------------------------------------------
+-- control commands ------------------------------------------+
| h display this text |
| exit exit commander |
+-- chip test -------------------------------------------------+
| test <chip id> run chip test |
+--------------------------------------------------------------+
> daqtest
0: created 100000 -> #samples=100000; #errors=0
[...]
1083: created 100000 -> #samples=100000; #errors=0
#block errors 0
However, @werdmann reports more and different problems using this mechanism.
Just a reminder that with the firmware v2.14 and latest pxar, I still get the error with Pretest:
[16:39:17.595] INFO: PixTestPretest::setVana() done, Module Ia 382.7 mA = 23.9187 mA/ROC [16:41:58.281] CRITICAL: <USBInterface.libftd2xx.cc/FillBuffer:L258> Requested to read 3599b, but read 3597b - 2b missing! terminate called after throwing an instance of 'pxar::UsbConnectionTimeout' what(): Read from USB timed out. Aborted (core dumped)
Hi Xin,
I am able to run the Pretest without problems. Can you check if the USB port that the DTB is connected is USB2.0? Sometimes not all the ports on the machine are 2.0. On my linux system I only have 1 USB2.0 port and the others are 1.0.
Robert
On May 7, 2014, at 3:48 PM, Xin Shi notifications@github.com wrote:
Just a reminder that with the firmware v2.14 and latest pxar, I still get the error with Pretest:
[16:39:17.595] INFO: PixTestPretest::setVana() done, Module Ia 382.7 mA = 23.9187 mA/ROC [16:41:58.281] CRITICAL: Requested to read 3599b, but read 3597b - 2b missing! terminate called after throwing an instance of 'pxar::UsbConnectionTimeout' what(): Read from USB timed out. Aborted (core dumped)
— Reply to this email directly or view it on GitHub.
Hi Robert,
Thanks for your suggestion. I just checked :
Device: ID 0403:6001 Future Technology Devices International, Ltd FT232 USB-Serial (UART) IC
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
It looks I connected it to 1.10 version. I will try the 2.0 version tomorrow.
Xin
On Wed, May 7, 2014 at 4:52 PM, Robert Stringer notifications@github.comwrote:
Hi Xin,
I am able to run the Pretest without problems. Can you check if the USB port that the DTB is connected is USB2.0? Sometimes not all the ports on the machine are 2.0. On my linux system I only have 1 USB2.0 port and the others are 1.0.
Robert
On May 7, 2014, at 3:48 PM, Xin Shi notifications@github.com wrote:
Just a reminder that with the firmware v2.14 and latest pxar, I still get the error with Pretest:
[16:39:17.595] INFO: PixTestPretest::setVana() done, Module Ia 382.7 mA = 23.9187 mA/ROC [16:41:58.281] CRITICAL: Requested to read 3599b, but read 3597b - 2b missing! terminate called after throwing an instance of 'pxar::UsbConnectionTimeout' what(): Read from USB timed out. Aborted (core dumped)
— Reply to this email directly or view it on GitHub.
— Reply to this email directly or view it on GitHubhttps://github.com/psi46/pxar/issues/64#issuecomment-42481627 .
I guess this issue is more or less obsolete since it more and more looks like this (hopefully last) issue we are currently fighting with concerning data transmission is related to USB.
Please discuss further in #121
A final comment: the USB I plugged in was indeed 2.0. After I reboot, it seems working now. I will post further progress in other thread. Close this one now :)
When running a Pixel Alive test using the API function
getEfficiencyMap
from the pxar GUI currently we get a critical warning about missing events not present in the DTB buffer:However, when running exactly the same test from testpxar (even 10 times one after another) I don't manage to get that error even once:
This needs to be understood and solved.