Closed: dmikushin closed this issue 10 years ago
Have you checked your mainboard for 64-bit large BAR PCI support?
Ahmet
How do I check this? Does that mean we need an IOMMU-capable mainboard from this list: http://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware ?
Reply to this email directly or view it on GitHub: https://github.com/xdsopl/mpss-modules/issues/2#issuecomment-37752103
There is a nice post by Dr. Donald Kinghorn titled "Will your motherboard work with Intel Xeon Phi?" that you should read.
Ahmet
Thanks. More specifically, I wonder why the pci_resource_start / pci_resource_len calls already return zeros for BAR 0, even before the actual allocation fails:
bd_info->bi_ctx.aper.pa = pci_resource_start(pdev, DLDR_APT_BAR);
printk("bar 0 start = %p\n", bd_info->bi_ctx.aper.pa);
bd_info->bi_ctx.aper.len = pci_resource_len(pdev, DLDR_APT_BAR);
printk("bar 0 len requested = %d\n", bd_info->bi_ctx.aper.len);
^^Prints (null) and 0. Why?
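A side note on the instrumentation: the aperture address and length are 64-bit quantities, so a 32-bit conversion such as `%d` can typically show only their low 32 bits. The following is a minimal userspace sketch, not driver code; `low32` and `format_len` are hypothetical helper names. It shows that the 8 GiB aperture length a working board reports in this thread (8589934592 bytes) has all-zero low 32 bits, so a 32-bit print would show 0 even when the BAR is assigned, whereas `%llu` with a cast to `unsigned long long` shows the full value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the low 32 bits, which is all a 32-bit conversion can show. */
static uint32_t low32(uint64_t v)
{
    return (uint32_t)v;
}

/* Format a 64-bit length in full; returns the number of characters written. */
static int format_len(char *buf, size_t n, uint64_t len)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%llu", (unsigned long long)len);
}
```

In the failing case reported here the start address is zero as well, so the BAR really is unassigned; printing with a 64-bit conversion simply removes the ambiguity.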
As far as I can see here, the card has only two memory regions, and both are 64-bit. So pci_resource_{start,len} will give you nothing, because a mainboard that cannot handle 64-bit BARs ignores those entries in the config space.
This is what I get when loading the mic module patched with your printk's (with %llu):
[530926.991625] vnet: mode: dma, buffers: 62
[530926.992088] bar 0 start = 4389456576512
[530926.992090] bar 0 len requested = 8589934592
[530926.992151] mic 0000:02:00.0: irq 135 for MSI/MSI-X
[530926.992203] mic0: Transition from state ready to resetting
[530937.001190] mic_probe 2:0:0 as board #0
[530937.001241] mic: number of devices detected 1
[530938.002594] mic0: Resetting (Post Code 12)
[530938.002620] mic0: Transition from state resetting to ready
[530938.002651] My Phys addrs: 0x80fffe0000 and scif_addr 0x80fffdf680
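To make the 64-bit BAR point concrete: the start address in the log above, 4389456576512, is 0x3fe00000000, which lies above 4 GiB. In PCI config space, a memory BAR's low dword encodes its type in bits 2:1 (binary 10 means 64-bit, with the upper half of the address held in the next BAR slot) and the prefetchable flag in bit 3. The sketch below is a userspace illustration, not code from the mic driver; `bar_info` and `decode_mem_bar` are hypothetical names, and the raw register values in the usage note are reconstructed from the logged address rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a memory BAR's low dword. */
#define BAR_SPACE_IO      0x1u        /* bit 0: 1 = I/O space, 0 = memory space */
#define BAR_MEM_TYPE_MASK 0x6u        /* bits 2:1: memory type */
#define BAR_MEM_TYPE_64   0x4u        /* binary 10 in bits 2:1 = 64-bit BAR */
#define BAR_MEM_PREFETCH  0x8u        /* bit 3: prefetchable */
#define BAR_MEM_ADDR_MASK 0xfffffff0u /* bits 31:4: low part of the base address */

struct bar_info {
    bool is_64bit;
    bool prefetchable;
    uint64_t base;
};

/* Decode a memory BAR from its low dword and the dword in the next BAR slot. */
static struct bar_info decode_mem_bar(uint32_t lo, uint32_t hi)
{
    struct bar_info info;

    assert((lo & BAR_SPACE_IO) == 0); /* this sketch handles memory BARs only */
    info.is_64bit = (lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
    info.prefetchable = (lo & BAR_MEM_PREFETCH) != 0;
    info.base = lo & BAR_MEM_ADDR_MASK;
    if (info.is_64bit)
        info.base |= (uint64_t)hi << 32; /* upper 32 bits come from the next slot */
    return info;
}
```

For example, `decode_mem_bar(0x0000000c, 0x000003fe)` yields a 64-bit prefetchable BAR at 0x3fe00000000, the address printed above. An 8 GiB aperture cannot fit below the 4 GiB boundary at all, so firmware that only assigns 32-bit addresses has nowhere to place it.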
What does "lspci -vv" give you for the Phi coprocessor?
This is what I get:
02:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
	Subsystem: Intel Corporation Device 3608
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 40
	Region 0: Memory at 3fe00000000 (64-bit, prefetchable) [size=8G]
	Region 4: Memory at de8e0000 (64-bit, non-prefetchable) [size=128K]
	Capabilities: [44] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [4c] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <4us, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [98] MSI-X: Enable+ Count=16 Masked-
		Vector table: BAR=4 offset=00017000
		PBA: BAR=4 offset=00018000
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
		AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Kernel driver in use: mic
	Kernel modules: mic_host
Ahmet
[ 221.902195] vnet: mode: dma, buffers: 62
[ 221.902378] mic 0000:01:00.0: enabling device (0000 -> 0002)
[ 221.902389] mic 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 221.902395] mic 0000:01:00.0: setting latency timer to 64
[ 221.902401] mic 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 221.902404] bar 4 start = 00000000fbb00000
[ 221.902405] bar 4 len requested = 131072
[ 221.902410] bar 0 start = (null)
[ 221.902412] bar 0 len requested = 0
[ 221.902414] mic 0: failed to reserve aperture space
[ 221.902529] mic: No MIC boards present. SCIF available in loopback mode
And what is the output of "lspci -vv -s 0000:01:00.0"?
	Subsystem: Intel Corporation Device 3608
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at (64-bit, prefetchable)
	Region 4: Memory at fbb00000 (64-bit, non-prefetchable) [size=128K]
	Capabilities: [44] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [4c] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <4us, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range AB, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB
	Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [98] MSI-X: Enable- Count=16 Masked-
		Vector table: BAR=4 offset=00017000
		PBA: BAR=4 offset=00018000
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Kernel driver in use: mic
Our motherboard is a Supermicro X8DTG-QF; I think it's relatively new...
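As a cross-check that does not depend on lspci formatting, Linux also exposes each device's BARs in the sysfs file `/sys/bus/pci/devices/<address>/resource`, one line per resource with three hexadecimal fields: start, end and flags. The sketch below is a userspace illustration under stated assumptions: `pci_res`, `parse_res_line` and `res_len` are hypothetical names, and the sample lines in the usage note are constructed to match the addresses discussed in this thread, not captured from either machine. The length rule follows the kernel's convention that an all-zero range has length 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Value of the kernel's IORESOURCE_MEM_64 flag (64-bit memory resource). */
#define RES_MEM_64 0x00100000ull

struct pci_res {
    unsigned long long start;
    unsigned long long end;
    unsigned long long flags;
};

/* Parse one "start end flags" line; returns false on malformed input. */
static bool parse_res_line(const char *line, struct pci_res *r)
{
    assert(line != NULL && r != NULL);
    return sscanf(line, "%llx %llx %llx", &r->start, &r->end, &r->flags) == 3;
}

/* Length of the resource; an unassigned (all-zero) range counts as 0. */
static unsigned long long res_len(const struct pci_res *r)
{
    if (r->start == 0 && r->end == 0)
        return 0;
    return r->end - r->start + 1;
}
```

Fed a line such as `0x000003fe00000000 0x000003ffffffffff 0x000000000014220c`, `res_len` returns 8589934592 (8 GiB) and the `RES_MEM_64` bit is set; a first line of all zeros would correspond to the empty Region 0 and the zero length seen on the failing board.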
I am not familiar with that board. If it really has 64-bit BAR support, then you could also try appending "pci=noapic" to your boot options, as described in "More on motherboards (even mATX!) for Xeon Phi" by Dr. Donald Kinghorn.
Ahmet
Searching the Intel forum for "SUPERMICRO X8DTG-QF" revealed that there are others with the same problem. Maybe there is a BIOS update from Supermicro that might help.
Ahmet
Yes, the BIOS update X8DTG-QF_t9233.ROM from Supermicro helped.
Thanks,
The BIOS that actually helped:
Hi,
Thank you for your great work,
$ uname -a
Linux xeonphi-cmc 3.11.0-15-generic #25~precise1-Ubuntu SMP Thu Jan 30 17:39:31 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[220066.510779] vnet: mode: dma, buffers: 62
[220066.510830] mic 0: failed to reserve aperture space
[220066.510855] mic: No MIC boards present. SCIF available in loopback mode
What might be the reason for this? Thanks,