nxp-mcuxpresso / mcux-sdk

MCUXpresso SDK
BSD 3-Clause "New" or "Revised" License

[BUG][ENET_QOS] When set to 100 Mbps half-duplex, transmission does not complete. #156

Open fumitaka-takahashi opened 6 months ago

fumitaka-takahashi commented 6 months ago

Hello

When the ENET_QOS driver is set to 100 Mbps half-duplex, transmission does not complete.

Investigation shows that a DMA transmit-complete interrupt occurs for DMA channel 0, but none occurs for the other DMA channels. As a result, the kENET_QOS_TxIntEvent callback event is never raised for those channels, and the next data cannot be sent.

Please look into this problem.

Thank you.

mcuxsusan commented 6 months ago

Could you please provide more details to help the developers reproduce the issue? For example:

● TAG/SHA or SDK release revision
● Board/SoC
● Toolchain
● Code snippet

fumitaka-takahashi commented 6 months ago

Hello

Is the information below correct?

● TAG/SHA or SDK release revision: #define FSL_ENET_QOS_DRIVER_VERSION (MAKE_VERSION(2, 5, 3))
● Board/SoC: MIMX8ML4
● Toolchain: llvm-project-llvmorg-15.0.3
● Code snippet: ???

Sorry, I don't know what you mean by the code snippet.

Thank you.

mcuxsusan commented 6 months ago

Hi @fumitaka-takahashi, by code snippet I mean a piece of code. Would you be able to share your code (or the part of it that triggers the issue) with us to help the developers reproduce the issue?

fumitaka-takahashi commented 6 months ago

Hi

Since the code runs in our customer's environment, we need to ask the customer whether they can share it. Please wait a little. However, the differences from the latest version of the MCUXpresso code are very small. The fix for issue #114, which I asked about previously, has been applied to our code.

I found an earlier discussion of what looks like the same issue here: "IMX8MPEVK ENET_QOS imx-dwmac Half Duplex Crashes". Is it related to this problem?

Thank you.

fumitaka-takahashi commented 5 months ago

Hi

I have tried to prepare reproducible code for you, but because of the special nature of our environment (hardware, RTOS, and so on) I find it difficult to do so. Does this problem reproduce in your environment? If it does not, it may be specific to our environment. Please let me know what you find.

Thank you.

fumitaka-takahashi commented 5 months ago

Hi @mcuxsusan,

According to the discussion in "IMX8MPEVK ENET_QOS imx-dwmac Half Duplex Crashes", DMA Multi-Queuing does not work on i.MX8MP when the ENET_QOS Ethernet link is half-duplex. This makes me wonder whether MCUXpresso's ENET_QOS driver has the same issue. Is this understanding correct? I would like any information you have on this issue.

Thank you.

mcuxsusan commented 5 months ago

Hi, I will check with the development team about the issue you have reported. Feedback may be delayed; I appreciate your patience.

fumitaka-takahashi commented 5 months ago

Hi @mcuxsusan,

Thank you for requesting the investigation. However, our device must pass a certification test by a certification body, and to do so we must solve this problem. We don't have much time. First of all, could you please tell me as soon as possible whether this issue is related to the Linux driver issue?

Thank you.

SuperHeroAbner commented 5 months ago

Hi @fumitaka-takahashi, I took a look at this today, but I'm not sure I fully understand the issue. I ran a simple test based on the enet_qos_txrx_multiring_interrupt case on RT1170 with the latest ENET_QOS driver, 2.6.1, from the current GitHub repo: https://github.com/nxp-mcuxpresso/mcux-sdk/blob/main/drivers/enet_qos/fsl_enet_qos.h#L35. I used another board to send frames in 100M half-duplex mode, and it looks OK. Have you or your customer tried this newer enet_qos driver version?

Next I will try an i.MX8MP (8ML device) board with the 2.5.3 enet_qos driver to see whether the issue comes from the device/IP version or from the previous driver. If my test method is not right, please tell me how to reproduce the issue step by step, for example what kind of frames to prepare and how to set up the Rx parser.

fumitaka-takahashi commented 5 months ago

Hi @CodemonKeyfsz.

I appreciate your cooperation.

Let me explain our situation so far. When we tested on our board, operation on a full-duplex Ethernet link was fine, but on a half-duplex link transmission stopped partway through. My investigation showed that, on the half-duplex link, the interrupt for DMA channel 0 occurred, but no transmit-complete interrupt occurred for DMA channel 1.

So I wrote a bug report here.

After that, we searched the web and found that the same problem had been reported against the Linux driver and had been solved: https://community.nxp.com/t5/i-mx-processors/imx8MPevk-enet-qos-imx-imx-imx-half-dlex-crashes/m-p/1455434?profile.language=en

After reviewing the patch, we found that it stops using the DMA Multi-Queuing and Multi-Channel features. Can IMX8MP use Multi-Queuing and Multi-Channel on a half-duplex link? I would like you to check this within your company.

Thank you.

SuperHeroAbner commented 5 months ago

Hi @fumitaka-takahashi, can you describe your test in more detail? I saw your comment that you can't get kENET_QOS_RxIntEvent, so I treated it as an Rx problem, but according to your comment above it seems that Tx doesn't succeed. Is there no kENET_QOS_TxIntEvent? Or do you get the Tx event but no corresponding Rx event?

Latest update: I have reproduced this problem; kENET_QOS_TxIntEvent is not generated in 100M half-duplex.

fumitaka-takahashi commented 5 months ago

Hi @CodemonKeyfsz.

Sorry, I wrote that by mistake. The event that does not occur is kENET_QOS_TxIntEvent. I have corrected the errors in my earlier comments on this thread.

Thank you.

SuperHeroAbner commented 5 months ago

Hi @fumitaka-takahashi, I got preliminary feedback through the ticket you linked. Yes, this feature is not supported in hardware. I need to check with the design side to make this clearer, and add an error code to the driver to warn users if it indeed cannot work.
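
As a rough illustration of the kind of check mentioned here, a configuration guard could look like the sketch below. The names `qos_link_cfg_t`, `qos_check_cfg` and `CFG_ERR_MULTIQUEUE_HALF_DUPLEX` are hypothetical and are not part of the SDK; the real driver has its own configuration structure and status codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's duplex setting and status codes. */
typedef enum { LINK_HALF_DUPLEX, LINK_FULL_DUPLEX } link_duplex_t;

typedef enum {
    CFG_OK = 0,
    CFG_ERR_MULTIQUEUE_HALF_DUPLEX /* illustrative error code, not an SDK value */
} cfg_status_t;

typedef struct {
    link_duplex_t duplex;
    uint8_t tx_queue_count; /* Tx queues / DMA channels requested */
    uint8_t rx_queue_count; /* Rx queues / DMA channels requested */
} qos_link_cfg_t;

/* Reject a configuration that asks for more than one queue on a half-duplex link. */
cfg_status_t qos_check_cfg(const qos_link_cfg_t *cfg)
{
    assert(cfg != NULL);

    bool multi_queue = (cfg->tx_queue_count > 1U) || (cfg->rx_queue_count > 1U);

    if ((cfg->duplex == LINK_HALF_DUPLEX) && multi_queue) {
        return CFG_ERR_MULTIQUEUE_HALF_DUPLEX;
    }
    return CFG_OK;
}
```

A caller would run such a check before initializing the MAC, so that an unsupported combination fails immediately instead of stalling later when a transmit-complete event never arrives.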

fumitaka-takahashi commented 5 months ago

Hi @CodeMonkeyFSZ.

Thank you for your reply.

I now understand that this is a hardware issue rather than a software (driver) issue. Is this problem unique to i.MX8MP? On RT1170, do Multi-Queuing and Multi-Channel work even on a half-duplex link?

In our program (on i.MX8MP), we will change the code so that communication uses Multi-Queuing and Multi-Channel in full duplex and does not use them in half duplex, and then confirm that it works.
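
The plan above amounts to choosing the ring count from the negotiated duplex mode before the MAC is (re)configured. A minimal sketch, assuming an application that uses two rings in full duplex; `duplex_mode_t`, `APP_FULL_DUPLEX_RINGS` and both functions are illustrative names, not SDK identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical duplex type; the SDK defines its own enumeration. */
typedef enum { DUPLEX_HALF, DUPLEX_FULL } duplex_mode_t;

/* Assumed number of rings the application uses on a full-duplex link. */
#define APP_FULL_DUPLEX_RINGS 2U

/* One ring on half duplex, the full set on full duplex. */
uint8_t app_ring_count_for_link(duplex_mode_t duplex)
{
    return (duplex == DUPLEX_FULL) ? (uint8_t)APP_FULL_DUPLEX_RINGS : (uint8_t)1U;
}

/* Map a traffic priority onto a ring that exists for the current link;
 * priorities without a ring of their own fall back to ring 0. */
uint8_t app_ring_for_priority(duplex_mode_t duplex, uint8_t priority)
{
    uint8_t rings = app_ring_count_for_link(duplex);

    assert(rings > 0U);
    return (priority < rings) ? priority : (uint8_t)0U;
}
```

With this mapping, traffic that would go to ring 1 in full duplex is sent on ring 0 when the link is half duplex, so no frame is queued on a channel whose completion event is not expected.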

Thank you.

SuperHeroAbner commented 5 months ago

Is this problem unique to i.MX8MP?

fumitaka-takahashi commented 5 months ago

Hi @CodeMonkeyFSZ.

Thank you for the additional information. Thanks to you, I now understand the situation.

I'm currently checking full-duplex and half-duplex operation on our board. There is one problem that needs to be resolved. Our system works as follows:

・For full duplex, Multi-Queuing and Multi-Channel are used.
・For half duplex, Multi-Queuing and Multi-Channel are not used.
・Our system is configured to support hot insertion and removal of Ethernet cables.

With this setup, the behavior differs depending on how the link is first established.

Steps when there are no problems:

  1. Power on our board.
  2. Connect to a hub configured for auto-negotiation.
  3. Our board links in full duplex.
  4. Check communication using the ping command from a Windows PC.
  5. There is a response to the ping command. (OK)
  6. Connect the Ethernet cable to a hub with a fixed 100 Mbps full-duplex setting.
  7. Our board links at 100 Mbps half duplex. (This is not a problem; it is a limitation of the Ethernet specification.)
  8. Check communication using the ping command from a Windows PC.
  9. There is a response to the ping command. (OK)
  10. Connect to a hub configured for auto-negotiation.
  11. Our board links in full duplex.
  12. Check communication using the ping command from a Windows PC.
  13. There is a response to the ping command. (OK)
  14. After this, repeat steps 6 to 13 and confirm that the ping response remains normal.

Steps to take when there is a problem:

  1. Power on our board.
  2. Connect the Ethernet cable to a hub with a fixed 100 Mbps full-duplex setting.
  3. Our board links at 100 Mbps half duplex. (This is not a problem; it is a limitation of the Ethernet specification.)
  4. Check communication using the ping command from a Windows PC.
  5. There is a response to the ping command. (OK)
  6. Connect to a hub configured for auto-negotiation.
  7. Our board links in full duplex.
  8. Check communication using the ping command from a Windows PC.
  9. No response to the ping command. (NG)
  10. Connect the Ethernet cable to a hub with a fixed 100 Mbps full-duplex setting.
  11. Our board links at 100 Mbps half duplex. (This is not a problem; it is a limitation of the Ethernet specification.)
  12. Check communication using the ping command from a Windows PC.
  13. No response to the ping command. (NG)

Please check this sequence.

Thank you.

SuperHeroAbner commented 5 months ago

Hi @fumitaka-takahashi, I don't get the key point of this. Why is there an NG in 'Steps to take when there is a problem:'? Are you asking me whether these steps are OK, or how to resolve these NGs? When the link goes down and comes back up, you should re-initialize ENET QOS with a single ring or multiple rings, depending on whether the link is 100M half-duplex or some other link status. Of course, I am only assuming this is the issue; I still haven't received official feedback from IP design.

fumitaka-takahashi commented 5 months ago

Hi @CodeMonkeyFSZ

Thank you for your reply.

Your guess was correct.

In the environment where the problem occurred, we were calling ENET_QOS_Down() when the PHY link went down and ENET_QOS_Up() when the PHY link came up.

From your answer I understood that ENET_QOS_Deinit() is required. However, when ENET_QOS_Deinit() was executed on PHY link down, PHY_GetAutoNegotiationStatus() hung. I investigated and found that ENET_QOS_Deinit() stops the clock, so the PHY link can no longer be monitored. I therefore changed the code to run ENET_QOS_Down() when the PHY link goes down, and ENET_QOS_Deinit() followed by ENET_QOS_Init() when the PHY link comes up, and the problem was resolved. This method works fine in our environment, so we will use it.
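
The sequence described in this comment can be sketched as a small link-change handler. The three `stub_*` functions stand in for ENET_QOS_Down(), ENET_QOS_Deinit() and ENET_QOS_Init() and only record the call order; `link_ctx_t`, `on_phy_link_change` and the `mac_initialized` guard (which skips Deinit before the very first Init) are assumptions made for this example and are not taken from the reporter's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Call recorder used by the stubs below: 'D' = Down, 'X' = Deinit,
 * 'H' = Init for half duplex, 'F' = Init for full duplex. */
enum { CALL_LOG_MAX = 16 };
char call_log[CALL_LOG_MAX + 1];
size_t call_len;

void log_call(char tag)
{
    if (call_len < (size_t)CALL_LOG_MAX) {
        call_log[call_len++] = tag;
        call_log[call_len] = '\0';
    }
}

/* Stubs standing in for the real driver calls. */
void stub_down(void) { log_call('D'); }
void stub_deinit(void) { log_call('X'); }
void stub_init(bool full_duplex) { log_call(full_duplex ? 'F' : 'H'); }

typedef struct {
    bool link_up;         /* last link state seen by the handler */
    bool mac_initialized; /* assumption: skip Deinit before the first Init */
} link_ctx_t;

/* Link down: Down only, so the clock keeps running and the PHY can still be polled.
 * Link up: Deinit (resets the DMA), then Init with a ring setup that suits the duplex. */
void on_phy_link_change(link_ctx_t *ctx, bool link_up, bool full_duplex)
{
    assert(ctx != NULL);

    if (link_up == ctx->link_up) {
        return; /* no state change, nothing to do */
    }
    if (!link_up) {
        stub_down();
    } else {
        if (ctx->mac_initialized) {
            stub_deinit();
        }
        stub_init(full_duplex);
        ctx->mac_initialized = true;
    }
    ctx->link_up = link_up;
}
```

Feeding the handler a half-duplex link up, a link down, and then a full-duplex link up records the calls Init(half), Down, Deinit, Init(full), which mirrors the cable-swap sequence listed earlier in the thread.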

However, I feel it would be a better API specification if, once the PHY link has come up, only ENET_QOS_Up() and ENET_QOS_Down() needed to be called for subsequent PHY link up/down events. With the ENET driver, the Up and Down calls alone cause no problem. Wouldn't it be better to align the specification with the ENET driver?

Thank you.

SuperHeroAbner commented 4 months ago

Hi @fumitaka-takahashi, I got the feedback from IP: (image)

Do you mean Down+Up is OK for full-duplex link change but for half-duplex need to do Down + Deinit/Up to restart the link?

fumitaka-takahashi commented 4 months ago

Hi @CodeMonkeyFSZ

Thanks for the feedback from IP

I understand this problem to be caused by the IP design: the Multi-Queuing and Multi-Channel functionality is not available when using a half-duplex link. Is that understanding correct? Also, has this been published as an erratum?

Now regarding your question:

Do you mean Down+Up is OK for full-duplex link change but for half-duplex need to do Down + Deinit/Up to restart the link?

First, please check this link for the "Steps to take when there is a problem:" sequence. When I wrote that comment, our code called the ENET_QOS Up/Down API to restart ENET_QOS when the PHY link went up or down. This is because Deinit stops the clock, making it impossible to monitor PHY reconnection. I then looked into the Deinit code, found that it resets the DMA, and concluded that this reset was necessary, so I changed the procedure. When the Ethernet cable is disconnected, only Down is executed as before, but when the cable is reconnected, Deinit and then Init are executed. With this change the problem no longer occurred.

Thank you.

SuperHeroAbner commented 4 months ago

Hi @fumitaka-takahashi ,

  1. I will push the relevant colleagues to update the documentation. But, just between us, please don't expect that to happen by any particular date. This IP is not NXP's own; I believe the content in the RM was copied from a certain legacy version.
  2. So moving the code below into _Down() would let you use only Up/Down when the link changes, is that right?

         base->DMA_MODE |= ENET_QOS_DMA_MODE_SWR_MASK;
         while ((base->DMA_MODE & ENET_QOS_DMA_MODE_SWR_MASK) != 0U)
         {
         }

fumitaka-takahashi commented 4 months ago

Hi @CodeMonkeyFSZ

Thank you for your reply.

1. I will push the relevant colleagues to update the documentation. But, just between us, please don't expect that to happen by any particular date. This IP is not NXP's own; I believe the content in the RM was copied from a certain legacy version.

We spent a considerable amount of time trying to resolve this issue. I hope this issue is made known.

2. So moving the code below into _Down() would let you use only Up/Down when the link changes, is that right? base->DMA_MODE |= ENET_QOS_DMA_MODE_SWR_MASK; while ((base->DMA_MODE & ENET_QOS_DMA_MODE_SWR_MASK) != 0U) { }

Our code was fixed using the method described above. No further modifications to the code are possible at this point. I would like to consider this next time.
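
For anyone who does try moving the software reset into the Down path later: the quoted loop spins until the SWR bit clears, with no upper bound. A hedged variant with a poll budget is sketched below. It works on a plain register pointer so it can be compiled off-target; `dma_soft_reset_bounded` and `SKETCH_DMA_MODE_SWR_MASK` are hypothetical names, and the mask value 0x1 is an assumption standing in for ENET_QOS_DMA_MODE_SWR_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for ENET_QOS_DMA_MODE_SWR_MASK; take the real value from the device header. */
#define SKETCH_DMA_MODE_SWR_MASK 0x1U

/* Set the software-reset bit, then poll until it clears or the poll budget is used up.
 * Returns true if the bit cleared, false if the budget ran out first. */
bool dma_soft_reset_bounded(volatile uint32_t *dma_mode, uint32_t max_polls)
{
    assert(dma_mode != NULL);

    *dma_mode |= SKETCH_DMA_MODE_SWR_MASK;

    while ((*dma_mode & SKETCH_DMA_MODE_SWR_MASK) != 0U) {
        if (max_polls == 0U) {
            return false; /* reset did not complete within the budget */
        }
        max_polls--;
    }
    return true;
}
```

On real hardware the pointer would be the address of `base->DMA_MODE`, and the caller could report a timeout instead of hanging if the bit never clears. The poll budget is arbitrary here and would need tuning for the target.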

Thank you.