sinara-hw / sinara

Sayma AMC/RTM issue tracker

single-slot uTCA chassis #339

Closed gkasprow closed 7 years ago

gkasprow commented 7 years ago

A uTCA chassis that is being developed for another project (related to video processing) is coming soon. The requirements are slightly different. The main differences are:

On these issues I need some feedback before the guys design the PCBs and front panel.

Here is how it looks now (without front panel), with some not yet finished details: [image]

jordens commented 7 years ago

Looks good! Does the CPU do some JTAG translation (from the ethernet interface for example) or would the front panel JTAG access be the only one? To me the raw JTAG is too bulky and cumbersome. Either FTDI USB-JTAG or a similar solution would be nice.

gkasprow commented 7 years ago

The CPU only does I2C init. If you volunteer to develop Ethernet-to-JTAG support, I can place an ETH connector on it :) I can place an Arduino-style W5500 + MCU, which works out of the box; the only missing part is support for the Xilinx tools. An FTDI USB-JTAG is already on the AMC board. Some work has been done already, so one can reuse it. The W5500 has the advantage that it is low cost, has hardwired TCP/IP, and has working DMA over SPI. We use this approach in RF-PA to offload the CPU. One can also do a crazy thing and place a $2 ESP8266 or even an ESP32 WiFi module. The HW is secondary; in our case software development is the most expensive part. So whatever you advise, I will implement.

jordens commented 7 years ago

Hmm. There are already a bunch of approaches to do JTAG over ethernet.

But honestly, if you say that this is "without MCH", or "with MCH capabilities exposed on connectors", then just doing raw JTAG would be the way to go. And you are correct: most cards (Sayma especially) already include comprehensive JTAG-USB and JTAG routers. There is no need to wrap that again.

gkasprow commented 7 years ago

I can assign it as an engineering thesis to one of my students :)

gkasprow commented 7 years ago

There is an implementation of Xilinx Virtual Cable on the ESP8266; the only (little) problem in such cases is WiFi configuration :)
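For readers unfamiliar with it, Xilinx Virtual Cable (XVC) is a small TCP protocol that a JTAG-over-Ethernet adapter can serve. A minimal sketch of handling its `shift:` message follows, assuming the XVC 1.0 framing as published by Xilinx (a 4-byte little-endian bit count, then the TMS vector, then the TDI vector, both LSB first, answered by a TDO vector of the same length). The `clock_bit` callback is a hypothetical stand-in for whatever actually drives the JTAG pins; the other XVC messages (`getinfo:`, `settck:`) are not handled here.

```python
import struct

SHIFT_PREFIX = b"shift:"


def parse_shift(msg: bytes):
    """Split an XVC 'shift:' message into (num_bits, tms_bytes, tdi_bytes)."""
    if not msg.startswith(SHIFT_PREFIX):
        raise ValueError("not a shift command")
    # Bit count is a 32-bit little-endian integer right after the prefix.
    (num_bits,) = struct.unpack_from("<I", msg, len(SHIFT_PREFIX))
    num_bytes = (num_bits + 7) // 8
    body = msg[len(SHIFT_PREFIX) + 4:]
    if len(body) != 2 * num_bytes:
        raise ValueError("payload length does not match bit count")
    return num_bits, body[:num_bytes], body[num_bytes:]


def bits_lsb_first(data: bytes, num_bits: int):
    """Unpack the first num_bits bits of data, least significant bit first."""
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(num_bits)]


def handle_shift(msg: bytes, clock_bit) -> bytes:
    """Clock every (TMS, TDI) pair through clock_bit and pack the TDO reply.

    clock_bit(tms, tdi) -> tdo is a hypothetical hook, not part of XVC.
    """
    num_bits, tms, tdi = parse_shift(msg)
    tdo = bytearray((num_bits + 7) // 8)
    pairs = zip(bits_lsb_first(tms, num_bits), bits_lsb_first(tdi, num_bits))
    for i, (tms_bit, tdi_bit) in enumerate(pairs):
        if clock_bit(tms_bit, tdi_bit):
            tdo[i // 8] |= 1 << (i % 8)
    return bytes(tdo)


# Loopback "target" for illustration only: TDO simply echoes TDI.
example = SHIFT_PREFIX + struct.pack("<I", 10) + bytes([0x01, 0x02]) + bytes([0xA5, 0x03])
reply = handle_shift(example, lambda tms, tdi: tdi)
```

On a real adapter the callback would toggle TCK and sample TDO; the message parsing stays the same whichever MCU or network stack is chosen.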

jordens commented 7 years ago

Yes. That's nice. We have done a lot of work on openOCD and having it paired with a well designed JTAG-over-Ethernet adapter would be great. If you do that, we would probably recommend some random ARM controller, Rust, and our open TCP/IP stack for that.

jordens commented 7 years ago

Yes. There are several

gkasprow commented 7 years ago

Would you prefer STM32 (used in RF-PA) or LPC1764 (used on AMC), Ethernet using internal MAC + external PHY (LAN8720) or external MAC+PHY+TCP/IP (W5500)? What is the easiest for you?

jordens commented 7 years ago

Maybe with TM4C1294 with everything integrated. But I am not the expert on that.

gkasprow commented 7 years ago

The processor looks OK, but I have never used it before. We would need to create libs and buy a programmer. I'd stay with the solution (STM32F407ZGT6 + W5500) used in RF-PA because we can simply reuse it. And it is also a Cortex-M4. We can also reuse the solution from the dual-AMC box, which is LPC1764FBD100 + low-cost PHY LAN8720A. @sbourdeauducq what do you think about it?

sbourdeauducq commented 7 years ago

We ported Rust and smoltcp to the TM4C1294. The chip has craploads of silicon bugs and then some more that we discovered. But they can be worked around and then it works well - solid and nice embedded network device with everything integrated (only needs a crystal, Ethernet transformer, and a few resistors). It can be programmed with a few OpenOCD commands and a cheap USB adapter (just the standard ARM thing, maybe your current programmer is already supported by OpenOCD).

I would definitely recommend the TM4C1294 for the quality of the software libraries and tools, and for the integrated Ethernet PHY.

I do not recommend W5500, ESP8266, ESP32, or lwIP.

LPC1764 may be acceptable (though it doesn't have a lot of RAM) but some work needs to be done to port decent software to it.

gkasprow commented 7 years ago

@sbourdeauducq lwIP is buggy, we have a lot of trouble with it, that's why I don't propose it. We can use the LPC1769 with twice as much memory. Anything wrong with the W5500? We use it in RF-PA and I'd like to know if there are any hidden features :)

sbourdeauducq commented 7 years ago

It's not 1992 and this thing should not be used, just like the BASIC-Stamp is dead. I cannot comment on bugs, but it's just an inflexible black box that someone invented.

gkasprow commented 7 years ago

@sbourdeauducq you're simply afraid of an NSA backdoor in it ;)

gkasprow commented 7 years ago

So let's use the LPC1769; we will implement basic software because we know this chip and its bugs. I don't know Rust, nobody on my team knows it, and we don't have time to learn it. Then you can decide to port your solution or simply use our non-ideal one written in a common language.

hartytp commented 7 years ago

Looks nice!

sbourdeauducq commented 7 years ago

> you're simply afraid of an NSA backdoor in it ;)

No, it's just poor design, and it has practical consequences, e.g. what if you want IPv6? Replace all those chips in the devices, if an upgrade is available at all?

sbourdeauducq commented 7 years ago

> I don't know Rust, nobody on my team knows it, and we don't have time to learn it.

I would really appreciate some effort to get proper firmware into your devices. The MMC code in the Sinara devices is really bad and unprofessional.

> simply use our non-ideal one written in a common language.

Rust prevents whole classes of bugs in software, especially around memory management and pointers. It also has many nice features that let you write code faster and cleaner. Yes, you have to learn it, but then you're not spending time chasing obscure memory corruption problems, working around missing features in the programming environment, or reinventing wheels. And among software people it is not exactly a niche language; look at some of the organizations using it.

And what TCP/IP stack?

hartytp commented 7 years ago

> I would really appreciate some effort to get proper firmware into your devices. The MMC code in the Sinara devices is really bad and unprofessional.

I do have some sympathy for what you're saying here. But, having said that, we still have major issues using Artiq with gigabit ethernet switches. So let's not be too rude about this...

jordens commented 7 years ago

@hartytp How is that smoltcp bug relevant here? I'd be disappointed if you are using unrelated issues to quench a discussion about this device and how to design it. With that rhetoric you could invalidate any criticism and it would chill any effort to improve the code.

gkasprow commented 7 years ago

@sbourdeauducq The Sinara AMC code is not production code. It is a quick sketch to check whether the HW is working. It was not meant to be released to the public at all. Creotech has almost finished Open MMC code that supports e.g. Exar firmware upgrade.

dhslichter commented 7 years ago

@jordens I took the comment from @hartytp to mean that it's more productive to have a discussion about this without flaming/put-downs -- we are all on the same team here, and nobody is perfect/faultless in getting the system together (this was what I took the reference to smoltcp to mean), so we should try to be considerate of others when making the case for what we think is the best way to proceed on any given issue. As @gkasprow points out, much of this is still chewing gum and duct tape, trying to get prototype systems working, and we need to accept that a substantial amount of returning/backfilling for quality and robustness still needs to occur.

Questions like learning Rust vs implementing in other languages are valid and important to discuss here, but as @gkasprow points out there is a substantial time cost to getting people up to speed, which may not be the best use of time right now as we are hustling to meet deadlines for Kasli and Urukul prototypes for example.

jordens commented 7 years ago

@dhslichter Discarding comments by labeling them as flame or put-downs is also counterproductive. Let's return to the technical level and the actual questions that were posed and discussed.

Greg explicitly asked for our advice and what's easiest for us if we were to do it. The statement that the Sayma MMC is not at all a valuable starting point and therefore not something that should influence the design of this uTCA box stands unchallenged. And he explicitly confirmed that if I understand the statements correctly.

OTOH, I really look forward to having a high-quality codebase for the Sayma MMC.

You are correct: if we are only looking at Kasli and Urukul, then none of this should receive any attention right now.

@gkasprow Just to clarify: we didn't release the MMC firmware.

hartytp commented 7 years ago

> @hartytp How is that smoltcp bug relevant here? I'd be disappointed if you are using unrelated issues to quench a discussion about this device and how to design it. With that rhetoric you could invalidate any criticism and it would chill any effort to improve the code.

I'm all for constructive discussion about how to design this device. And, for what it's worth, if you feel that the way MMC has been implemented in Sayma is not ideal for you then I'd support it being implemented the way you want it in a future revision -- you are the ones who will be responsible for writing the code after all.

My meaning was twofold. Firstly, I was referring to the comments above trying to push all Sinara projects to use Rust/smoltcp. While I agree that this could be a really nice route to go in the long term, I'm not convinced it's the right way to go for now. Yes, Rust eliminates certain kinds of bugs and, yes, chips like the W5500 have their issues. But, from a user's perspective, my feeling is this: I have a large number of Ethernet-based devices in my lab. The bulk of them run firmware written in C and use either lwIP or something like the W5500. And they all work reliably. The amount of pain that smoltcp/ARTIQ networking has caused us (and to a certain extent still is causing us) over the past year is greater than all the other network-based equipment we have in the lab put together. Unlike the W5500, smoltcp is not yet widely deployed or tested and is missing some important features. While it's open source, it's primarily maintained by a single developer with a slightly erratic schedule. I do think/hope it will get to the point of being a great, widely deployed, robust network stack, and I am encouraged by the community that's starting to grow around it. But it's not clear to me how long that will take, and I'm not ready to commit to it yet.

Also, while the MMC may be a bit of a beast, most of the firmware we need is pretty trivial. When written in decent C following good practice, that kind of simple stuff should just work. So, I really don't see the need to push Greg's team to learn rust now, particularly given how much time pressure we're under; this just isn't the right battle to fight now.

hartytp commented 7 years ago

The second point, and this may be a cultural/community thing, is that I don't think it's helpful to start labeling things as "unprofessional", particularly on public forums. No one's code or project management is perfect, and that kind of language isn't called for. I'm happy to support you on technical grounds, but that language loses my support quickly.

Greg is pretty thick skinned and probably doesn't mind this. But, someone put a lot of time into that code even if you don't think it's in a good state. Posts on public forums can cause a lot of lasting offence surprisingly easily and can lead to major project management issues.

hartytp commented 7 years ago

Anyway, apologies if my post caused offence, let's leave this there.

sbourdeauducq commented 7 years ago

> The amount of pain that smoltcp/artiq networking has caused us (and to a certain extent still is causing us) over the past year is greater than all the other network-based equipment we have in the lab put together.

A number of ARTIQ networking bugs are attributable to lwIP, which is why we moved away from it. It also puzzles me that many devices seem to be using lwIP without major problems, though Greg's experience seems to be different and corroborates mine. If there is something obviously wrong that we were doing with lwIP, I'd be curious to know - the ARTIQ lwIP code is still there in the release-2 branch and on master in early versions of ARTIQ-3.

hartytp commented 7 years ago

> A number of ARTIQ networking bugs are attributable to lwIP, which is why we moved away from it. It also puzzles me that many devices seem to be using lwIP without major problems, though Greg's experience seems to be different and corroborates mine. If there is something obviously wrong that we were doing with lwIP, I'd be curious to know - the ARTIQ lwIP code is still there in the release-2 branch.

I guess the question is: if we'd spent the man-year (?) that getting smoltcp to its current state has taken on tracking down and fixing the lwIP issues, where would we be? Writing a new network stack is more fun than bug finding/fixing, but not necessarily a more efficient path to a robust solution.

Anyway, I do think that smoltcp is really nice, so I'm sure it will be a great solution in the long run.

sbourdeauducq commented 7 years ago

> I guess the question is: if we'd spent the man-year (?) that getting smoltcp to its current state has taken on tracking down and fixing the lwIP issues, where would we be?

Probably not in a great situation either: despite all those existing lwIP devices and users you mention, lwIP itself is still very buggy...

hartytp commented 7 years ago

Well, in any case, let's just get smoltcp working really well and everyone will be happy.

jordens commented 7 years ago

I share the technical criticism of smoltcp. And please keep bringing it up as such.

But how do you compare issues that we can solve and are solving (smoltcp) with those that you can't solve (w5500) or which have such a pervasive level of problems that the common reaction is and was to give up and rewrite (lwip)? And when describing your frustration level (m-labs/artiq#837) keep in mind the previous one (m-labs/artiq#456 as an example) and also keep in mind that our perspective into this as developers is different from yours as users. In the end we have to weigh lwip/c frustration against smoltcp/rust frustration. And I am convinced that this was the right decision. Despite the issues that ensued. The other arguments (wide deployment of w5500, ubiquitousness of C) seem dubious as well. To us they are not pertinent to the question of what's the right tool for the job.

The claim that the MMC is simple and should just work is an old and repeatedly disputed claim from years ago. I don't think implementing the functionality we need is simple. Current reality (the code) seems to contradict that as well, and I am short on indications that it will change (current project load and schedule). That claim might be a dangerous fallacy by now, and it doesn't become true by repeating it. The MMC is problematic. We expected a solid and high-quality black box. We have no time or budget to reinvent or fix it. And you are right: the MMC is not the battlefield (i.e. the issue discussed here).

Still, let me try again to return to the original question. If we were to write this with a network stack and with JTAG-over-Ethernet, we would want to do it with Rust and smoltcp. Everybody else is obviously free to do whatever they want.

jordens commented 7 years ago

Summarizing the issue here: I think we agree that the uTCA crate MCH functions can be kept at bare minimum and "raw". I.e. no ethernet for the "MCH" and raw JTAG is fine.

dhslichter commented 7 years ago

@jordens I don't mean to shut down discussion by calling something a flame or a put-down, but I feel that referring in a public forum to someone else on your team's work as "unprofessional", as pointed out by @hartytp as well, should be avoided unless it is well and truly the only way to get that person to listen to your criticism. I think that @gkasprow and his team have been stellar so far in responding to suggestions, changes in designs, and are doing a lot of excellent work on a tight schedule with demanding constraints. Echoing others above, as was my initial point as well, this isn't the right time to get them all up on Rust, but I agree that it's a more robust way forward in the long haul. My comment was in no way meant to short-circuit the discussion on the technical merits, rather to point out that we should be careful to discuss the technical merits in a way that doesn't potentially provide unnecessary hurt for those who are trying to get something together.

I think your summary post just now sounds good, given all the other complexities. We'll make it simple for starters and backfill as we have more time and experience.

gkasprow commented 7 years ago

@jordens @hartytp @dhslichter Guys, I know that my MMC code is awful. It is just a temporary solution; I didn't want to write it nicely because another version is on the way. Its purpose was only to init power and JTAGs, and I used FreeRTOS just to have Ethernet. That's all. I'm not a programmer; 10 years ago I stopped developing these skills and switched to HW. But when needed I'm still able to write basic C/C++ code to test hardware parts of a design. The same with VHDL. I have decent guys in my team who are much more experienced, but I don't want to distract them with simple things which I can do in minutes. So if you guys really want this Texas chip, I can write simple initialisation code and let you do the rest. @sbourdeauducq said that LPC17xx is also an acceptable solution, only the RAM was not sufficient. But since we are using LPC17xx chips in other boards, they are mature with well-known "features", and we already have a well-tested HW design, I thought it might be a good compromise. IMHO further discussion is pointless. I already found a library for this Texas chip, so I can replace it immediately.

gkasprow commented 7 years ago

@sbourdeauducq @jordens what about FreeRTOS+TCP stack?

sbourdeauducq commented 7 years ago

I'm not aware of the precise application requirements for this MCU (and you should take my comment about RAM accordingly - it just sounded small for having a few TCP buffers plus the rest, considering that many TCP applications typically process a few connections simultaneously). What is the protocol on top of TCP and what features are exposed to the network? What are the algorithms for fan and temperature control? How many channels are there? I can offer some help for the TM4C/Rust/smoltcp stack; it works well in ionpak and if your firmware is simple enough (it sounds like it may be) there shouldn't be major problems with Rust. It is by the way possible to test Rust/smoltcp on the TI MCU yourself with a low-cost devkit, I can easily put together some code to provide a demo like a small HTTP server on that board.

gkasprow commented 7 years ago

Fans don't need any MCU support because they are controlled by independent chips that need to be initialised at startup. So the only task for the CPU would be this virtual cable. As I wrote, this uTCA box is part of another project, and the guys who design it don't need any additional functionality from the CPU other than initialisation.

gkasprow commented 7 years ago

@sbourdeauducq OK, I will use this TM4C1294 chip for this design. We will anyway need a stable platform for other non-timing-critical applications (described in planned HW), and since you are going to support it and integrate it with ARTIQ, this is an argument for using this CPU. The question is how would you connect JTAG? On the devkit, page 36, it is routed to both the SPI block and general-purpose IOs. No idea why. Pin assignment of the TM4C123xH6PMI:

TMS: PA3 (SSI0Fss), PF1 (SSI1Tx)
TCK: PA2 (SSI0Clk), PF2 (SSI1Clk)
TDI: PA5 (SSI0Tx)
TDO: PA4 (SSI0Rx), PD6 (U2Rx)

Maybe TMS is generated by SSI1 ?

sbourdeauducq commented 7 years ago

> OK, I will use this TM4C1294 chip for this design. We will need anyway stable platform for other non-timing critical applications (described in planned HW)

Cool!

> The question is how would you connect JTAG?

I'm not sure if I understand this question correctly. Are you aware that the development kit has two MCUs: U1 which is the main TM4C1294NCPDT processor on the devkit, and another one U20, a TM4C123GH6PMI, which is just used as an integrated USB/JTAG adapter?

For programming the TM4C1294 itself, it is bog-standard JTAG and no special connections are required. This is what we did on ionpak, with a standard small 10-pin ARM JTAG connector (this connector is also on the devkit): https://github.com/m-labs/ionpak/blob/master/hardware/rev1/IONPAK1_250417_1.pdf

gkasprow commented 7 years ago

@sbourdeauducq Yes, I'm talking about the connection of the MCU that serves as the debugger. It is connected to the target CPU using duplicated IO pins, which are also connected to SPI pins. So on the USB-JTAG side it is SPI, and on the target MCU it is JTAG. Bit-banging is no speed demon, so they probably used SPI to speed up JTAG transmission.

gkasprow commented 7 years ago

@sbourdeauducq So I will connect JTAG as it is on the TI devkit. Btw, I have a student who will implement a proof-of-concept Xilinx cable on the devkit.

sbourdeauducq commented 7 years ago

I have not programmed the SSI at all on this TI chip, but from the few resources I could find, JTAG sounds doable: https://e2e.ti.com/support/microcontrollers/tiva_arm/f/908/t/499412 and according to this document the 1294 SSI is a superset of the 123, except for the "microwire" frame format: http://www.ti.com/lit/an/spma065/spma065.pdf

gkasprow commented 7 years ago

The TM4C1294 has a nice SSI mode that I was missing in other chips. It is able to hold the SSI frame signal (SSIFss) low during the entire data frame (FSSHLDFRM bit in the SSICR1 register). With such a feature, full-speed JTAG should be trivial.

sbourdeauducq commented 7 years ago

Good. But don't get too excited - as I mentioned before there are many silicon bugs in this chip, so things that seem "trivial" may be a bit harder in reality.

gkasprow commented 7 years ago

In the worst case we can use SW-controlled TMS or even an entirely SW bit-bang mode.
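To make the SW-controlled TMS idea concrete, here is a minimal sketch of the standard IEEE 1149.1 TAP state machine and the TMS pattern for one data-register scan. It is a host-side model only, not firmware for the TM4C1294; the helper names (`walk`, `dr_scan_tms`) are made up for illustration. It shows that TMS only has to change at the start and on the very last shifted bit, which is why the bulk of the bits could in principle be clocked by a hardware shifter while software handles TMS.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP = {
    "test-logic-reset": ("run-test/idle", "test-logic-reset"),
    "run-test/idle": ("run-test/idle", "select-dr-scan"),
    "select-dr-scan": ("capture-dr", "select-ir-scan"),
    "capture-dr": ("shift-dr", "exit1-dr"),
    "shift-dr": ("shift-dr", "exit1-dr"),
    "exit1-dr": ("pause-dr", "update-dr"),
    "pause-dr": ("pause-dr", "exit2-dr"),
    "exit2-dr": ("shift-dr", "update-dr"),
    "update-dr": ("run-test/idle", "select-dr-scan"),
    "select-ir-scan": ("capture-ir", "test-logic-reset"),
    "capture-ir": ("shift-ir", "exit1-ir"),
    "shift-ir": ("shift-ir", "exit1-ir"),
    "exit1-ir": ("pause-ir", "update-ir"),
    "pause-ir": ("pause-ir", "exit2-ir"),
    "exit2-ir": ("shift-ir", "update-ir"),
    "update-ir": ("run-test/idle", "select-dr-scan"),
}


def walk(state, tms_bits):
    """Advance the TAP model by one TCK per TMS bit and return the final state."""
    for tms in tms_bits:
        state = TAP[state][tms]
    return state


def dr_scan_tms(num_bits):
    """TMS pattern for a DR scan of num_bits, starting and ending in run-test/idle.

    Returns (enter, shift, leave): only the last bit of 'shift' needs TMS=1.
    """
    if num_bits < 1:
        raise ValueError("need at least one bit")
    enter = [1, 0, 0]                    # idle -> select-dr -> capture-dr -> shift-dr
    shift = [0] * (num_bits - 1) + [1]   # last data bit moves shift-dr -> exit1-dr
    leave = [1, 0]                       # exit1-dr -> update-dr -> run-test/idle
    return enter, shift, leave


enter, shift, leave = dr_scan_tms(8)
after_enter = walk("run-test/idle", enter)
final_state = walk(after_enter, shift + leave)
```

Five TCK cycles with TMS held high bring the model back to test-logic-reset from any state, which is the usual way to resynchronise before a scan.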

gkasprow commented 7 years ago

@sbourdeauducq could you pls have a look at CPU connectivity ?

sbourdeauducq commented 7 years ago

I asked Alexander to take a look at it. Here are the comments:

  1. The 25 MHz crystal circuit is not the same as the one we are using or the reference design (18 pF instead of 12 pF, no 2K resistor). Also check the crystal p/n.
  2. No ESD protection on the PHY.
  3. VREFA+ (pin 9) shall be connected.
  4. Why not use another supervisor IC, one with an open-drain output, instead of IC15? Having D5 in series with RST can give an unclean reset at low voltage.
  5. Strange text layout on JP2 (21, 43...)
  6. R59, R60: 10K is way too big; 1K recommended. Same for R46-R58.
  7. I am not sure it is a good idea to share the flashing pins TCK-TDO (if I understand it correctly).
  8. R192, R193 go to the same I2C pin (twice)?
  9. R184-R205 on I2C are 10K too. Again, that is too high for high speed; unless it is battery powered, better to keep it at 1K (or 3.3K max).
  10. IC16 A0-A1 can be connected to GND/VCC. I think it makes sense to have this option, since the I2C address space is limited, to prevent conflicts (I am not sure which slave peripheral is used).

gkasprow commented 7 years ago

  1. Load capacitance depends on the crystal used. My crystal requires 18 pF.
  2. I never saw such a circuit in ref designs. It's very unlikely that ESD propagates via a shielded Ethernet connector with transformer. OK, I will add a TVS.
  3. OK, true.
  4. I used a Schottky diode, so the low voltage is around 200 mV. And I don't interfere with the JTAG probe.
  5. I have no influence on this. It is the Altium PDF exporter. On the original schematics it looks fine.
  6. Are you going to run it at 1 MHz? For 100 kHz the edges look fine, and it's quite a popular value for point-to-point connections. 1K would be needed for multiple I2C loads.
  7. I don't share JTAG pins; they go to the programming connectors only. The other JTAG pins (SSI) are routed to the AMC connector.
  8. OK
  9. OK, I can make them 1K.
  10. I replaced them with 0R.
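
The pull-up disagreement in point 6 can be put into numbers with a rough estimate. The sketch below uses the usual RC approximation for the 30% to 70% rise time, t_r = ln(0.7/0.3) * R * C, and the maximum rise times from the I2C-bus specification (1000 ns standard-mode, 300 ns fast-mode, 120 ns fast-mode plus). The 50 pF bus capacitance is purely an assumed value for illustration; the real figure depends on the board and was not given in this thread.

```python
import math

# Maximum SDA/SCL rise times per I2C-bus specification, in nanoseconds.
MAX_RISE_NS = {
    "standard (100 kHz)": 1000.0,
    "fast (400 kHz)": 300.0,
    "fast-plus (1 MHz)": 120.0,
}


def rise_time_ns(r_ohm, c_bus_pf):
    """30%-to-70% rise time of an RC pull-up, in ns (ohm * pF = 1e-3 ns)."""
    return math.log(0.7 / 0.3) * r_ohm * c_bus_pf * 1e-3


def modes_ok(r_ohm, c_bus_pf):
    """List the bus modes whose rise-time limit this pull-up would meet."""
    t = rise_time_ns(r_ohm, c_bus_pf)
    return [mode for mode, limit in MAX_RISE_NS.items() if t <= limit]


C_BUS_PF = 50.0  # assumed bus capacitance, not a measured value
for r in (10_000, 3_300, 1_000):
    print(f"{r:>6} ohm: {rise_time_ns(r, C_BUS_PF):6.1f} ns -> {modes_ok(r, C_BUS_PF)}")
```

Under that assumption 10K comfortably meets the 100 kHz limit but not the faster modes, which matches both positions above: fine for a lightly loaded point-to-point link at 100 kHz, too weak if the bus is ever run faster or loaded more heavily.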

sbourdeauducq commented 7 years ago

There is a list of recommended crystals in the MCU datasheet. It can be a bit picky.

gkasprow commented 7 years ago

True, I used NX3225GA-25.000M-STD-CRG-2