Closed marmeladapk closed 6 years ago
Is there a Kasli v1.2 planned for any time in the future?
I think we need more experience with it in the field to be able to say that.
First let's see how v1.1 operates, IMO points in the top post are not worth it right now.
Change the Si5324 (loss of lock) LOL LED to red (as it indicates an error condition) or invert it.
@jordens Oh come on! You told me to change it from red to green so there are fewer items in the BOM. :D
Anyway LOL polarity can be changed in register 22 B1 of Si5324.
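A minimal sketch of the read-modify-write this implies, assuming "B1" means bit 1 of register 22. The `read_reg`/`write_reg` callables are hypothetical stand-ins for whatever I2C access the firmware provides, and which bit value gives which polarity should be checked against the Si5324 datasheet.

```python
LOL_POL_REGISTER = 22  # register named in the comment above
LOL_POL_BIT = 1        # assumption: "B1" means bit 1 of that register


def with_bit(value, bit, enabled):
    """Return the 8-bit register value with one bit set or cleared."""
    mask = 1 << bit
    return (value | mask) if enabled else (value & ~mask & 0xFF)


def set_lol_polarity(read_reg, write_reg, bit_value):
    """Read-modify-write so the other bits in the register are preserved.

    read_reg(addr) -> int and write_reg(addr, value) are hypothetical
    callables; they are not part of any existing driver API.
    """
    old = read_reg(LOL_POL_REGISTER)
    new = with_bit(old, LOL_POL_BIT, bit_value)
    write_reg(LOL_POL_REGISTER, new)
    return new
```

The point of the read-modify-write is that register 22 holds other settings too, so only the one bit is touched.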
Ah. Right. Then it's perfect. I forgot about both. ;)
The biggest thing I'd like to see changed on the list above is the heatsink, since the FPGA gets very hot atm. It might be worth going for a heatsink with a clip on fan.
I think we're slowly approaching the point where we can think about the next revision. Are there any other things we'd like to test before I start implementing changes? What's the consensus on Si5324/Si5369/Si5346?
@marmeladapk In the long run, I'm still potentially keen to implement WR on Kasli. Probably use a DAC + high-quality VCO for clock recovery. Then either use a LVPECL clock buffer (noise isn't critical here, so there are lots of options) or something like an AD9516-4 to do the fanout.
However, we're still doing a design study to make sure we get that right, so we won't have a design for that for another week or two.
Until/unless we switch to WR, I'm not fussed about any of the options being discussed. IMHO, the present clocking works well (modulo the stability/phase determinism issues with the Si5324) and none of the options presented above offer a good enough advantage in terms of cost/power/simplicity to be worth changing a working design and risking breaking things. However, if @jordens feels strongly about it, I don't object either.
Kasli must be connected to mains ground to avoid damage to it or other equipment connected to it!
I don't think that is accurate and might even be wrong. I'd state how much potential difference we are willing and able to tolerate and what the actual ground paths in the system are. Like it is done on all measurement equipment.
@sbourdeauducq wanted to do tests with the Si5326 to guide that decision.
And I don't think a big heat sink will cut it. We have been equipping them with fans.
And I don't think a big heat sink will cut it. We have been equipping them with fans.
:+1: Something like the heat sink on the KC705 would be nice.
I don't think that is accurate and might even be wrong. I'd state how much potential difference we are willing and able to tolerate and what the actual ground paths in the system are. Like it is done on all measurement equipment.
AFAICT, connecting PCB ground to mains ground is the most fool-proof solution, which should prevent damage in basically all cases (this is what almost all T&M equipment does), so it's an easy, safe recommendation to make -- I'm not aware of any situation where this could be dangerous/lead to damage, even if it's not often/always optimal from a noise perspective. Maybe change must to should, or even just re-word it to say that the potential difference between all grounds must be limited to safe levels for all equipment, for example by connecting Kasli to mains ground?
Having said that, if you have a better suggestion, then feel free to make it (can you give exact text, please, including values for potential differences you want to recommend).
@jordens @sbourdeauducq I believe the answer to this is "no", but to double check: we don't think that a bigger FPGA/higher speed grade would help with anything we're doing? e.g. for large Kasli designs, we're not close to being limited by FPGA resources, right? and, the higher speed grade wouldn't ease the CPU timing issues?
After the siphaser system we introduced, I don't think the 5326 would improve anything significantly, it would just save something like one or two MMCMs in the FPGA since we can use the skew control registers instead. And if we start having it on any board, then we need to support both the 5324 and 5326 in the firmware. This family of chips appears to be exceptionally well-designed (case in point: the 5324 and 5326 are pin-compatible) and bug-free, so it's not a big issue if that has to be done, but why should we?
Note that the 5326 does not have deterministic latency - all it brings to the table is built-in functionality to increase or decrease whatever random skew it has after locking, and higher loop bandwidth. So, it doesn't help with getting deterministic phase from the external clock input to the ARTIQ outputs.
I am in favor of either:
Thanks @sbourdeauducq. In that case, here is my suggestion:
Everyone happy with that plan? If so, what's the deadline for this decision?
Change the Si5324 (loss of lock) LOL LED to red (as it indicates an error condition) or invert it.
@jordens Oh come on! You told me to change it from red to green so there are fewer items in the BOM. :D
Do we need this LED at all? Lock status is accessible from the firmware. I've never used that LED personally.
AFAICT, connecting PCB ground to mains ground is the most fool-proof solution
Agreed that this is the most fool-proof. The default configuration should protect casual end users as well as isolate the manufacturer from liability. The grounding implementation could be made so that it's easy to modify. Then modifications which might cause harm to body or the board itself are at the risk of the end user.
Everyone happy with that plan?
Sounds fine, but integrating the WR PLL into ARTIQ doesn't sound trivial; we need to plan for the manpower and development time in the firmware and gateware (in addition to the hardware changes).
Absolutely, yes. That's an essential part of the cost / benefit analysis. But let's get a concrete proposal to discuss first...
Sounds fine, but integrating the WR PLL into ARTIQ doesn't sound trivial; we need to plan for the manpower and development time in the firmware and gateware (in addition to the hardware changes).
If we do go down the WR route, I'd still want to keep the Si5324 as well for at least the next version. Obviously, we would want to be able to use Kasli even while the WR gateware/firmware is developed and debugged.
Is it worth considering switching to a Kintex FPGA and maybe increasing the ram width (cf the DMA issues @cjbe reported) for the next version?
Speed seems to be by far the biggest complaint of ARTIQ users, and the fact that Kasli is noticeably slower than the KC705 setups we've used in the past seems like a major step in the wrong direction. I'm all for optimising gateware/firmware, but it seems silly not to start from the fastest hardware platform we reasonably can -- I'm not sure about other users (@dhslichter @dtcallcock etc), but I would gladly pay a bit more for HW if it made my setups faster.
I'm against that. Let's keep Kasli at the simple end. It was well known and acknowledged that it would be slower. Wider ram will lead to board space and power, thermal issues and redesigns. You are obviously free to fund a new device with a bigger fpga though.
Let's keep Kasli at the simple end.
"Simple" doesn't have to equal "slow". I'm not convinced that putting a faster FPGA on there makes it not a simple design.
It was well known and acknowledged that it would be slower.
Really? That wasn't my impression. When I discussed this via email with you and @sbourdeauducq before Kasli v1.0's design was finalised I explicitly asked about whether there would be CPU frequency issues with the ARTIQ, and was told that there wouldn't be.
In any case, I think this point is largely irrelevant. What matters is whether, having used this in the lab and knowing what we know now, we still think the current design is the right one for the users, or whether changing the FPGA would be better. Let's not get hung up on why decisions were made.
You are obviously free to fund a new device with a bigger fpga though.
Firstly: I read that to imply that you are funding work on the next Kasli revision. Is that actually true? Does your contract with WUT specify more than the standard two design rounds? If not, is this something that @marmeladapk and @gkasprow are doing on their own steam without any funding? If so, I don't see why you're bringing up funding here.
Secondly: I've worked hard to avoid hardware fragmentation in this project because I (still) believe that's the only way we're going to get a set of high-quality, well supported hardware which is stocked at good prices from a commercial vendor. If we all take the line of "this is my project, so if you don't like it then make your own version" then we're going to end up with a multiplicity of shoddy boards. I think we can be a bit more mature than that and work to find solutions that work for everyone.
Thirdly: while you may have funded the original version of Kasli, if you want someone like Creotech to stock it then they have to believe that it's what the users want. So, let's have a discussion that focusses on technical points, rather than shutting things down with "this is my project, go away".
Wider ram will lead to board space and power, thermal issues and redesigns.
You've made this kind of assertion several times in this project only to be contradicted by @marmeladapk, who is actually doing the design work and has done the simulations. If you've done a simulation or have anything concrete to back up these claims then I'd love to hear about them. But, otherwise, I'd rather hear from @gkasprow or @marmeladapk.
tl;dr: if other users don't think a bigger FPGA is worth it (maybe this is worth addressing to the ARTIQ mailing list), or if @gkasprow or @marmeladapk think that it would be too much work/cause other issues, then let's leave it as is. But, if there are simple changes that can make Kasli work better for the users then we should consider them.
After all, it's not like the current FPGA on Kasli isn't causing problems right now, and that makes me concerned that in the long run it's not a very good choice.
It was well known and acknowledged that it would be slower.
Again, I'd love to hear from one of the other groups who are actually using ARTIQ to run experiments with (e.g. @dtcallcock @dhslichter) but my feeling is that the current slowness of ARTIQ makes it a massive pain in the neck for most use cases. Anything that makes it even slower is of very limited interest as far as I'm concerned.
To be a bit more concrete here, my concerns are things like: if we're struggling to make ARTIQ meet timing on Kasli as it is, what will happen when we want to add features like hard floating-point maths? Will we just have to accept that they aren't available on Kasli because we put a slow FPGA on it?
It was well known and acknowledged that it would be slower.
Really? That wasn't my impression. When I discussed this via email with you and @sbourdeauducq before Kasli v1.0's design was finalised I explicitly asked about whether there would be CPU frequency issues with the ARTIQ, and was told that there wouldn't be.
I guess @jordens is talking about the RAM, which is obviously slower than on KC705 (16-bit vs. 64-bit data bus). How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise. Switching to 7K70T might be OK (it's not much more expensive), if it weren't for the major PCB design change, plus another round of transceiver yak-shaving to make DRTIO and Ethernet work again (among their many problems, transceivers are not compatible between FPGA families and each comes with its own set of idiosyncrasies and obscure bugs).
How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise.
Part of the reason it's a surprise is because uniprocessor systems (e.g. the DRTIO satellite, and other MiSoC ports to Artix-7 boards) meet timing; the problems appear with the ARTIQ dual-CPU design for some reason.
I guess @jordens is talking about the RAM, which is obviously slower than on KC705 (16-bit vs. 64-bit data bus).
Yes, the RAM on Kasli is obviously slower than on the KC705. However, I'm not sure that the effects that has on ARTIQ DMA were obvious or anticipated (the RAM bandwidth is still pretty huge).
However, that can probably be sorted with a more efficient RAM controller, so it may be that no HW changes are needed here. Although, it's still not clear to me that the cost of adding a wider RAM bus to Kasli is actually that high, so it might still be worth considering.
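For a rough sense of scale: at a fixed transfer rate, peak bandwidth scales linearly with data bus width. The sketch below only illustrates that ratio. The transfer rate is a placeholder, not a measured Kasli or KC705 figure, and it says nothing about controller efficiency, which is the part being questioned here.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, transfers_per_s):
    """Theoretical peak: bytes per transfer times transfers per second."""
    return bus_width_bits // 8 * transfers_per_s


# Placeholder rate, assumed identical on both boards purely for illustration.
RATE = 800_000_000

narrow = peak_bandwidth_bytes_per_s(16, RATE)  # 16-bit bus (Kasli, per the thread)
wide = peak_bandwidth_bytes_per_s(64, RATE)    # 64-bit bus (KC705, per the thread)
ratio = wide / narrow
```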
How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise.
Yes, this concerns me much more than the RAM.
if it weren't for the major PCB design change
IIRC, there will need to be some substantial re-routing to add a bigger heatsink, so now is the time to consider that. @marmeladapk @gkasprow what's your feeling about this? How hard would that be to do?
plus another round of transceiver yak-shaving to make DRTIO and Ethernet work again (among their many problems, transceivers are not compatible between FPGA families and they each come with their own set of idiosyncrasies and obscure bugs).
ACK. That's a reasonable point. However, what do you think is going to be more work in the long run: fixing the transceivers once or fighting against slow silicon on Kasli for everything we do?
Part of the reason it's a surprise is because uniprocessor systems (e.g. the DRTIO satellite, and other MiSoC ports to Artix-7 boards) meet timing; the problems appear with the ARTIQ dual-CPU design for some reason.
Yes, that is a bit concerning. And, again, it doesn't give me much confidence that issues to do with Artix FPGAs being slow won't be a recurring theme as we develop ARTIQ further, e.g. by adding hard FP.
ACK. That's a reasonable point. However, what do you think is going to be more work in the long run: fixing the transceivers once or fighting against slow silicon on Kasli for everything we do?
It's not "everything"; as I posted in https://github.com/m-labs/artiq/issues/891 the LM32 processor has far fewer timing issues. Also, even with mor1kx, the magnitude of the timing failure is small: with the numbers that Robert initially reported, it would run at 122.9MHz instead of the target 125MHz, a 1.7% slowdown.
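The arithmetic behind that figure, as a small sketch. The function names are made up for illustration; the two frequencies are the ones quoted above.

```python
def slowdown_percent(target_mhz, achieved_mhz):
    """Relative loss in clock frequency, in percent of the target."""
    return (target_mhz - achieved_mhz) / target_mhz * 100.0


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def implied_slack_ns(target_mhz, achieved_mhz):
    """Slack implied by the achievable frequency; negative means timing is missed."""
    return period_ns(target_mhz) - period_ns(achieved_mhz)


slowdown = slowdown_percent(125.0, 122.9)   # about 1.7 %
slack = implied_slack_ns(125.0, 122.9)      # about -0.14 ns on an 8 ns period
```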
Also, even with mor1kx, the magnitude of the timing failure is small: with the numbers that Robert initially reported, it would run at 122.9MHz instead of the target 125MHz, a 1.7% slowdown.
Yes, although I got the impression that there was some build to build variation in that.
It's not "everything"; as I posted in m-labs/artiq#891 the LM32 processor has far fewer timing issues.
You're right, "everything" is an exaggeration. Lots of things won't be affected by having a slower FPGA. But, it is hurting us at the moment, and it seems likely that having a CPU that only just meets timing has the potential to cause more issues in the future.
Taking a step back for a moment: I think we all agree that ARTIQ is currently too slow, and that it needs to be speeded up in the future. We know the Artix FPGA on Kasli is slower than comparably priced Kintex FPGAs, and that this can cause issues. Given that, are we sure we don't want to consider using a faster FPGA? Even if we do switch to the LM32 and even if this does fix the timing issues we're currently seeing, are you really sure that the slower FPGA won't cause other problems in the future?
Of course, if @gkasprow or @marmeladapk say it's too much work to consider for the next version of Kasli, then it may not be an option.
Does your contract with WUT specify more than the standard two design rounds? If not, is this something that @marmeladapk and @gkasprow are doing on their own steam without any funding?
I'm not aware of the state of the funding. However, a week ago I thought that the design was frozen and that v1.2 was only cosmetic and QoL changes.
Wider ram will lead to board space and power, thermal issues and redesigns.
@hartytp, @jordens is right on this one. While I wouldn't worry about thermal and power issues (unless we're talking about the FPGA itself; we've got plenty of power left on the 1.5 V rail, see schematics), a redesign would be needed here. I feel that the current configuration is optimal and we don't have any room to fiddle with this design.
First, we would have to use another bank on the FPGA for the additional 16 bits. We can't share it with the EEM banks since they use different voltages. So we have to get rid of at least 3 extensions (if we want to minimise work, that would be eem0 to 2). This is the easiest option to implement. If we don't want to cut any corners, then other options include an expander and level shifter for all SFP control signals, which could then share a bank with the SDRAM. That would require the bottom half of the FPGA (and eem0 to 7 and 10 to 11) to be rerouted.
We could also switch to a bigger package (and to the Artix 200T since we need an additional bank). This may or may not require switching to 8 layers. I can't tell, because on the one hand I need to escape with an additional 16 length-matched signals and on the other hand the FPGA side is bigger. Obviously a bigger package requires rerouting everything under and in the vicinity of the FPGA (+ power planes). And changing the stackup will probably require changes to all impedance-controlled lines.
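The bank-sharing constraint can be sketched as a simple consistency check. The voltage table is an assumption: SSTL15 at 1.5 V matches the 1.5 V rail mentioned above, while LVDS_25 at 2.5 V is a guess at what the EEM banks use. The check is also a simplification, since the real rules have exceptions (for example for some input-only cases).

```python
# Assumed VCCO per I/O standard; see the caveats in the text above.
VCCO_BY_IOSTANDARD = {
    "SSTL15": 1.5,   # DDR3 signalling
    "LVDS_25": 2.5,  # assumption about the EEM banks
}


def bank_is_consistent(iostandards):
    """True if every I/O standard placed in one bank wants the same VCCO."""
    voltages = {VCCO_BY_IOSTANDARD[s] for s in iostandards}
    return len(voltages) <= 1


sdram_only = bank_is_consistent(["SSTL15", "SSTL15"])
sdram_plus_eem = bank_is_consistent(["SSTL15", "LVDS_25"])
```

Under these assumptions `sdram_plus_eem` is false, which is why the extra 16 data bits would need a bank of their own.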
if there are simple changes that can make Kasli work better for the users then we should consider them
As I said, I think we don't have any easy ways to improve this design (apart from changing 100T to 200T in the same package, but more on that later).
TL;DR: Wider RAM is hard.
if we're struggling to make ARTIQ meet timing on Kasli
Vivado insists on making mor1kx systems slow on Artix-7
Is this something that more resources could help? Or are we not using over 60-70% of them (can't check right now, I can't connect to our computers at uni)? Artix 200T is available in the same package. Xilinx says that it's pin compatible, it may only have other decoupling requirements.
Also, stupid question: have we tried different synthesis and implementation strategies? Xilinx recommends trying all of them and then choosing the one that works best (seriously) if a design only barely fails timing. 1-2% could be gained there.
ARTIQ is currently too slow
Is it only slow on Kasli or also on Sayma and KC705?
Is it worth considering switching to a Kintex FPGA and maybe increasing the ram width
Artix FPGA on Kasli is slower than comparably priced Kintex FPGAs (...) are we sure we don't want to consider using a faster FPGA?
I feel that this would be a change worthy of a 2.0 version number. The K70T has fewer resources than the A100T, so I don't feel that would be the optimal change. The only other FPGA available in the 484 package is the K160T. But we have to remember that Xilinx FPGAs are not pin-compatible between families, so that would require rerouting everything under and near the FPGA. Also, some LVDS lines would have to use HP banks, so we would use 1.8 V LVDS. This should not be a problem for devices (probably, @gkasprow?), just noting. But we could then use an HP bank for the SDRAM. Not sure if this would help; @jordens and @sbourdeauducq should chime in here.
TL;DR Switching to Kintex-7 is also hard.
Overall, my feeling is that we're moving away from the idea of the cheap and simple controller that Kasli was supposed to be. Instead I feel like we're stepping closer to inventing MTCA from scratch, only with ribbon cables this time. And while I agree with @hartytp that ARTIQ on Kasli is too slow, I wouldn't jump to the conclusion that changing hardware right now is the solution. It looks like a knee-jerk reaction TBH.
Perhaps giving more time (and funding?) to m-labs to optimize ARTIQ and the underlying software could help? It would also translate to gains on all other boards now and in the future. If @sbourdeauducq says that he didn't anticipate the processor being so slow on Artix, and we haven't found a clear bottleneck, who says that it won't repeat on a small Kintex?
OFC if we find the bottleneck and it's for example RAM then let's change that. But I've outlined why these aren't simple changes.
Also, stupid question: have we tried different synthesis and implementation strategies? Xilinx recommends trying all of them and then choosing the one that works best (seriously) if a design only barely fails timing. 1-2% could be gained there.
Yeah, I've been looking into this. I can make many mor1kx options meet timing on many variants (opticlock, sysu, master etc.) by using various Vivado commands, but what works on one variant often doesn't work on another, and some combinations of commands have very long runtimes.
Trying a pin-compatible 200T FPGA is a good idea, and easy to change in the gateware - let me see if that improves timing.
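Since a strategy that closes timing on one variant can fail on another, one way to compare sweep results is by worst-case slack across variants rather than per variant. A sketch only: the strategy names are hypothetical labels and the slack numbers are made up to exercise the function, not real Vivado results.

```python
def robust_strategy(results):
    """Pick the strategy whose worst variant has the best slack.

    results maps strategy name -> {variant name: worst negative slack in ns}.
    """
    return max(results, key=lambda s: min(results[s].values()))


def meets_timing_everywhere(wns_by_variant):
    """True if no variant has negative slack."""
    return all(wns >= 0.0 for wns in wns_by_variant.values())


# Made-up numbers, for illustration only.
example = {
    "strategy_a": {"opticlock": 0.05, "master": -0.20, "sysu": -0.10},
    "strategy_b": {"opticlock": -0.02, "master": -0.03, "sysu": -0.01},
}
choice = robust_strategy(example)
all_ok = meets_timing_everywhere(example[choice])
```

In this made-up case the "robust" choice still fails timing on every variant by a small margin, which is the situation described above: no single setting works everywhere.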
Thanks for the detailed explanation @marmeladapk! If wider RAM/a faster FPGA isn't an option for the next revision then that makes the decision for us, at least for the time being.
I guess that so long as we press ahead with Metlino, we should have a genuine high-performance option in case Kasli proves problematic.
Is it only slow on Kasli or also on Sayma and KC705?
I can't comment on Sayma yet, but ARTIQ is definitely slow on the KC705. However, we're seeing new issues with Kasli that we didn't see on the KC705 and that concerns me.
Perhaps giving more time (and funding?) to m-labs to optimize ARTIQ and underlying software could help?
I think we all agree that this is essential (I believe there is already an effort underway to do this). However, ARTIQ is quite a long way from the point where speed wouldn't be an issue, so it seems like a step in the wrong direction to move to a slower hardware platform. And, again, the concern is that even if we do eke out a few extra percent to get the CPU to meet timing, what will happen when we try to add future improvements like hard FP? Will the slower FPGA make that impossible, or at least prohibitively difficult?
@sbourdeauducq Yeah, I've been looking into this. I can make many mor1kx options meet timing on many variants (opticlock, sysu, master etc.) by using various Vivado commands, but what works on one variant often doesn't work on another, and some combinations of commands have very long runtimes.
That's exactly what I mean. Playing around trying to squeeze the last few percent of performance out of a slow FPGA to meet timing does not sound like a robust strategy that is going to work well in the long run.
Will the slower FPGA make that impossible, or at least prohibitively difficult?
For the FPU, no. Floating point operations already need several cycles to get done, even in a less-slow FPGA. With a slower FPGA we can most probably just add a few more pipeline registers to solve any timing problem. If you are worried about FP on Artix-7, try compiling those cores: https://github.com/m-labs/milkymist/tree/master/cores/pfpu/rtl https://github.com/nakengelhardt/fpgagraphlib/blob/master/src/faddsub.py https://github.com/nakengelhardt/fpgagraphlib/blob/master/src/fmul.py
The 200T doesn't really help, it meets timing with opticlock and fails on master and sysu.
On the other hand, moving to the -3 speed grade (on 100T) does help; everything meets timing. But we have to support the existing boards. Also @jordens, IIRC you said before that it didn't help to use a faster speed grade, can you elaborate? My test was with a modified mor1kx configuration (that still has about the same performance as the KC705):
--- a/misoc/cores/mor1kx/core.py
+++ b/misoc/cores/mor1kx/core.py
@@ -25,6 +25,7 @@ class MOR1KX(Module):
OPTION_DCACHE_WAYS=1,
OPTION_DCACHE_LIMIT_WIDTH=31,
FEATURE_TIMER="NONE",
+ FEATURE_PIC="NONE",
OPTION_PIC_TRIGGER="LEVEL",
FEATURE_SYSCALL="NONE",
FEATURE_TRAP="NONE",
@@ -32,10 +33,11 @@ class MOR1KX(Module):
FEATURE_OVERFLOW="NONE",
FEATURE_ADDC="ENABLED",
FEATURE_CMOV="ENABLED",
- FEATURE_FFL1="ENABLED",
+ FEATURE_FFL1="REGISTERED",
+ FEATURE_ATOMIC="NONE",
OPTION_CPU0="CAPPUCCINO",
- IBUS_WB_TYPE="B3_REGISTERED_FEEDBACK",
- DBUS_WB_TYPE="B3_REGISTERED_FEEDBACK",
+ IBUS_WB_TYPE="CLASSIC",
+ DBUS_WB_TYPE="CLASSIC",
)
defaults.update(kwargs)
parameters = {"p_{}".format(k): v for k, v in defaults.items()}
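The last three context lines of the diff show how the options reach the core: caller-supplied keyword arguments override the defaults, and each key gets a `p_` prefix. A standalone sketch of that pattern, with a hypothetical helper name and only two of the defaults copied from the diff:

```python
def build_parameters(defaults, **kwargs):
    """Merge overrides into a copy of the defaults, then add the p_ prefix."""
    merged = dict(defaults)  # copy so the caller's dict is left untouched
    merged.update(kwargs)
    return {"p_{}".format(k): v for k, v in merged.items()}


defaults = {
    "FEATURE_FFL1": "ENABLED",
    "IBUS_WB_TYPE": "B3_REGISTERED_FEEDBACK",
}
params = build_parameters(defaults, FEATURE_FFL1="REGISTERED", FEATURE_PIC="NONE")
```

Because of the `update(kwargs)` step, the same configuration changes could presumably also be passed in per target as keyword arguments rather than by editing the defaults.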
Also meets timing with the original mor1kx config ...
Targeting a -3 with the pristine stack did not make a difference back when I tried it.
This sounds like a good reason to move forward with Metlino. Then Sinara has both a heavy-weight and light-weight ARTIQ core devices.
I have reduced the clock frequency from 125MHz to 121MHz on Kasli. It now meets timing.
Moving forward, I suggest having a -3 speed grade on the next Kasli version, and a system clock frequency that depends on the hardware revision. We should be able to get back at least 125MHz. We are already doing different bitstreams for different hardware revisions, and this is straightforward to do in the code.
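A sketch of what a revision-dependent system clock could look like. The 121 MHz and 125 MHz figures come from this thread; the revision labels, the assumption that the next revision is called v1.2 and carries the -3 part, and the function names are all hypothetical and not taken from the ARTIQ code.

```python
# 121 MHz currently meets timing on existing boards; 125 MHz is the
# hoped-for figure with a -3 speed grade. Revision labels are assumptions.
SYS_CLK_FREQ_BY_HW_REV = {
    "v1.0": 121e6,
    "v1.1": 121e6,
    "v1.2": 125e6,
}


def sys_clk_freq(hw_rev):
    """Look up the system clock frequency in Hz for a hardware revision."""
    try:
        return SYS_CLK_FREQ_BY_HW_REV[hw_rev]
    except KeyError:
        raise ValueError("unsupported hardware revision: {!r}".format(hw_rev))


def sys_clk_period_ns(hw_rev):
    """Clock period in nanoseconds for a hardware revision."""
    return 1e9 / sys_clk_freq(hw_rev)
```

Failing loudly on an unknown revision seems safer than falling back to a default frequency, since a wrong clock constant would silently skew every timing-derived value.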
@hartytp Conversely, if you are willing to do the testing and yak-shaving, the same technique can be applied to crank up the system clock frequency a little on KC705 and Sayma. Things to watch out for (in addition to Vivado timing reports) are:
Conversely, if you are willing to do the testing and yak-shaving, the same technique can be applied to crank up the system clock frequency
The yak-shaving should include figuring out problems such as why Vivado meets timing when run manually but not via the buildbot, because it is giving me this crap right now.
Let's not bother with WR for Kasli. We can consider it for Sayma/Metlino when the time comes.
@hartytp I like this idea (not bothering with WR for Kasli). @jbqubit agreed that Metlino would be good to implement when we can.
Metlino can be operated stand-alone. We can even make some mechanical fixture to fit it into the 3U rack with SATA-SFP converter mounted at the front panel. Of course this would make sense only for 19" rack.
@sbourdeauducq :
I am in favor of either:
- keeping the 5324 (the engineering costs associated with the change and the maintenance of multiple clock chip variants in the firmware do not seem to be worth using a 5326, 5369 or 5346).
- designing a high-performance PLL a la White Rabbit.
Since we decided (I believe) to not bother with WR this leaves us with Si5324 (bc other Si chips also don't have deterministic latency).
@hartytp :
We agree on a cut-off date for changes to the design for the next version of Kasli
Shall we set some date?
Top of my wish list is sorting out the grounding for Kasli. I think that at a minimum, we should ensure that the PCB ground connects to the front panel, and that the front panel is designed to ground to a chassis. Otherwise, we're in for a world of broken devices when people power them from switch mode supplies that can float to a couple of hundred volts.
Relying on USB/RF connectors for grounding is not a safe or sensible strategy.
We should also add a note in the docs about this.
'grounding via the chassis' might help you, but it's a NOP in many cases: if the subrack is standing on a usually isolated table or on top of some other insulating device, or the rack is not grounded. It's risky and not a solution. We should provide a ground connection or revise the power supply and connector design choice.