sinara-hw / Kasli

Kasli is a powerful FPGA carrier, capable of controlling 12 Eurocard extension modules.

Kasli v1.2 wishlist #4

Closed marmeladapk closed 6 years ago

marmeladapk commented 6 years ago

From @marmeladapk on January 31, 2018 9:53

Copied from original issue: sinara-hw/sinara#499

marmeladapk commented 6 years ago

From @hartytp on January 31, 2018 14:2

Is there a Kasli v1.2 planned for any time in the future?

marmeladapk commented 6 years ago

From @jordens on January 31, 2018 14:38

I think we need more experience with it in the field to be able to say that.

marmeladapk commented 6 years ago

First let's see how v1.1 operates; IMO the points in the top post are not worth it right now.

marmeladapk commented 6 years ago

Change the Si5324 (loss of lock) LOL LED to red (as it indicates an error condition) or invert it.

@jordens Oh come on! You told me to change it from red to green so there are fewer items in the BOM. :D

Anyway LOL polarity can be changed in register 22 B1 of Si5324.
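
As a rough illustration of the register change mentioned above, here is a minimal read-modify-write sketch. It assumes "register 22 B1" means bit 1 of register 22, and it uses a hypothetical in-memory `FakeSi5324Bus` class in place of real I2C access; neither the class nor the function name comes from the ARTIQ firmware.

```python
# Sketch only: change the LOL polarity bit with a read-modify-write.
# Assumption: "register 22 B1" is read as bit 1 of register 22.

LOL_POL_REG = 22  # register number taken from the comment above
LOL_POL_BIT = 1   # "B1" interpreted as bit 1 (assumption)


class FakeSi5324Bus:
    """Hypothetical in-memory register file standing in for an I2C bus."""

    def __init__(self):
        self.regs = {}

    def read(self, reg):
        return self.regs.get(reg, 0)

    def write(self, reg, value):
        self.regs[reg] = value & 0xFF


def set_lol_pol_bit(bus, bit_value):
    """Write bit_value (0 or 1) to the polarity bit, keeping other bits."""
    value = bus.read(LOL_POL_REG)
    if bit_value:
        value |= 1 << LOL_POL_BIT
    else:
        value &= ~(1 << LOL_POL_BIT)
    bus.write(LOL_POL_REG, value)
    return bus.read(LOL_POL_REG)
```

The read-modify-write matters because the other bits of the same register would otherwise be overwritten.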

marmeladapk commented 6 years ago

From @jordens on February 3, 2018 7:57

Ah. Right. Then it's perfect. I forgot about both. ;)

marmeladapk commented 6 years ago

From @hartytp on March 28, 2018 21:45

The biggest thing I'd like to see changed on the list above is the heatsink, since the FPGA gets very hot atm. It might be worth going for a heatsink with a clip-on fan.

marmeladapk commented 6 years ago

From @hartytp on March 28, 2018 23:31

marmeladapk commented 6 years ago

I think we're slowly approaching the point where we can think about the next revision. Are there any other things we'd like to test before I start implementing changes? What's the consensus on Si5324/Si5369/Si5346?

marmeladapk commented 6 years ago

From @hartytp on March 29, 2018 9:6

@marmeladapk In the long run, I'm still potentially keen to implement WR on Kasli. Probably use a DAC + high-quality VCO for clock recovery. Then either use an LVPECL clock buffer (noise isn't critical here, so there are lots of options) or something like an AD9516-4 to do the fanout.

However, we're still doing a design study to make sure we get that right, so we won't have a design for that for another week or two.

Until/unless we switch to WR, I'm not fussed about any of the options being discussed. IMHO, the present clocking works well (modulo the stability/phase determinism issues with the Si5324) and none of the options presented above offer a good enough advantage in terms of cost/power/simplicity to be worth changing a working design and risking breaking things. However, if @jordens feels strongly about it, I don't object either.

marmeladapk commented 6 years ago

From @jordens on March 29, 2018 9:21

Kasli must be connected to mains ground to avoid damage to it or other equipment connected to it!

I don't think that is accurate and might even be wrong. I'd state how much potential difference we are willing and able to tolerate and what the actual ground paths in the system are. Like it is done on all measurement equipment.

@sbourdeauducq wanted to do tests with the Si5326 to guide that decision.

And I don't think a big heat sink will cut it. We have been equipping them with fans.

marmeladapk commented 6 years ago

From @hartytp on March 29, 2018 9:30

And I don't think a big heat sink will cut it. We have been equipping them with fans.

:+1: Something like the heat sink on the KC705 would be nice.

I don't think that is accurate and might even be wrong. I'd state how much potential difference we are willing and able to tolerate and what the actual ground paths in the system are. Like it is done on all measurement equipment.

AFAICT, connecting PCB ground to mains ground is the most fool-proof solution, which should prevent damage in basically all cases (this is what almost all T&M equipment does), so it's an easy, safe recommendation to make -- I'm not aware of any situation where this could be dangerous/lead to damage, even if it's not often/always optimal from a noise perspective. Maybe change "must" to "should", or even just re-word it to say that the potential difference between all grounds must be limited to safe levels for all equipment, for example by connecting Kasli to mains ground?

Having said that, if you have a better suggestion, then feel free to make it (can you give exact text, please, including values for potential differences you want to recommend).

marmeladapk commented 6 years ago

From @hartytp on March 29, 2018 9:33

@jordens @sbourdeauducq I believe the answer to this is "no", but to double check: we don't think that a bigger FPGA/higher speed grade would help with anything we're doing? e.g. for large Kasli designs, we're not close to being limited by FPGA resources, right? and, the higher speed grade wouldn't ease the CPU timing issues?

marmeladapk commented 6 years ago

From @sbourdeauducq on March 29, 2018 10:28

After the siphaser system we introduced, I don't think the 5326 would improve anything significantly, it would just save something like one or two MMCMs in the FPGA since we can use the skew control registers instead. And if we start having it on any board, then we need to support both the 5324 and 5326 in the firmware. This family of chips appears to be exceptionally well-designed (case in point: the 5324 and 5326 are pin-compatible) and bug-free, so it's not a big issue if that has to be done, but why should we?

marmeladapk commented 6 years ago

From @sbourdeauducq on March 29, 2018 10:38

Note that the 5326 does not have deterministic latency - all it brings to the table is built-in functionality to increase or decrease whatever random skew it has after locking, and higher loop bandwidth. So, it doesn't help with getting deterministic phase from the external clock input to the ARTIQ outputs.

I am in favor of either:

marmeladapk commented 6 years ago

From @hartytp on March 29, 2018 11:6

Thanks @sbourdeauducq. In that case, here is my suggestion:

Everyone happy with that plan? If so, what's the deadline for this decision?

marmeladapk commented 6 years ago

From @sbourdeauducq on March 29, 2018 11:18

Change the Si5324 (loss of lock) LOL LED to red (as it indicates an error condition) or invert it.

@jordens Oh come on! You told me to change it from red to green so there are fewer items in the BOM. :D

Do we need this LED at all? Lock status is accessible from the firmware. I've never used that LED personally.

marmeladapk commented 6 years ago

From @jbqubit on March 29, 2018 12:55

AFAICT, connecting PCB ground to mains ground is the most fool-proof solution

Agreed that this is the most fool-proof. The default configuration should protect casual end users as well as isolate the manufacturer from liability. The grounding implementation could be made so that it's easy to modify. Then modifications which might cause harm to body or the board itself are at the risk of the end user.

marmeladapk commented 6 years ago

From @sbourdeauducq on March 29, 2018 16:49

Everyone happy with that plan?

Sounds fine, but integrating the WR PLL into ARTIQ doesn't sound trivial; we need to plan for the manpower and development time in the firmware and gateware (in addition to the hardware changes).

marmeladapk commented 6 years ago

From @hartytp on March 29, 2018 17:14

Absolutely, yes. That's an essential part of the cost / benefit analysis. But let's get a concrete proposal to discuss first...

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 8:7

Sounds fine, but integrating the WR PLL into ARTIQ doesn't sound trivial; we need to plan for the manpower and development time in the firmware and gateware (in addition to the hardware changes).

If we do go down the WR route, I'd still want to keep the Si5324 as well for at least the next version. Obviously, we would want to be able to use Kasli even while the WR gateware/firmware is developed and debugged.

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 8:19

Is it worth considering switching to a Kintex FPGA and maybe increasing the ram width (cf the DMA issues @cjbe reported) for the next version?

Speed seems to be by far the biggest complaint of ARTIQ users, and the fact that Kasli is noticeably slower than the KC705 setups we've used in the past seems like a major step in the wrong direction. I'm all for optimising gateware/firmware, but it seems silly not to start from the fastest hardware platform we reasonably can -- I'm not sure about other users (@dhslichter @dtcallcock etc), but I would gladly pay a bit more for HW if it made my setups faster.

marmeladapk commented 6 years ago

From @jordens on April 1, 2018 9:0

I'm against that. Let's keep Kasli at the simple end. It was well known and acknowledged that it would be slower. Wider ram will lead to board space and power, thermal issues and redesigns. You are obviously free to fund a new device with a bigger fpga though.

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 14:9

Let's keep Kasli at the simple end.

"Simple" doesn't have to equal "slow". I'm not convinced that putting a faster FPGA on there makes it not a simple design.

It was well known and acknowledged that it would be slower.

Really? That wasn't my impression. When I discussed this via email with you and @sbourdeauducq before Kasli v1.0's design was finalised I explicitly asked about whether there would be CPU frequency issues with the ARTIQ, and was told that there wouldn't be.

In any case, I think this point is largely irrelevant. What matters is whether, having used this in the lab and knowing what we know now, we still think the current design is the right one for the users, or whether changing the FPGA would be better. Let's not get hung up on why decisions were made.

You are obviously free to fund a new device with a bigger fpga though.

Firstly: I read that to imply that you are funding work on the next Kasli revision. Is that actually true? Does your contract with WUT specify more than the standard two design rounds? If not, is this something that @marmeladapk and @gkasprow are doing under their own steam without any funding? If so, I don't see why you're bringing up funding here.

Secondly: I've worked hard to avoid hardware fragmentation in this project because I (still) believe that's the only way we're going to get a set of high-quality, well supported hardware which is stocked at good prices from a commercial vendor. If we all take the line of "this is my project, so if you don't like it then make your own version" then we're going to end up with a multiplicity of shoddy boards. I think we can be a bit more mature than that and work to find solutions that work for everyone.

Thirdly: while you may have funded the original version of Kasli, if you want someone like Creotech to stock it then they have to believe that it's what the users want. So, let's have a discussion that focusses on technical points, rather than shutting things down with "this is my project, go away".

Wider ram will lead to board space and power, thermal issues and redesigns.

You've made this kind of assertion several times in this project only to be contradicted by @marmeladapk, who is actually doing the design work and has done the simulations. If you've done a simulation or have anything concrete to back up these claims then I'd love to hear about them. But, otherwise, I'd rather hear from @gkasprow or @marmeladapk.


tl;dr: if other users don't think a bigger FPGA is worth it (maybe this is worth addressing to the ARTIQ mailing list), or if @gkasprow or @marmeladapk think that it would be too much work/cause other issues, then let's leave it as is. But, if there are simple changes that can make Kasli work better for the users then we should consider them.

After all, it's not like the current FPGA on Kasli isn't causing problems right now, and that makes me concerned that in the long run it's not a very good choice.

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 14:12

It was well known and acknowledged that it would be slower.

Again, I'd love to hear from one of the other groups who are actually using ARTIQ to run experiments with (e.g. @dtcallcock @dhslichter) but my feeling is that the current slowness of ARTIQ makes it a massive pain in the neck for most use cases. Anything that makes it even slower is of very limited interest as far as I'm concerned.

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 14:40

To be a bit more concrete here, my concerns are things like: if we're struggling to make ARTIQ meet timing on Kasli as it is, what will happen when we want to add features like hard floating-point maths? Will we just have to accept that they aren't available on Kasli because we put a slow FPGA on it?

marmeladapk commented 6 years ago

From @sbourdeauducq on April 1, 2018 14:57

It was well known and acknowledged that it would be slower.

Really? That wasn't my impression. When I discussed this via email with you and @sbourdeauducq before Kasli v1.0's design was finalised I explicitly asked about whether there would be CPU frequency issues with the ARTIQ, and was told that there wouldn't be.

I guess @jordens is talking about the RAM, which is obviously slower than on KC705 (16-bit vs. 64-bit data bus). How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise. Switching to 7K70T might be OK (it's not much more expensive), if it weren't for the major PCB design change, plus another round of transceiver yak-shaving to make DRTIO and Ethernet work again (among their many problems, transceivers are not compatible between FPGA families and each comes with its own set of idiosyncrasies and obscure bugs).

marmeladapk commented 6 years ago

From @sbourdeauducq on April 1, 2018 15:28

How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise.

Part of the reason it's a surprise is because uniprocessor systems (e.g. the DRTIO satellite, and other MiSoC ports to Artix-7 boards) meet timing; the problems appear with the ARTIQ dual-CPU design for some reason.

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 17:28

I guess @jordens is talking about the RAM, which is obviously slower than on KC705 (16-bit vs. 64-bit data bus).

Yes, the RAM on Kasli is obviously slower than on the KC705. However, I'm not sure that the effects that has on ARTIQ DMA were obvious or anticipated (the RAM bandwidth is still pretty huge).

However, that can probably be sorted with a more efficient RAM controller, so it may be that no HW changes are needed here. Although, it's still not clear to me that the cost of adding a wider RAM bus to Kasli is actually that high, so it might still be worth considering.

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 17:32

How strongly Vivado insists on making mor1kx systems slow on Artix-7, on the other hand, is a bit of a surprise.

Yes, this concerns me much more than the RAM.

if it weren't for the major PCB design change

IIRC, there will need to be some substantial re-routing to add a bigger heatsink, so now is the time to consider that. @marmeladapk @gkasprow what's your feeling about this? How hard would that be to do?

plus another round of transceiver yak-shaving to make DRTIO and Ethernet work again (among their many problems, transceivers are not compatible between FPGA families and they each come with their own set of idiosyncrasies and obscure bugs).

ACK. That's a reasonable point. However, what do you think is going to be more work in the long run: fixing the transceivers once or fighting against slow silicon on Kasli for everything we do?

marmeladapk commented 6 years ago

From @hartytp on April 1, 2018 17:34

Part of the reason it's a surprise is because uniprocessor systems (e.g. the DRTIO satellite, and other MiSoC ports to Artix-7 boards) meet timing; the problems appear with the ARTIQ dual-CPU design for some reason.

Yes, that is a bit concerning. And, again, it doesn't give me much confidence that issues to do with Artix FPGAs being slow won't be a recurring theme as we develop ARTIQ further, e.g. by adding hard FP.

marmeladapk commented 6 years ago

From @sbourdeauducq on April 2, 2018 2:45

ACK. That's a reasonable point. However, what do you think is going to be more work in the long run: fixing the transceivers once or fighting against slow silicon on Kasli for everything we do?

It's not "everything"; as I posted in https://github.com/m-labs/artiq/issues/891 the LM32 processor has a lot less timing issues. Also, even with mor1kx, the magnitude of the timing failure is small: with the numbers that Robert initially reported, it would run at 122.9MHz instead of the target 125MHz, a 1.7% slowdown.
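
The arithmetic behind that figure can be checked with a short sketch. The 125 MHz target and 122.9 MHz achieved clock are the numbers quoted above; the implied slack is derived from them here and was not reported in the thread, and the function names are illustrative only.

```python
def slowdown_percent(target_mhz, achieved_mhz):
    """Relative clock slowdown in percent."""
    return 100.0 * (target_mhz - achieved_mhz) / target_mhz


def implied_slack_ns(target_mhz, achieved_mhz):
    """Slack implied by the achieved clock; negative means timing fails."""
    return 1e3 / target_mhz - 1e3 / achieved_mhz


print(round(slowdown_percent(125.0, 122.9), 1))  # 1.7
print(round(implied_slack_ns(125.0, 122.9), 3))  # -0.137
```

In other words, the failing paths are roughly 0.14 ns too long against an 8 ns clock period.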

marmeladapk commented 6 years ago

From @hartytp on April 2, 2018 7:35

Also, even with mor1kx, the magnitude of the timing failure is small: with the numbers that Robert initially reported, it would run at 122.9MHz instead of the target 125MHz, a 1.7% slowdown.

Yes, although I got the impression that there was some build to build variation in that.

It's not "everything"; as I posted in m-labs/artiq#891 the LM32 processor has a lot less timing issues.

You're right, "everything" is an exaggeration. Lots of things won't be affected by having a slower FPGA. But it is hurting us at the moment, and it seems likely that having a CPU that only just meets timing has the potential to cause more issues in the future.

Taking a step back for a moment: I think we all agree that ARTIQ is currently too slow, and that it needs to be sped up in the future. We know the Artix FPGA on Kasli is slower than comparably priced Kintex FPGAs, and that this can cause issues. Given that, are we sure we don't want to consider using a faster FPGA? Even if we do switch to the LM32 and even if this does fix the timing issues we're currently seeing, are you really sure that the slower FPGA won't cause other problems in the future?

marmeladapk commented 6 years ago

From @hartytp on April 2, 2018 7:36

Of course, if @gkasprow or @marmeladapk say it's too much work to consider for the next version of Kasli, then it may not be an option.

marmeladapk commented 6 years ago

Does your contract with WUT specify more than the standard two design rounds? If not, is this something that @marmeladapk and @gkasprow are doing under their own steam without any funding?

I'm not aware of the state of the funding. However, a week ago I thought that the design was frozen and v1.2 was only cosmetic and QoL changes.


Wider ram will lead to board space and power, thermal issues and redesigns.

@hartytp, @jordens is right on this one. While I wouldn't worry about thermal and power issues (unless we're speaking about the FPGA itself; we've got plenty of power left on the 1.5 V rail, see schematics), a redesign would be needed here. I feel that the current configuration is optimal and we don't have any room to fiddle with this design.

First, we would have to use another bank on the FPGA for the additional 16 bits. We can't share it with the EEM banks since they use different voltages. So we would have to get rid of at least 3 extensions (if we want to minimise work, that would be eem0 to 2). This is the easiest option to implement. If we don't want to cut any corners, then other options include an expander and level shifter for all SFP control signals, which could then share a bank with the SDRAM. That would require the bottom half of the FPGA (and eem0 to 7 and 10 to 11) to be rerouted.
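
The bank-sharing constraint can be illustrated with a simplified sketch: I/O standards can only share a bank if they need the same VCCO. The mapping of signals to standards below (SSTL15 for the SDRAM, LVDS_25 for the EEMs) is an assumption made for illustration and is not stated in this thread, and the check ignores exceptions such as input-only pins.

```python
# VCCO required by each I/O standard, in volts. Which standard each
# Kasli signal group actually uses is an assumption here.
VCCO = {
    "SSTL15": 1.5,    # assumed for the SDRAM data bus
    "LVDS_25": 2.5,   # assumed for the EEM lines
    "LVCMOS25": 2.5,
}


def can_share_bank(standards):
    """True if every listed I/O standard needs the same VCCO."""
    return len({VCCO[s] for s in standards}) <= 1


print(can_share_bank(["SSTL15", "LVDS_25"]))    # False
print(can_share_bank(["LVDS_25", "LVCMOS25"]))  # True
```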

We could also switch to a bigger package (and to the Artix 200T, since we need an additional bank). This may or may not require a switch to 8 layers. I can't tell, because on the one hand I need to escape an additional 16 length-matched signals and on the other hand the FPGA side is bigger. Obviously a bigger package requires rerouting everything under and in the vicinity of the FPGA (plus power planes). And changing the stackup will probably require changes to all impedance-controlled lines.

if there are simple changes that can make Kasli work better for the users then we should consider them

As I said, I think we don't have any easy ways to improve this design (apart from changing 100T to 200T in the same package, but more on that later).

TL;DR: Wider RAM is hard.


if we're struggling to make ARTIQ meet timing on Kasli

Vivado insists on making mor1kx systems slow on Artix-7

Is this something that more resources could help with? Or are we not using over 60-70% of them (can't check right now, I can't connect to our computers at uni)? The Artix 200T is available in the same package. Xilinx says that it's pin-compatible; it may only have different decoupling requirements.

Also, stupid question: have we tried different synthesis and implementation strategies? Xilinx recommends trying all of them and then choosing the one that works best (seriously) if a design only barely fails timing. 1-2% could be gained there.

ARTIQ is currently too slow

Is it only slow on Kasli or also on Sayma and KC705?

Is it worth considering switching to a Kintex FPGA and maybe increasing the ram width

Artix FPGA on Kasli is slower than comparably priced Kintex FPGAs (...) are we sure we don't want to consider using a faster FPGA?

I feel that this would be a change worthy of a 2.0 version number. The K70T has fewer resources than the A100T, so I don't feel that would be the optimal change. The only other FPGA available in the 484 package is the K160T. But we have to remember that Xilinx FPGAs are not pin-compatible between families, so that would require rerouting everything under and near the FPGA. Also, some LVDS lines would have to use HP banks, so we would use 1.8 V LVDS. This should not be a problem for devices (probably, @gkasprow?), just noting. But we could then use an HP bank for SDRAM. Not sure if this would help; @jordens and @sbourdeauducq should chime in here.

TL;DR Switching to Kintex-7 is also hard.


Overall, my feeling is that we're moving away from the idea of the cheap and simple controller that Kasli was supposed to be. Instead I feel like we're stepping closer to inventing MTCA from scratch, only with ribbon cables this time. And while I agree with @hartytp that ARTIQ on Kasli is too slow, I wouldn't jump to the conclusion that changing hardware right now is the solution. It looks like a knee-jerk reaction TBH.

Perhaps giving more time (and funding?) to m-labs to optimize ARTIQ and the underlying software could help? It would also translate to gains on all other boards now and in the future. If @sbourdeauducq says that he didn't anticipate the processor being so slow on Artix, and we haven't found a clear bottleneck, who says that it won't happen again on a small Kintex?

OFC if we find the bottleneck and it's for example RAM then let's change that. But I've outlined why these aren't simple changes.

marmeladapk commented 6 years ago

From @sbourdeauducq on April 2, 2018 12:54

Also, stupid question: have we tried different synthesis and implementation strategies? Xilinx recommends trying all of them and then choosing the one that works best (seriously) if a design only barely fails timing. 1-2% could be gained there.

Yeah, I've been looking into this. I can make many mor1kx options meet timing on many variants (opticlock, sysu, master etc.) by using various Vivado commands, but what works on one variant often doesn't work on another, and some combinations of commands have very long runtimes.
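
A rough sketch of that bookkeeping: given the worst slack for each strategy on each variant, find the strategies that meet timing everywhere. The strategy names and slack values below are made-up placeholders for illustration, not measured results, and the function name is hypothetical.

```python
def strategies_meeting_timing(slack_by_variant):
    """Return strategies whose worst slack is >= 0 on every variant."""
    common = None
    for results in slack_by_variant.values():
        passing = {s for s, slack in results.items() if slack >= 0.0}
        common = passing if common is None else common & passing
    return sorted(common or [])


# Placeholder numbers (ns), invented purely to show the failure mode.
example = {
    "opticlock": {"strategy_a": 0.05, "strategy_b": -0.12},
    "sysu": {"strategy_a": -0.08, "strategy_b": 0.02},
    "master": {"strategy_a": 0.01, "strategy_b": 0.03},
}

print(strategies_meeting_timing(example))  # []
```

With numbers like these, every variant can be made to pass individually, yet no single strategy works for all of them.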

Trying a pin-compatible 200T FPGA is a good idea, and easy to change in the gateware - let me see if that improves timing.

marmeladapk commented 6 years ago

Error when importing.