progranism / Open-Source-FPGA-Bitcoin-Miner

A completely open source implementation of a Bitcoin Miner for Altera and Xilinx FPGAs. This project hopes to promote the free and open development of FPGA based mining solutions and secure the future of the Bitcoin project as a whole. A binary release is currently available for the Terasic DE2-115 Development Board, and there are compile-able projects for numerous boards.
GNU General Public License v3.0
1.28k stars 551 forks

Unable to communicate with FPGA firmware on new device port #26

Closed penguin359 closed 6 years ago

penguin359 commented 6 years ago

I'm trying to port this firmware to a slightly different FPGA development board. It's the DE2, the predecessor to the DE2-115 already supported. It has the Cyclone II EP2C35F672C6 FPGA. Starting from the existing DE2_115 unoptimized version, I have the project building successfully for this board and creating the bitstream. Running the mine.tcl script, I can see it find both the USB Blaster JTAG and the FPGA behind it:

Looking for and preparing FPGAs...

0) USB-Blaster [USB-0] @1: EP2C35 (0x020B40DD)

Selecting that board, it rightly reports that it does not have the mining firmware installed. I can program the sof bitstream through the USB Blaster from Quartus and it appears successful, but after that, the JTAG interface is no longer functional. Running mine.tcl finds the blaster, but does not see the FPGA behind it anymore. Attempting to re-flash the firmware also no longer works, and I have to power cycle the board to continue.

My guess is something is not set up with the clock or PLL configuration. My board runs at 50 MHz and it looks like the osc_clk is correctly mapped to PIN_N2 in the project. Any pointers on how to diagnose the issue would be appreciated.

fpgaminer commented 6 years ago

It's been a while since I touched this codebase, but I'll see if I can help.

Hmmm, the clock shouldn't affect whether or not the FPGA is detected. The mine.tcl script looks for a mining FPGA by checking for the relevant In-System Sources and Probes instances. Those don't depend on the osc_clk or anything.

After programming the bitstream, can you try opening the In-System Sources and Probe editor in Quartus? You should be able to select the USB Blaster in there and the FPGA and then see a list of sources and probes available on the running FPGA.

If it won't let you access the FPGA or won't list any sources or probes on the running FPGA then there is definitely something going wrong with JTAG.

If JTAG isn't working right it's possible that your bitstream is incorrectly configuring the JTAG pins as GPIO. If I recall correctly on some FPGAs it's possible to use some of the JTAG pins as GPIO. Those settings are controlled in the GUI in the Device & Pin Options settings.

penguin359 commented 6 years ago

When programming any of the demos or my own VHDL projects, the JTAG is still accessible afterwards and I can reprogram the board without a power cycle, so something in the fpgaminer does seem to be interfering with JTAG, but I can't find any specific JTAG settings or pin assignments. I forked my project off of the DE2-70 variant, but running a diff against it, I don't see much more than the required changes for the different dev board.

I tried opening the In-System Sources and Probes Editor, but it pops up an error about no instances found in the current project/device. Most of the resulting window is grayed out, but I can see the USB Blaster Hardware and EP2C35 Device on the right. Once the device has been programmed with the fpgaminer firmware, I can only find the USB Blaster hardware and no devices attached, the same as what mine.tcl shows. I then need to power cycle. I also tried File -> New -> In-System Sources and Probes File, but it just brings up the same dialog and the same errors.

I see the four instances of virtual_wire in the project hierarchy with the four different probes that mine.tcl is expecting, but they just appear as regular Verilog modules. I'm not sure if they are supposed to be some kind of recognized IP Core.

I have also tried dropping the PLL for now and directly attaching the hash clk to osc_clk, and also programming the on-board EPCS16 flash, but there was no change in behavior.

penguin359 commented 6 years ago

OK, a wild experiment: I commented out all but one of the probes, NONC, then synthesized and programmed it. Most of the design was optimized out (it reported less than 1% of logic elements in use), but it loaded. I could still find the device after programming it this time. I was able to load the In-System probes, and this time it showed one probe on the left, NONC. Next, I uncommented the other output, GNON. The design used nearly 69% of the logic elements, close to the normal design, but after programming, the JTAG chain was dead. Running an IDCODE scan finds no devices. I'll try a few other combos, but there's something it doesn't like about how the virtual_wire module is set up.

fpgaminer commented 6 years ago

Weird. Any interesting warnings during compilation?

You could try re-creating the virtual_wire module. It's more-or-less just a modified version of what Quartus' megafunction wizard spits out to instantiate altsource_probe. Perhaps there's a parameter it doesn't like.

penguin359 commented 6 years ago

If I drop only GNON but keep all the other probes, the design is reduced to only 4%, but what remains programs and works. I can start a live, running probe and see NONC continually incrementing. Once I add GNON back in, JTAG breaks. Looking through the compilation output, I see only a couple of warnings:

Warning (12241): 4 hierarchies have connectivity warnings - see the Connectivity Checks report folder
Warning (13410): Pin "sld_hub:auto_hub|receive[0][0]" is stuck at GND

But checking the connectivity checks report, it has a total of 17 warnings including:

Declared by entity but not connected by instance. If a default value exists, it will be used. Otherwise, the port will be connected to GND.

For the pins: source_clk, source_ena, raw_tck, tdi, usr1, jtag_state_cdr, jtag_state_sdr, jtag_state_e1dr, jtag_state_udr, jtag_state_cir, jtag_state_uir, jtag_state_tlr, clrn, ena, ir_in

And:

Declared by entity but not connected by instance. Logic that only feeds a dangling port will be removed.

For the pins: ir_out, tdo

I'll try a build with GNON removed and see how the warnings compare.

fpgaminer commented 6 years ago

I'm curious, what happens if you leave all the sources and probes in but comment out: https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/blob/master/src/fpgaminer_top.v#L157-L165

You should still get a design that uses ~4% (because it optimizes out most of the design). But I'm curious if there's something specific about GNON, or if there's something about having all the logic in the design that's causing the problem.

Also, just to double check. You updated the device number? https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner/blob/master/projects/DE2_70_Unoptimized_Pipelined/fpgaminer.qsf#L39

It is possible to program a bitstream intended for a bigger device onto a smaller device. And I imagine we might see this kind of weird behavior if that is what was happening. So just wanted to double check.

penguin359 commented 6 years ago

Yep, I've double-checked that it matches the right device.

Taking a closer look at the reports, I see the same warnings for all four virtual_wire modules, and they are all still present on the remaining modules when I drop GNON, yet JTAG works in that case. I tried just changing the probe on NONC from nonce to golden_nonce while still leaving GNON commented out, but that broke JTAG just the same. On the other hand, leaving all four probes in and just commenting out the section that assigns golden_nonce, as you suggested, does work. I can bring up the dialog and see all four probes running on the FPGA. I can enable continuous read of both output probes and watch the nonce spin. I might try bumping CONFIG_LOOP_LOG2 to 5 and see what happens when I reduce the design's footprint.

penguin359 commented 6 years ago

After setting CONFIG_LOOP_LOG2 to 5, I was able to program the full design and start actual mining. It's working with all 4 probes attached correctly. I dropped it down to 4 and saw my hash rate double, but with it set to 3, for some reason it fails to see the probes or any device attached to JTAG. Here are the numbers:

CONFIG_LOOP_LOG2=2 (Doesn't fit, more than 33,216 logic elements required)
CONFIG_LOOP_LOG2=3 23,054 / 33,216 ( 69 % )
CONFIG_LOOP_LOG2=4 12,783 / 33,216 ( 38 % )
CONFIG_LOOP_LOG2=5 7,674 / 33,216 ( 23 % )
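The percentages above can be reproduced from the raw logic element counts. The following is only an illustrative Python sketch and not part of the project; the function name `utilization_percent` is hypothetical, and rounding to the nearest whole percent is an assumption that happens to match the reported figures.

```python
# Total logic elements on the EP2C35, taken from the numbers quoted above.
TOTAL_LES = 33_216

# Logic elements used per CONFIG_LOOP_LOG2 setting, copied from the thread.
USED_LES = {3: 23_054, 4: 12_783, 5: 7_674}


def utilization_percent(used, total=TOTAL_LES):
    """Return utilization as a whole-number percentage (nearest-integer rounding assumed)."""
    return round(100 * used / total)


for log2, used in sorted(USED_LES.items()):
    pct = utilization_percent(used)
    print(f"CONFIG_LOOP_LOG2={log2}: {used:,} / {TOTAL_LES:,} ({pct} %)")
```

Each step up in CONFIG_LOOP_LOG2 removes a little under half of the logic, which is why the value 3 is the first setting that fits on this device.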

At only 69%, I should have some room to spare. I don't see any warnings beyond other configurations so I'm not sure what's happening.

penguin359 commented 6 years ago

Hmm, even at level 4, it doesn't always come up after programming. I think I also saw a case where it showed up when I first opened the probes dialog, but then failed once I tried mine.tcl and required a power cycle. There might be something more going on here.

fpgaminer commented 6 years ago

Hmmm, could you try using a PLL to divide osc_clk down to, maybe, 6.25 MHz (divide by 8) but keep CONFIG_LOOP_LOG2 at 3? I'm wondering if maybe the dev board isn't supplying enough power?
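The arithmetic behind that suggestion, as a rough Python sketch. It assumes the 50 MHz osc_clk mentioned earlier in the thread and uses the usual first-order rule that dynamic power scales linearly with clock frequency at a fixed supply voltage; static power is ignored, and the function names are made up for illustration.

```python
OSC_CLK_MHZ = 50.0  # board oscillator frequency stated earlier in the thread
DIVISOR = 8         # the divide-by-8 suggested above


def divided_clock_mhz(osc_mhz, divisor):
    """Clock frequency after an integer divide."""
    return osc_mhz / divisor


def relative_dynamic_power(divisor):
    """First-order estimate: dynamic power is proportional to clock frequency,
    so dividing the clock by N scales dynamic power by roughly 1/N.
    Static (leakage) power is not modeled here."""
    return 1 / divisor


print(divided_clock_mhz(OSC_CLK_MHZ, DIVISOR))  # 6.25
print(relative_dynamic_power(DIVISOR))          # 0.125
```

If the full design comes up reliably at the divided clock but not at full speed, that points to the supply rather than to the logic.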

penguin359 commented 6 years ago

I should have brought out my oscilloscope earlier. Probing the 3.3V rail near the FPGA with the trigger level set to 3.1V, the scope never triggers when programming most bitstreams, including CONFIG_LOOP_LOG2 level 4 most of the time. When programming it with level 3, I see different results:

https://photos.app.goo.gl/DtecomavsoHinhrt7

The one time I caught the level 4 failure on screen, it was the same story. I don't think this was ever a code issue. I guess none of the provided demos come anywhere close to stress testing this board. I would have thought the demo with the NIOS II soft-processor and VGA console would have come close, but apparently not.

The wall wart I'm using with this board is not the original supply, because the guy I bought this used dev board from gave me the wrong power cord for it, one which blew up one of the caps on the board before failing completely. I'll try it again with a beefier supply and see if it works better.
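For anyone without a scope trigger handy, the check described above (does the 3.3V rail ever fall through 3.1V?) can be sketched in Python over a list of sampled voltages. The sample values below are made up purely for illustration and are not measurements from this board; `first_dip_index` is a hypothetical helper name.

```python
def first_dip_index(samples_v, trigger_v=3.1):
    """Return the index of the first sample where the trace crosses from
    at-or-above the trigger level to below it (a falling-edge trigger),
    or None if it never does."""
    for i in range(1, len(samples_v)):
        if samples_v[i - 1] >= trigger_v and samples_v[i] < trigger_v:
            return i
    return None


# Made-up traces for illustration only (volts), not real measurements.
healthy_rail = [3.30, 3.29, 3.28, 3.29, 3.30]
sagging_rail = [3.30, 3.25, 3.12, 3.02, 2.95, 3.20]

print(first_dip_index(healthy_rail))  # None
print(first_dip_index(sagging_rail))  # 3
```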

fpgaminer commented 6 years ago

Yeah, mining firmware draws a lot of power.

Let me know how tests go with a new supply. Worst case, like I mentioned, you should still be able to run the design with a slower clock since that will also reduce power consumption.

penguin359 commented 6 years ago

Well, I should have checked my power supply much earlier on. The design from the DE2-70 works without issue with only three changes needed: the chip, the clock pin, and cutting the size in half to fit it on the smaller FPGA. I'm now hashing at 6.25 MH/s (double the speed of LOG2=4 and quadruple the speed of LOG2=5) and uploading shares to my pool operator. The design is using 69% of my FPGA and I'm not seeing any stability issues.

My new wall wart offers up to 1.5 A of current, but looking at the old wall wart more closely, it's only rated for 0.25 A, which is clearly underpowered. I've had this board for 6 months now and this is the first time I've run into power issues, but I guess all my previous designs have been tiny compared to this.
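The hash rates quoted above fit a simple model, sketched here in Python. It assumes the hashing core runs at the 50 MHz osc_clk and completes one hash every 2^CONFIG_LOOP_LOG2 clock cycles. That model is an assumption which is consistent with the 6.25 MH/s figure and the doubling per step reported here, not something confirmed from the source code, and `hash_rate_mhs` is a made-up name.

```python
HASH_CLK_MHZ = 50.0  # assumed hash clock: the board's 50 MHz oscillator


def hash_rate_mhs(clk_mhz, loop_log2):
    """Estimated hash rate in MH/s, assuming one hash per 2**loop_log2 cycles."""
    return clk_mhz / (2 ** loop_log2)


for loop_log2 in (3, 4, 5):
    rate = hash_rate_mhs(HASH_CLK_MHZ, loop_log2)
    print(f"CONFIG_LOOP_LOG2={loop_log2}: {rate} MH/s")
```

Under this model each decrement of CONFIG_LOOP_LOG2 doubles both the hash rate and, roughly, the logic required, which matches the utilization figures earlier in the thread.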

fpgaminer commented 6 years ago

Glad to hear you got it working 👍