juribeparada / MMDVM_HS

MMDVM HotSpot: firmware for ZUMspot or MMDVM_HS based boards (D-Star, DMR, YSF, P25, NXDN and POCSAG)
GNU General Public License v2.0

Duplex hotspot cutting out after around 60 secs #58

Closed rogerclarkmelbourne closed 5 years ago

rogerclarkmelbourne commented 5 years ago

Andy

I have a problem with my duplex hotspot: after around 60 secs it seems to stop sending data to MMDVMHost.

The board is a clone of https://github.com/phl0/MMDVM_HS_Dual_Hat. It's not quite identical, as the pin headers for the OLED, the SWD (programming interface) etc. are not in the same place, but I'm sure it's basically the same board.

On Brandmeister, looking on hose.brandmeister.network, I can see that after around 60 secs the audio stops with a squeak (not a clean cutoff), and it looks as if I have released the PTT.

On DMR+, the server seems to indicate that I am still sending data, but people report that I have no audio after around 60 secs, when it makes this squeak noise.

I'm running PiStar, and I've tried enabling debug on MMDVM_Host (I tried debug=1 and debug=2) and looked in the log file, but I don't see any errors.

(I'm not sure if there are any higher debug levels .. I need to check the docs for MMDVM_Host)

I've reloaded the firmware with the latest version from your Releases (1.4.6), programmed via STLink, but it didn't make any difference.

Can you suggest what else I can do to debug the problem? I'm not sure if it's a problem in the modem or potentially some sort of timeout / config problem in MMDVM_Host or perhaps DMR Gateway etc.

Thanks

rogerclarkmelbourne commented 5 years ago

I did some more testing, and my MD-380 won't connect to the hotspot at all when it's in duplex mode (though it's fine in simplex).

Using the GD-77, after 1 minute the COS LED goes out but the PTT LED stays on.

I've listened to the output of the GD-77 (on another DMR handheld) and it's continuing to transmit on the Rx frequency of the hotspot.

I think the COS is an indication of incoming signal, so it looks like the HS does not think it's receiving a signal from the receiver.

What I find odd is that it times out at almost exactly 60 secs every time.

BTW. K2GOG said he had thermal problems which were causing his TCXO to drift and he added extra heat sinks and shielding to the board, and that solved the problem. But I tried a 12V CPU fan blowing air over both sides of both the HS and RPi 3 boards and it made no difference at all to the length of time before the COS LED went out.

So I don't think it's a thermal issue in my case.

I also tried changing the MMDVM Beacon time from 60 to 240, in case the 60 was the number of seconds, but that made no difference either.

I will update this issue if I find out anything else.

juribeparada commented 5 years ago

I tried to reproduce this issue and yes, you are right: it happens on all my duplex boards. The times differ slightly depending on the radio and/or board: some are 60, 65, 51, etc. seconds. It seems to me that it depends on differences in TCXO and radio bit rate. Because nobody had pointed this out to me until now, I thought it was caused by some recent change. I tested old firmware versions and MMDVMHost, but no, this issue has always been there! I'm surprised that nobody reported it before. It does not happen with other modes (D-Star works fine in duplex, for example), just with DMR duplex. It also does not happen on any standard MMDVM system, just on duplex MMDVM_HS based boards. So my conclusion after several tests is that the problem comes from the DMR duplex RX logic in the firmware code. I hope to find some free time to fix it this weekend.

rogerclarkmelbourne commented 5 years ago

Andy

No worries

If you need any help, let me know.

I thought it was possibly my board, so I ordered 2 more (different types), but I'm glad it's possibly a firmware issue ;-)

juribeparada commented 5 years ago

Yes, definitely a firmware issue, so it must be correctable :)

rogerclarkmelbourne commented 5 years ago

(LOL) OK. Is this potentially an issue in MMDVMHost, or is it limited to the firmware? (Do we need to alert Jonathan?)

juribeparada commented 5 years ago

Just the MMDVM_HS firmware. MMDVMHost works fine, because a normal MMDVM repeater that I have here works just fine with long RX.

rogerclarkmelbourne commented 5 years ago

OK

juribeparada commented 5 years ago

Roger, please test now. I think the problem is solved with the last commit, or at least minimized. The root of the problem is bit rate clock drift between the radio and the HS. Because this depends on small TCXO differences (radio and HS), I modified the code a little to minimize the effect. At least here it works for all my duplex boards. Tell me if this fix works for you.
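
As a rough sketch of the arithmetic behind this kind of drift: DMR runs at 4800 symbols per second, so a clock mismatch expressed in ppm turns into a steady timing slip. The function below estimates how long it takes to slip by a given number of symbol periods. The function name and the tolerance parameter are assumptions for illustration only; the thread does not say how much slip the firmware actually tolerates.

```cpp
#include <cassert>
#include <cmath>

// DMR air interface: 4FSK at 4800 symbols per second.
constexpr double kDmrSymbolRate = 4800.0;

// Seconds until two clocks that differ by `ppmDifference` parts per million
// have slipped by `toleranceSymbols` symbol periods relative to each other.
// Both parameters are illustrative assumptions; the real tolerance of the
// firmware is not stated in this thread.
double secondsUntilSlip(double ppmDifference, double toleranceSymbols) {
  assert(ppmDifference != 0.0);
  assert(toleranceSymbols > 0.0);
  // Slip rate in symbol periods per second of wall-clock time.
  const double slipSymbolsPerSecond =
      kDmrSymbolRate * std::fabs(ppmDifference) * 1e-6;
  return toleranceSymbols / slipSymbolsPerSecond;
}
```

Under these assumptions, secondsUntilSlip(5.0, 1.0) comes out at roughly 42 seconds, and the time scales linearly with the tolerance, so a window four times as wide lasts four times as long for the same mismatch.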

rogerclarkmelbourne commented 5 years ago

Hi Andy

Thanks.

I'll build it and flash (via STLink) to my duplex board.

It's interesting that it's a TCXO drift problem that's causing it, because I spoke to Steve K6GOG last week, and he can cure it completely on his boards simply by adding passive heat sink cooling to his TCXO and to his RPi.

BTW, I thought the TCXOs were already quite accurate. Mine has an ECS branded TCXO on it, which is the same brand as on my modified module in my Zumspot Libra.

However, Steve recommended replacing the TCXO with this one:

https://www.mouser.co.uk/ProductDetail/520-TXO-3225-14.74T?R=ECS-TXO-3225-147.4-TRvirtualkey59070000virtualkey520-TXO-3225-14.74T

The picture on mouser looks different to the TCXO on both the Libra module and the duplex board, but when I searched on Mouser for "14.7456 TCXO"

https://www.mouser.co.uk/Search/Refine.aspx?Keyword=14.7456+TCXO

I only see a 2.5ppm ECS and one other TCXO, so I'm not sure what TCXO is on the Libra and the duplex board, as it doesn't look the same.

Do you know whether it would be possible to resolve this problem permanently with, say, a 1ppm TCXO?

Or is there an alternative, e.g. having a different TCXO for Tx and Rx (though I suspect this won't work at all)?

rogerclarkmelbourne commented 5 years ago

Hi Andy

Something strange is happening when I use the latest code.

When I first installed it, and changed my Modem type to the Dual hat (I've been running in simplex mode as a single hat), I checked the MMDVMHost log file and it connected OK to MMDVM_HS.

But when I entered the duplex frequencies, it looks like MMDVM_HS crashed, as I see the YSF LED quickly fading up and down (known in the STM32 core as "Throb").

This normally only occurs if the code crashes because of an assert (normally in the core).

It's a pain to debug this on my RPi, so I'll need to wire the board up via a USB to serial adapter and connect it to my PC, where I have MMDVMHost running (I compiled the exe some time ago).

But I won't be able to look at this until this evening (it's almost 12:00, lunch time here).

juribeparada commented 5 years ago

Roger, I don't have any problem with the code. I compiled it for two home-brew duplex boards, which use USB (and therefore require more memory), and I also compiled it for a BI7JTA duplex board; all boards work fine in several modes. Make sure you did a "make clean", or you can even try downloading the entire code again.

Regarding the TCXO, I believe things are probably different for the ADF7021. We will always have a difference between the radio TCXO and the HS TCXO, and that will produce drift over time. The ADF7021 can recover the clock from the received signal, but this can be slightly different from the TX clock. This difference between the RX and TX clocks causes the problem. I think a better TCXO would increase the price of a HS too much; a better TCXO is needed for higher frequencies, for example 800-900 MHz. Most people believe that a bad TCXO only causes a frequency offset, but it also produces a bit data rate error, which is not possible to fix in this case. I think the latest fix will give at least several minutes of RX.
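
To make the "frequency offset and bit rate error" point concrete, here is a minimal sketch that applies the same ppm reference error to a carrier frequency and to a data rate. It assumes both are derived from the same reference, so they scale by the same fraction; the function names and the example numbers in the note below are illustrative, not taken from the firmware.

```cpp
#include <cassert>

// Carrier offset in Hz caused by a reference error of `ppm` at `carrierHz`.
// Assumes the RF synthesizer scales linearly with the reference clock.
double carrierOffsetHz(double carrierHz, double ppm) {
  assert(carrierHz > 0.0);
  return carrierHz * ppm * 1e-6;
}

// Bit-rate error in bit/s for the same reference error at `nominalBitRate`.
// Assumes the data clock is divided down from the same reference.
double bitRateErrorBps(double nominalBitRate, double ppm) {
  assert(nominalBitRate > 0.0);
  return nominalBitRate * ppm * 1e-6;
}
```

With an example 2.5 ppm error, carrierOffsetHz(435e6, 2.5) is about 1087.5 Hz and bitRateErrorBps(9600.0, 2.5) is about 0.024 bit/s. The second number looks tiny, but it accumulates for the whole length of a transmission, whereas the carrier offset stays constant.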

rogerclarkmelbourne commented 5 years ago

Hi Andy

I just realised I used the wrong compile option.

I'm using the IDE, and I accidentally compiled the dual HS Hat config with USB Serial enabled in the core.

It should be a Serial only build

I'll recompile and retest.

rogerclarkmelbourne commented 5 years ago

I did some more testing. Initially I had a timeout at 1:57; however, I re-tested and I now repeatedly get a timeout at 4:00 (4 minutes).

Obviously anything over 3:00 minutes would be OK.

I have tested with 2 separate GD-77s and they both give the same result.

I will do some more tests, and also see if I can work out a way to get someone else to do some testing. But to do that I'll need to work out a way to make it easy for him to install the binary.

juribeparada commented 5 years ago

I will release a binary version soon; I only needed you to confirm that it works. I got the same value, around 4 minutes. You can try different values in DMRTX.cpp, lines 258 and 268: if (i == 8U). Try some value between 8 and 15. After some calculations I selected 8 because it gives the same time for positive and negative drifts, but so far I have only found drifts in one direction.
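
The trade-off described here (a centred value gives equal time for drift in either direction, an off-centre value favours one direction) can be pictured with a toy model. This is not the firmware's logic: the window size, the meaning of "position" and all names below are assumptions made up for illustration.

```cpp
#include <cassert>

// Toy model, NOT the firmware logic: a marker sits at `position` inside a
// window whose valid positions run from 0 to `lastPosition` inclusive.
struct DriftMargin {
  unsigned towardStart;  // steps available before drifting out at 0
  unsigned towardEnd;    // steps available before drifting out at the end
};

// Returns how far the marker can move in each direction before it leaves
// the hypothetical window.
DriftMargin marginFor(unsigned position, unsigned lastPosition) {
  assert(position <= lastPosition);
  return DriftMargin{position, lastPosition - position};
}
```

In this model, marginFor(8, 16) leaves 8 steps on each side, while marginFor(12, 16) leaves 12 steps on one side and only 4 on the other. If the drift really does go in one direction only, moving the marker away from the centre buys time on that side at the cost of the other.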

rogerclarkmelbourne commented 5 years ago

Is there any way to measure the drift?

I have a 300MHz Rigol scope (DS2072 hacked to 300MHz) and also a 100MHz USB logic analyzer.

BTW. I sent the compiled version of firmware to 2 people and asked them to test, but they have not replied yet.

I think the code is stable, so you can probably release a new version anyway..

juribeparada commented 5 years ago

To measure it, you need to link the following parts of the code to GPIOs:

1) TX: the "control_tmp" variable in DMRTX.cpp, line 239
2) RX: when "m_dataPtr == m_endPtr1" or "m_dataPtr == m_endPtr2", in DMRSlotRX.cpp, lines 165 and 267
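
Once those two events are visible on GPIOs, the captured edge times can be turned into a drift figure on the PC. The sketch below is one possible way to do that, assuming the logic analyzer export gives one TX edge time and one matching RX edge time (in seconds) per frame. It fits a straight line to the RX minus TX offset and reports the slope in ppm. The function name and the data layout are assumptions, not part of MMDVM_HS.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Least-squares slope of (rxEdge - txEdge) against txEdge, in parts per
// million. A positive result means the RX event falls progressively later
// relative to the TX event. Inputs are paired edge times in seconds.
double driftPpm(const std::vector<double>& txEdges,
                const std::vector<double>& rxEdges) {
  assert(txEdges.size() == rxEdges.size());
  assert(txEdges.size() >= 2);
  const double n = static_cast<double>(txEdges.size());
  double sumT = 0.0, sumD = 0.0, sumTT = 0.0, sumTD = 0.0;
  for (std::size_t i = 0; i < txEdges.size(); ++i) {
    const double t = txEdges[i];               // time axis
    const double d = rxEdges[i] - txEdges[i];  // offset between the events
    sumT += t;
    sumD += d;
    sumTT += t * t;
    sumTD += t * d;
  }
  const double denom = n * sumTT - sumT * sumT;
  assert(denom != 0.0);  // needs at least two distinct time stamps
  return (n * sumTD - sumT * sumD) / denom * 1e6;
}
```

A constant offset between the two edges does not affect the result, only the rate at which the offset changes. Capturing a full transmission up to the point of the cut-out should give a much steadier estimate than a few frames.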

rogerclarkmelbourne commented 5 years ago

OK

Thanks.

I'll add some code and use some of the LEDs on the board, and solder on some small wires so I can attach them to a USB analyser.

BTW.

I don't know if it's because I compiled with -O3, but I got a timeout at 1:40 when in a QSO this morning.

But I just re-tested and its OK.

I'll do some more testing

rogerclarkmelbourne commented 5 years ago

Just a thought, but if the amount of drift can be determined by looking at these parts of the code, can't something be done to correct it?

I'm not sure how many of the STM32 timers you are currently using, but if any are free you could use them to look at the time difference and potentially use that information to perform some correction.

Even if you already use all the hardware timers, I think there is some sort of main clock counter which is also accessible (I don't think you can reset its value, but you can read it).

juribeparada commented 5 years ago

Not sure if that is possible; any correction will affect the next slot. In general, TDMA systems need accurate clocks, and unfortunately that is not the case here. It is not only the TCXO (which is not so bad); it seems to me that the ADF7021 introduces additional clock "error" after clock regeneration at RX. If the 4 minute limit stays consistent, I think that is enough for most users.

marrold commented 5 years ago

Sorry to jump in, but I thought it was worth mentioning that Jonathan implied, in the presentation he did a while ago, that he was using the Rx from the MS to synchronise the Tx from the BS. Maybe I misunderstood, or maybe you already knew this.

juribeparada commented 5 years ago

Yes, I know that. Because MMDVM_HS is based on MMDVM, it uses the same approach, with modifications of course. Actually it is the TX that synchronizes the RX framing. But here we are dealing with an additional difficulty: the RX and TX clocks are not the same, even if you share the same TCXO between both ADF7021s, because the RX ADF7021 will try to adjust the bit rate clock (clock recovery for RX). There is no easy way to adjust the TX ADF7021 clock. So after some time, received frames will fall outside the correct frame window.

Tomxtal commented 5 years ago

I don't know if this will help, but I have seen the same problem with the Ailunce HD1, Anytone 868, BTech 6X2 and MD-UV380. Mine works just fine with the TYT MD2017 and MD390.

rogerclarkmelbourne commented 5 years ago

@Tomxtal

I guess possibly your MD2017 and MD380 both happen to have very similar TCXO frequencies to your hotspot.

I heard of some people who were experimenting with an external clock generator, using an Si5351 as the clock source

https://www.silabs.com/documents/public/data-sheets/Si5351-B.pdf

But I don't know if this was to dynamically control the clock frequency or perhaps just to tune it to match their transceiver.

I'm also not sure whether the Si5351 would be accurate enough, or whether it would perhaps have too much clock jitter.

rogerclarkmelbourne commented 5 years ago

BTW, I don't know if simplex hotspots can have the same problem, but when I was speaking to VK7JB last night, his simplex hotspot cut out and made the same sort of squawk on the audio that occurs when this bug hits the duplex board.

juribeparada commented 5 years ago

This problem affects only duplex mode, because simplex works in a completely different way: there are no RX window limits, no framing sync between TX and RX, only one ADF7021, etc. The issue you described is surely due to another problem.

rogerclarkmelbourne commented 5 years ago

OK. I just thought I'd mention it as it appeared to be the same noise as when the problem occurs on the duplex board

rogerclarkmelbourne commented 5 years ago

PS. Do you think it's worth experimenting with the Si5351 as the clock generator, as I have some of these modules? In theory the clock speed supplied to the ADF7021 could then be adjusted (assuming the Si5351 does not have too much jitter). A TCXO could be used as its reference. (But I think it needs a 25MHz version :-( )

Also BTW, I heard that possibly the 12MHz duplex version may not have this bug.

Is this correct?

juribeparada commented 5 years ago

The Si5351 looks like an interesting clock option. I haven't worked with that part; what we actually need is a 14.7456 MHz VCXO and a clock tracking algorithm.
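
For a rough idea of what a "clock tracking algorithm" could look like, here is a small simulation of a proportional-integral loop steering an adjustable reference. Everything in it is hypothetical: the class, the gains, the one-measurement-per-60-ms-frame rate and the assumption that the timing error can be measured in symbol periods. It is not something MMDVM_HS does, and it ignores hardware details such as VCXO pull range and noise.

```cpp
#include <cassert>

// Hypothetical proportional-integral tracker. update() takes the measured
// timing error in symbol periods (positive when the radio runs ahead) and
// returns the frequency correction in ppm to apply to a steerable reference.
class ClockTracker {
public:
  ClockTracker(double kp, double ki) : m_kp(kp), m_ki(ki), m_integral(0.0) {}

  double update(double errorSymbols) {
    m_integral += errorSymbols;
    return m_kp * errorSymbols + m_ki * m_integral;
  }

private:
  double m_kp;
  double m_ki;
  double m_integral;
};

struct TrackingResult {
  double errorSymbols;   // remaining timing error after the run
  double correctionPpm;  // correction the loop has settled on
};

// Simulates `frames` loop updates against a radio whose clock is `radioPpm`
// away from the hotspot's uncorrected clock.
TrackingResult simulateTracking(double radioPpm, unsigned frames) {
  constexpr double kSymbolRate = 4800.0;   // DMR symbols per second
  constexpr double kFrameSeconds = 0.060;  // one measurement per TDMA frame
  ClockTracker tracker(700.0, 70.0);       // illustrative gains, untuned
  double error = 0.0;
  double correction = 0.0;
  for (unsigned k = 0; k < frames; ++k) {
    // Slip accumulated during this frame with the current correction applied.
    error += (radioPpm - correction) * 1e-6 * kSymbolRate * kFrameSeconds;
    correction = tracker.update(error);
  }
  return TrackingResult{error, correction};
}
```

In this idealised model, simulateTracking(5.0, 1000) ends with the correction settled at about 5 ppm and the timing error close to zero. The hard parts on real hardware are exactly the ones this sketch assumes away: getting a clean error measurement and having a reference that can actually be steered.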

I don't know about the 12.288 MHz duplex version, I only have 14.7456 MHz duplex boards.

phl0 commented 5 years ago

I have some duplex boards with 12.288MHz TCXO in production. Is it just to test "long" transmissions from a handheld radio?

rogerclarkmelbourne commented 5 years ago

@phl0

Yes. I don't know if your 12.288MHz boards have the long transmission problem, but most 14.7456MHz boards have it.

You would need to test using firmware 1.4.6 and not the latest release (as the latest release has the work-around, which extends the time before the cut-out to around 4 minutes).

With my GD-77 the cut-out was after almost exactly 60 secs, but Andy reported some slightly longer times.

As described by Andy, the problem seems to be that the clock on the hotspot is not exactly the same frequency as the clock in the radio, and over a "long" transmission they can drift out of sync on the Tx.

There are several threads open about the same problem, and it's been reported that some radios work OK but some don't. I suspect the radios that don't work very well (e.g. time out quickly) have a clock that is a lot different from the hotspot clock.

BTW, what PPM are the TCXOs you use? I can only find 12.288MHz TCXOs at perhaps 50ppm, but I thought the 14.7456MHz TCXOs were 2.5ppm.

phl0 commented 5 years ago

I am not sure, because we haven't distributed many boards yet. There is a batch in production.

However, I can try to test with my MD-380 and probably test with a GD-77 of a friend.

The TCXOs we use are 2.5ppm ones for both 14.7456 and 12.288 MHz. The latter is this one: https://www.mouser.de/ProductDetail/ECS/ECS-TXO-3225-1228-TR?qs=sGAEpiMZZMt8oz%2fHeiymAM%2fH8uoVtcI63urX6E9s5urz3DK8T8WgtA%3d%3d

rogerclarkmelbourne commented 5 years ago

OK

If you test on some different radios, you are likely to see the problem on your duplex board, but we don't know whether the 12.288MHz crystal has fewer problems than the 14.7456MHz one.

I think, as the problem is a very slight difference in clock frequency, both types of board probably have the same problem.

juribeparada commented 5 years ago

I think fw v1.4.7 "solved" this issue for most users. I will close this issue; you can re-open it if needed.