udoklein / dcf77

Noise resilient DCF77 decoder library for Arduino
http://blog.blinkenlight.net/experiments/dcf77/dcf77-library/
GNU General Public License v3.0

Good hardware, bad hardware... #20

Closed Jo-Achim closed 7 years ago

Jo-Achim commented 7 years ago

Result first: Udo's DCF77 library works as expected. If it does not, have a look at your hardware.

Hello Udo, you may remember that I had a little trouble working with your DCF77 library (based on your 'Simple_Clock with Timezone support'). In my project, with very bad DCF77 signals on a 'MiniPro with 16 MHz crystal', I saw wrong dates and times (e.g. 09:xx:yy -> 03:xx:yy, or Tue, 08.10.2002 and Sat, 15.08.2054). So I asked myself what was going wrong.

So I added a little status information to my OLED display to show the filter quality state. After the first 'synced', I saw at best 'locked' only; the drop came sometimes after some minutes or, in better cases, after some hours.

Stripping down my project to minimize the use of other libraries and to save RAM and variables did not solve the problem. So I took a second, absolutely identical, crystal-based MiniPro, loaded your 'Simple_Clock with Timezone support', added some digitalWrite calls and an LED to show the states 'unlocked' and lower, 'locked' and 'synced', and connected its DCF input in parallel to the same DCF77 receiver. To my surprise ;) I saw 'synced' on this second MiniPro and, at the same time, only 'locked' on the first MiniPro.

To verify this result, I swapped the two MiniPros and loaded my whole project onto the second MiniPro. The result is that the (second) MiniPro with my project has now been in 'synced' mode for days, while the first MiniPro, running your clock example with the status LED in parallel, is in 'locked' mode only. So I have a good MiniPro and a bad MiniPro. Unfortunately I do not know whether the good one (2nd MiniPro) or the bad one (1st MiniPro) is the normal case; I cannot find any differences.

Summary: Your DCF77-library works fine. Thank you.

udoklein commented 7 years ago

The most relevant difference between different Arduinos is the crystal frequency. Both the frequency and its stability actually. If you want to see what is really going on you would use the "swiss army debug helper" in mode Ds or Da and see what's wrong with the frequency. Or you generate a high frequency signal with one of the timers (e.g. running at 1/2 clock frequency) and have a look with a frequency counter. If it is crystal based it should usually work. If it is resonator based you are exceptionally lucky to find one that works at all.

Jo-Achim commented 7 years ago

Information: Hello Udo, I have not tested with your "swiss army debug helper" yet. But yesterday I got a second OLED display and can now show what I mean. Both MiniPros are 16 MHz crystal based and receive the same DCF77 signal in parallel. The sketches were started at ~20:00 and I last saw 'synced' on both MiniPros at approximately 21:30.

In the upper part of the picture you see the 'bad hardware' showing 19:48:26 and a blue LED (= 'locked'). In the lower part you see the 'good hardware' showing 23:48:26 and a green LED for 'synced'.

The inset on the left shows the DCF77 signal at this time, and the OLED displays show "DCF: l" for 'locked' and "DCF: S" for 'synced'.

Next I will have a look with the "swiss army debug helper".

[Image: dcf77]

Jo-Achim commented 7 years ago

Hello Udo, currently I have only one programming device for my MiniPros, so I cannot debug both MiniPros in parallel. And because I think the results of your "Swiss knife" are more meaningful to you, here are the results of Ds and Da. Do you need more or other information?

For your analysis, see the attachment. If you need the whole log, let me know.

DCF77_Log_Auszug.txt

udoklein commented 7 years ago

Looks like a drift of ~80 ppm. I would expect this is the good one. Is it?

Jo-Achim commented 7 years ago

No, sorry, 'The bad one'!

Regarding the wrong time and date, I have one small idea: is it possible that the 'bad one' takes the date/time data from the wrong DCF77 bits? The idea comes from the observation that the time in the display of the 'bad one' was updated with a time lag; see "DCF77_2" below. (On the morning of the following day, the date and time of the 'bad one' were still wrong.)

On this occasion, here is an older photo of a very bad DCF77 signal ("DCF77_Bad_Signal_Quality"). This is rare; whether it occurred before the time of the pictures in the post above, I do not know. (To prevent this, the DCF77 receiver is currently powered by a battery.)

[Image: dcf77_2] [Image: dcf77-bad signal quality]

udoklein commented 7 years ago

No, it will not take the date from the wrong bits. If the hours and minutes are correct the phase is properly locked. I am aware of two conditions where this effect can happen:

1) The signal fades but in such a way that the correlation mechanism actually detects it as a valid date. This is only possible for the part of the decoded time that changes with low frequency (hence the date gets corrupted but the hours do not).

2) There is something that hogs enough CPU power to make the phase lock drift.

Failure mode (1) is usually correlated with alternating between synced and locked mode. Failure mode (2) is usually correlated with going out of sync and never resyncing again.

It would be possible to catch both of them in software and automatically fix it. However on an Arduino with only 2k of RAM this is somewhat too much. The easiest fix would be to not decode the date at all. Instead just use the change from 23:59:59 to 00:00:00 to advance the day. However this will fail if the clock is not powered all of the time.
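
To illustrate the rollover idea, here is a minimal sketch in plain C++ (not code from the dcf77 library). It assumes the application keeps its own calendar date and is handed the previous and current time of day once per second. All names (`Date`, `TimeOfDay`, `advance_day`, `on_second_tick`) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical types for illustration; not part of the dcf77 library.
struct Date      { uint16_t year; uint8_t month; uint8_t day; };
struct TimeOfDay { uint8_t hour; uint8_t minute; uint8_t second; };

// Gregorian leap year rule.
bool is_leap_year(uint16_t y) {
    return (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
}

// Number of days in month m (1..12) of year y.
uint8_t days_in_month(uint16_t y, uint8_t m) {
    static const uint8_t days[12] = {31, 28, 31, 30, 31, 30,
                                     31, 31, 30, 31, 30, 31};
    return (m == 2 && is_leap_year(y)) ? 29 : days[m - 1];
}

// Advance the locally kept date by exactly one day.
void advance_day(Date &d) {
    if (d.day < days_in_month(d.year, d.month)) { ++d.day; return; }
    d.day = 1;
    if (d.month < 12) { ++d.month; return; }
    d.month = 1;
    ++d.year;
}

// Call once per second. The date is advanced only on the transition
// 23:59:59 -> 00:00:00; the decoded date bits are never consulted.
// Returns true if the date was advanced.
bool on_second_tick(const TimeOfDay &prev, const TimeOfDay &now, Date &date) {
    const bool was_last_second =
        prev.hour == 23 && prev.minute == 59 && prev.second == 59;
    const bool is_midnight =
        now.hour == 0 && now.minute == 0 && now.second == 0;
    if (was_last_second && is_midnight) {
        advance_day(date);
        return true;
    }
    return false;
}
```

The date still has to be set correctly once, and as noted above it is lost whenever the clock loses power.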

Jo-Achim commented 7 years ago

Ok, Udo, it's not a big problem if we do not find an explanation here. A small problem remains because there is no evidence to distinguish between 'good hardware' and 'bad hardware'. (I can send you the 'bad one' if you are interested.)

As far as the load on the Arduinos is concerned (I already mentioned the 'overload' assumption), I can currently say the following. The test sketch on the 'bad one' is based on your 'Simple Clock with Timezone support', extended by some lines to show the quality state via LED, to count the quality states 'below locked', 'locked' and 'synced', and to control the OLED display (U8glib). With "#define use_SERIALMonitor" it requires 26,768 / 1,208 bytes; with "// # define use_SERIALMonitor" 25,490 / 1,216 bytes. The 'good one', with the same added functionality and libraries plus additional control of a 'Digi-Dot-Booster' (library and hardware for a NeoPixel strip) and with "// # define use_SERIALMonitor", is loaded with 28,220 / 1,368 bytes.

udoklein commented 7 years ago

Size is not relevant. Performance is. It may happen that the "bad" one has a slower crystal. Or it may have an unstable crystal. Hard to say without precise measurements.

Jo-Achim commented 7 years ago

Hello Udo,

it has been a long time... My last experiment is still running unchanged (no time at the moment): two 16 MHz crystal-based MiniPros connected to the same DCF77 receiver. The 'bad hardware' is normally in locked mode and rarely in synced mode. So, no news.

But a new question. When the 'bad one' is in synced mode, the time and date are correct. When the 'bad one' is in locked mode, it may have a wrong time and date; currently, for example, 19:32:17 on Sunday, 19.12.2077 (bad hardware) instead of 10:32:17 on Wednesday, 14.12.2016 (good hardware).

Is there any option to tell your filter to synchronize in synced mode only?

I suppose that received signals like the one in the last picture in my post of Nov. 12, in conjunction with the locked mode, are responsible for the wrong date and time. Is that possible?

Jo-Achim commented 7 years ago

Hello Udo,

an important question for me was: which filter modes did the filter actually pass through between the first 'synced' (with correct date and time) and the situation with wrong date and/or time?

So I added additional information lines to my display. The bottom line shows the current quality from 0 to 5 (6 segments = quality 5), and two lines above it, at most two segments show the highest and lowest filter quality state since the first synchronization.

The following figure shows that both crystal-based MiniPros have, at the same time, different wrong clock times and a wrong date; current filter quality = 4 ('locked'), best filter quality = 5 ('synced'), worst filter quality = 4 ('locked'). (The wrong date often has the year '77'.)

[Image: dcf77_filter_quality_wrong_date-and_time]

udoklein commented 7 years ago

Which version of the library are you running? I heard of this symptom from someone else as well. Your description gives a valuable hint. Can you also tell me which wrong clock time they have? And what is the default behaviour of your receiver module if the signal fails? That is, while the clock is showing a wrong time, is the signal at the pin low or high? Also, what are your settings for the clock? Is your receiver delivering a low or a high pulse? That is: what is the output level when the signal is modulated (the first 100-200ms of a second) and what is the output when you receive no signal at all?

This will help me to finally nail down what is going on here.

Jo-Achim commented 7 years ago

Hello Udo,

I am glad if I can help.

I last downloaded the library I am using on 05 Nov. 2016. But where can I find a version number? From "Simple_clock_with_Timezone_Support": "Simple DCF77 Clock V3.0".

I use a Conrad module (BN 641138) with open collector and I use the non-inverted output; so the receiver output is high (in the first 100 - 200 ms of a second); signal level ~4.8 V.

The configuration / clock settings are:

const uint8_t dcf77_analog_sample_pin = 3; // (A)3 = 17
const uint8_t dcf77_sample_pin = 17;
const uint8_t dcf77_inverted_samples = 0;
const uint8_t dcf77_analog_samples = 1;
const uint8_t dcf77_pull_up = 0; // Using external 10 kOhm Resistor

Which wrong time? Normally I have the correct time in 'locked' mode, too. But in rare circumstances the clock time (and possibly the date, including the weekday) is wrong, or - see e.g. the last picture - both filters, connected to the same DCF receiver, show different wrong times at the same time (!); the two displays are from the same photo.

The next questions are difficult, or we have a misunderstanding. ... Even when I see a wrong time (as in the last picture), the DCF signal from the receiver is available and can be pretty good (see the scope in the picture from Nov. 11). Normally I have never seen a situation without a modulated signal from the DCF77 receiver. But in the past, I accidentally disconnected the DCF receiver output overnight; the next morning time and date were correct. (In that case the signal was high (external open collector pull-up).) Is this the answer to your question: "what is the output when you receive no signal at all"?

Most of the time the DCF signal is good; sometimes (possibly several times a day) it is like the last picture in my post from Nov. 12. Unfortunately I do not know the signal or quality situation at the moment when the correct time or date goes wrong; only that the filter quality state was not below 4 ('locked').

Two other things that may be noteworthy: If the MiniPros have been running in 'locked' mode only for days and are then restarted, they run in 'synced' mode for a while. And after the first 'synced' mode both MiniPros fall into 'locked' mode in about 90% of all cases, mostly after 30 - 40 seconds. With good DCF signals, they are back in 'synced' mode within the next 2 or 3 minutes. Later they fall back into 'locked' mode.

I hope this information will help you.

PS: While writing this post, one MiniPro shows 17:33:05, 27.07.2077; the other one is correct: 15:33:06, 21.12.2016; both MiniPros are in 'locked' mode.

PS2: Now I will test one of the MiniPros in parallel with the "dcf77_analog_samples = 0" setting.

Jo-Achim commented 7 years ago

PS2: Now I will test one of the MiniPros in parallel with the "dcf77_analog_samples = 0" setting.

Overnight result: at 08:27:26 on Thu, 22.12.2016 (correct time), the "dcf77_analog_samples = 0" MiniPro shows 00:27:25 on Tue, 28.12.2077; lowest signal quality: 'locked', current signal quality: 'locked'. So the problem exists with "dcf77_analog_samples = 0" too.

The drift of both MiniPros is roughly +30 ms within 1,000 seconds = 30 ppm.
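
The ppm figure is the observed offset divided by the elapsed time, scaled by one million. A minimal sketch of that arithmetic in plain C++; `drift_ppm` and `seconds_per_day` are hypothetical helper names and not part of the dcf77 library:

```cpp
// Hypothetical helpers for illustration only; not part of the dcf77 library.

// Drift in parts per million: observed offset divided by elapsed time,
// scaled by 1e6. Both arguments must use the same unit (here milliseconds).
double drift_ppm(double offset_ms, double elapsed_ms) {
    return offset_ms * 1e6 / elapsed_ms;
}

// Time error in seconds that a free-running clock with the given drift
// would accumulate over one day (86400 s).
double seconds_per_day(double ppm) {
    return ppm * 86400.0 / 1e6;
}
```

With +30 ms over 1,000 s (1,000,000 ms), `drift_ppm(30.0, 1000000.0)` gives 30 ppm, which corresponds to roughly 2.6 seconds per day if the clock ran without any DCF77 correction.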

Jo-Achim commented 7 years ago

Hello Udo.

First, the library version I use includes all changes in the three files "dcf77.cpp", "standalone.cpp", and "standalone.h": https://github.com/udoklein/dcf77/commit/dd21eabcf97edb80e21dcce0dc52d6fd8e538e5d. (And the examples include all changes from: https://github.com/udoklein/dcf77/commit/6422ee6f95d41c08f53b75dec508f63090a1b7c1.)

Second, yesterday's experiment is still running and both MiniPros are continuously in 'locked' mode. The one with the wrong time and date still shows the same time difference; the date is now Wed, 29.12.2077. I will wait for 'synced' mode to check whether time and date are then correct; but I think so.

Third, I wrote to my dealer (Geras-IT, https://www.geras-it.de/) to ask whether a new order would get me a 16 MHz crystal-based Mini Pro from another series or another manufacturer. If so, I will order a new one to rule out a series fault.

Until some news...

udoklein commented 7 years ago

Hi Joachim,

I think I finally got the cause as well as a fix for the issue you are reporting. I think the issue is caused by a receiver that responds with "high" to a signal loss. My receivers all respond with low, hence it never occurred to me during testing. My suggestion for a fix is here:

https://github.com/udoklein/dcf77/commit/a0db682cf755abbe04ac585812d33071ae07527c

Can you please test if this fixes your issue?

Thanks a lot for keeping my nose into this :)

udoklein commented 7 years ago

One more thought. In case the issue is not fixed by the new branch. Go to line 1297 in dcf77.h (see here: https://github.com/udoklein/dcf77/blob/a0db682cf755abbe04ac585812d33071ae07527c/dcf77.h#L1297 ) and increase the constant till the issue is resolved. I would expect this to happen at 2 but maybe 3 might be needed.

Please tell me what you find.

Jo-Achim commented 7 years ago

Hello Udo,

thank you for the information.

Before trying your latest suggestions I can tell you... One of the MiniPros is in 'locked' mode (with the same time offset and date error). The other one fell from 'locked' to 'unlocked' mode, then returned to 'locked' mode and has been, since the 24th at ~14:00, (continuously?) in 'synced' mode until now; since entering 'synced' mode it has the correct date and time.

So, I will test your latest suggestions and give feedback.

A third MiniPro from another delivery is on the way.

Jo-Achim commented 7 years ago

Hello Udo,

we have different dcf77.h - files!

void process_1_Hz_tick(const DCF77_Encoder &decoded_time) {

begins at line 1556; the dcf77.h includes MSF60 definitions. Are there more differences?

I have now copied the dcf77.h from https://github.com/udoklein/dcf77/blob/a0db682cf755abbe04ac585812d33071ae07527c/dcf77.h#L1297 ... but there are a lot of compiler errors; I think dcf77.h and dcf77.cpp are incompatible!? Your GitHub download here contains the same files that I used (before).

So I will change 'my old dcf77.h' at line 1556, first to "quality_factor > 2" and test it; later to "quality_factor > 3". And I will tell you the results.

Jo-Achim commented 7 years ago

A picture says more than words...

[Image: dcf77-diagram]

Jo-Achim commented 7 years ago

Hello Udo,

first, a short note on the change (Dec. 26) from "quality_factor > 1" to "quality_factor > 3": the first initialisation takes roughly three times as long; with a good DCF signal (as in the post from Nov. 11) ~19 minutes, versus normally about 6 minutes with "quality_factor > 1".

Is that theoretically plausible / correct?

Jo-Achim commented 7 years ago

Today, at about 10:30 (and before), the DCF77 transmitter obviously failed (http://www.dcf77logs.de/WebConsole.aspx).

The Conrad DCF77 module provided a logical zero at the non-inverted open collector output (pin 3) with pull-up resistor.

Jo-Achim commented 7 years ago

Hello Udo,

was the dcf77-transmitter down this morning?

Both my miniPros were in 'synced' mode at 11:05. At 11:40 both were in 'unlocked' mode. After that, both filters switched between 'locked' and 'unlocked' mode. And ... see diagram below.

My impression is that, independent of the "quality_factor", the filter changes more easily from 'unlocked' mode to 'synced' mode than it does after having been in 'locked' mode only for a longer time.

[Image: dcf77-diagram_2]

Jo-Achim commented 7 years ago

Hello Udo,

is it possible that your filter algorithms - maybe under certain circumstances (like start conditions / DCF signal) - do not correctly resynchronize / calculate the filter adjustment (PLL) values? And that the filter therefore does not return to 'synced' mode (or returns 'too late')? Independent of the "quality_factor" setting (https://github.com/udoklein/dcf77/commit/a0db682cf755abbe04ac585812d33071ae07527c)? Please note the different dcf77.h files!

I remember more than one situation in which the miniPros were in 'locked' mode for many hours in spite of a good DCF signal. After a reset, the miniPros were in 'synced' mode within ~6 minutes (with "quality_factor > 1").

Since my last post, both miniPros have run exclusively in 'synced' mode.

The day before yesterday... At ~14:40 I updated my sketch with an option to reset the DCFmin/DCFmax bars. MiniPro 1 with "quality_factor > 1" is continuously in 'synced' mode. Ditto miniPro 2 with "quality_factor > 3".

A miniPro '3' - a new one from another manufacturer with a 16 MHz crystal - was started at 20:49 with the same sketch as miniPro 2 but with "quality_factor > 2". From ~21:04 this miniPro was in 'synced' mode too.

All three miniPros - connected to the same DCF77 receiver - were in 'synced' mode at 00:20.

Yesterday... At ~09:41 (or before) miniPro 3 was in 'locked' mode (fairly certainly without interruption - only fairly, because I lack a third display, so I see the 'quality' only by the color of an RGB LED); miniPro 1 and 2 continuously in 'synced' mode.

~18:43 miniPro 1 + 2 in 'synced' mode, miniPro 3 switches between 'unlocked' and 'locked' mode within a few seconds. ~18:50 miniPro 3 in 'locked' mode (1 + 2 in 'synced' mode). ~19:03 all three miniPros in 'synced' mode; 1 + 2 without ever leaving 'synced' mode.

Today, up to now (12:00)... All three miniPros are in 'synced' mode. MiniPro 1 + 2 have been exclusively in 'synced' mode since the last start.

A proof that the filter changes more easily from 'unlocked' to 'synced' mode than when it has been in 'locked' mode for hours!? Regardless of "quality_factor", because I do not see any significant differences between the different values, except for the initial initialization, which takes significantly longer.

Conclusion: I rate the observed behavior of miniPro 3 compared to the other two as a "yes" to my opening question. And I would now no longer distinguish between 'good' and 'bad' hardware. Correct?

So, Udo, I think the rest is your part. I would be glad to hear from you.

In this sense, Joachim.

brettoliver commented 7 years ago

Hi Joachim. Thanks for all your hard work on this! I have had problems with my clocks: after a long period of bad signal they too go into LOCK mode and refuse to go into Sync even though the signal returns. The signal quality also shows very low, so I presume what the DCF library is expecting differs from the good signal.

In the end I have to run an "EEPROM Clear" program, reload my sketches and all is well again.

I have a number of clocks that run off a DCF77 signal "repeater" http://home.btconnect.com/brettoliver1/Pragotron_Clock/Pragotron_Clock.htm#dcf77_repeater, so they should be getting exactly the same signal, but I note that the only clock running the release 2 library always goes back to Sync as soon as the signal is restored.

I do not understand all of the technical details in this thread but it seems to describe the problems I have.

I look forward to Udo's reply to your work.

Brett.

Jo-Achim commented 7 years ago

Hi Brett,

nice work.

I might have a similar idea. In the meantime, I have used the watchdog to restart the sketch. This, of course, makes sense only if an RTC is included. But the RTC is rather superfluous with Udo's filter.

The other point, however, is the occasionally incorrect data (time and/or date). Here the filter should update its internal time only in 'synced' mode.
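
As an application-side illustration of "update only in 'synced' mode", here is a minimal sketch in plain C++. It is not how the library itself works. The enum mirrors the state names used in this thread: the values 4 ('locked') and 5 ('synced') are taken from the posts above, the order of the lower states is an assumption, and all type and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical mirror of the quality states named in this thread.
// 4 = locked and 5 = synced follow the posts above; the order of the
// lower states is an assumption.
enum class ClockState : uint8_t {
    useless = 0, dirty = 1, free = 2, unlocked = 3, locked = 4, synced = 5
};

// Hypothetical holder for the time the application displays.
struct DisplayTime { uint8_t hour; uint8_t minute; uint8_t second; };

// Copy the decoded time into the displayed time only while the decoder
// reports 'synced'. In any other state the displayed time is left
// untouched, so the application has to keep it running by other means.
// Returns true if the displayed time was updated.
bool accept_if_synced(ClockState state,
                      const DisplayTime &decoded,
                      DisplayTime &shown) {
    if (state != ClockState::synced) {
        return false;
    }
    shown = decoded;
    return true;
}
```

A guard like this only hides wrong values from the display; it does not address why the decoder produced them.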

Joachim.

Jo-Achim commented 7 years ago

Hello Udo,

sorry, the previous findings are keeping me busy ...

If I understand your filter (from the outside) correctly, then, put simply, you do the following: you generate a filter-internal 1 Hz clock (1-HzPLL) for your PLL by dividing the 16 MHz crystal clock. As the reference signal for your 1-HzPLL, you use the exact 1 Hz signal (1-HzDCF) of the DCF transmitter. Since dividing the crystal frequency does not yield an exact 1 Hz signal, you adjust the dividers inside your library so that, step by step, you get a very accurate 1-HzPLL signal. Correct?

Probably you use a statistical function for the adjustment, in which values accumulate and data outside the accumulation are given a lower weight, so that they matter less in the readjustment!? So it takes a lot of data 'outside the accumulation' to change the quality mode?

This would explain why it seems to be difficult for a filter that has stayed in 'locked' mode for a long time to switch from 'locked' mode to 'synced' mode, if, on the one hand, the 1-HzPLL adjustment - as the term 'locked' suggests - is also in effect in 'locked' mode. And if, on the other hand, the 1-HzPLL adjustment is reset, reinitialized or similar for a DCF signal quality below 'locked'. (Because from below 'locked' mode ('unlocked', 'free', 'dirty', 'useless') the filter reaches 'synced' mode quickly ("quality_factor > 1").)

If so, could it be a solution to do no 1-HzPLL adjustment in 'locked' mode? Or at least to weight it differently? Though the 'different weighting' would perhaps be more useful to limit the drift of the 1-HzPLL during a longer period of poor DCF signal. Or when the adjustment data are not yet available or are insufficient, for example shortly after program start.

This could, at the same time, be a solution to the problem of the wrong time / wrong date, since in my experience the wrong data always occurred after the filter had been in 'locked' mode for an extended period of time and the DCF signal was (simultaneously) bad enough (see the last pictures from Nov. 12, 2016). Alternatively, the date or time should not be set in 'locked' mode (here the DCF77 parity bits may not be sufficient); similar to what happens in 'free' mode or when the DCF77 signal is missing.

Best regards, Joachim.

udoklein commented 7 years ago

Hi Joachim,

thanks for all the effort. Can you give me a phone number and a suitable time so that we can discuss by phone what is really going on?

Best regards, Udo

Jo-Achim commented 7 years ago

Hello Udo,

We can make a phone call sometime; that is perhaps easier. I am normally available at +xxxx.

Best Regards, Joachim.

Jo-Achim commented 7 years ago

Because I had just prepared another post ...

I have a supplement, which does not contradict my previous post ...

I had come to regard the following behavior of the DCF library as 'normal' ("quality_factor > 1"): after program start and the first 'synced', the filter fell into 'locked' mode within the first few minutes (~2) and took about twice that time to return to 'synced' mode. From then on, 'synced' and 'locked' mode corresponded approximately to what one might expect from the respective DCF77 signal quality (blinking LED / oscilloscope).

Thanks to my 'DCFmin / DCFmax quality' display, I could observe that, in contrast, since the last start miniPro 1 ("quality_factor > 1") and miniPro 2 ("quality_factor > 2") have both been exclusively and continuously in 'synced' mode until today! (For about 7 days.)

MiniPro 3 ("quality_factor > 3") - connected to the same DCF77 receiver in parallel - changes from 'synced' mode to 'locked' mode and back. It seems as if "quality_factor > 3" actually facilitates the change to 'synced' mode, but at the same time also the relapse into 'locked' mode.

The only thing that really surprises me about my observations or conclusions is that the DCF77 signal quality at initialization / at the first time in 'synced' mode has such an effect on the further behavior / stability of the filter (for 'synced' mode).

In any case, this further finding is consistent with the fact that the observed problems do not necessarily have to occur.

Best regards, Joachim.

Jo-Achim commented 7 years ago

Hello Udo, hello everyone,

we were able to solve the problems above. As suspected in the meantime, they were due to a wrong library version (not the current release); in this case a development version. Udo has changed the default settings for the GitHub download so that such a mistake should no longer happen. Udo's 'master' libraries work as desired, even under adverse DCF77 reception conditions, and the errors described no longer occur.

Thanks again.

PS: For some DCF77 modules (Conrad), it is advantageous to use a stabilized supply voltage; approximately 3.3 volts from a used module / voltage regulator.

udoklein commented 7 years ago

The library is now registered with the Arduino library manager (see here: https://github.com/arduino/Arduino/issues/5935). So if you use it from within the Arduino IDE it should now be pretty easy to get the proper officially released version.

With regard to the mentioned "bug": by now I have implemented a standalone debug helper which can consume log files. Jo-Achim kindly sent me a log which exhibited the issue. With the official release of the library I can confirm that it parsed the data correctly (including noise + crystal drift of ~130ppm as it was with Jo-Achim's hardware). Thus our conclusion is that the issue is with the development branch only.

Stick to the Arduino library "dcf77_xtal" and you always get the latest and greatest :)

brettoliver commented 7 years ago

Hi Udo/Jo-Achim. I am working on another project at present, but as soon as I have some spare time I will load it onto one of my clocks. All my clocks work off the same signal via a repeater, so I will be able to compare the results. I was getting issues with my clocks not re-syncing after a period of bad signal, and the only way to clear them was to run an EEPROM clear sketch.

My clocks run release 3 apart from my original Master Clock that is running release 2 but this does not seem to have these issues.

Thanks for all your hard work on this update. Brett.

udoklein commented 7 years ago

I received no more feedback for quite a while. I assume this is resolved.