udoklein / dcf77

Noise resilient DCF77 decoder library for Arduino
http://blog.blinkenlight.net/experiments/dcf77/dcf77-library/
GNU General Public License v3.0

Resetting the whole library #31

Closed lakeroe closed 6 years ago

lakeroe commented 6 years ago

Hello,

I'm running the library on an ARM STM32F103C8T6 and this works very well, but now I've noticed a strange behaviour.

After power-on the first received time (clock_state >= Clock::free) is always valid and accurate, but after some hours (or days) the time starts to drift away and sometimes the date is wrong as well, although the inaccuracy should be below 200 ms according to the comment in dcf77.h. This seems to happen especially in a quite noisy environment.

I assume this problem is related to: https://github.com/udoklein/dcf77/issues/27

Because I use this library to synchronize my ARM built-in realtime-clock I need a reliable time only once in a while. So my idea is to reset the library every time a valid time is received (clock_state >= Clock::free). My C++ knowledge is quite limited but I've managed to add a reset-function which calls DCF77_Clock::setup() and DCF77_Local_Clock::setup(), and I call this reset-function every time a valid time is received.

I'm currently running a test for about 24h and this seems to work well.

Do you think this is the right approach for my problem and is the reset-function correctly implemented ?
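A minimal sketch of the reset-after-sync idea described above. The two setup() functions here are stand-in stubs so the example compiles on its own; in a real sketch they would be the library's DCF77_Clock::setup() and DCF77_Local_Clock::setup(). The hook on_clock_state() and the counter rtc_set_count are hypothetical names, not part of the library.

```cpp
#include <cstdint>

// Clock states in the order the library reports them (0 = useless ... 5 = synced).
namespace Clock {
    enum clock_state_t : uint8_t { useless = 0, dirty, free, unlocked, locked, synced };
}

// Stand-in stubs (assumption): they only count calls. In a real sketch these would be
// the library's DCF77_Clock::setup() and DCF77_Local_Clock::setup().
namespace Stub_DCF77_Clock       { int setup_calls = 0; void setup() { ++setup_calls; } }
namespace Stub_DCF77_Local_Clock { int setup_calls = 0; void setup() { ++setup_calls; } }

int rtc_set_count = 0;  // hypothetical: counts how often the RTC would be updated

// Hypothetical hook, called once per second with the current clock state.
// Returns true if the RTC was set and the decoder was restarted.
bool on_clock_state(Clock::clock_state_t state) {
    if (state < Clock::free) {
        return false;                    // no trustworthy time yet
    }
    ++rtc_set_count;                     // here the RTC would be written
    Stub_DCF77_Clock::setup();           // reset the decoder stages
    Stub_DCF77_Local_Clock::setup();     // reset the local clock model as well
    return true;
}
```

Resetting on every valid time means the library never gets the chance to free-run on a stale lock, at the price of a full re-acquisition each time.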

Thanks and best regards, lakeroe

udoklein commented 6 years ago

My suspicion would be that this is caused by an unstable local oscillator. You are right that a similar issue was described in https://github.com/udoklein/dcf77/issues/27 in the past. However, I was never able to reproduce it. Without any means to reproduce it I have no way to locate the issue and fix it.

I suggest that you first try to tell the library that your oscillator might be unstable, e.g. set has_stable_ambient_temperature to false.

https://github.com/udoklein/dcf77/blob/1103b38718e8be451031f6b9986a4291c0a085e0/dcf77.h#L123

If this fixes the problem tell me about it. If this does not fix the problem I have no idea what is wrong. If you really need to enforce a reset of the library this is in fact done by calling DCF77_Clock::setup(). This may work around your issue. For you this might be good enough. For me this is really annoying because I still have no clue why my library fails for some people.

Most probably I would need someone who can reproduce the issue and who lives close enough to me that I could visit and instrument the hardware with suitable measurement equipment.

lakeroe commented 6 years ago

Thanks for your reply. I will do further tests and report back in a few days ...

lakeroe commented 6 years ago

I've done some more testing on two different hardwares:

Board 1: Arduino clone (but definitely quartz crystal based)
Board 2: STM32F103C8 blue pill board (http://wiki.stm32duino.com/index.php?title=Blue_Pill)

For all tests I used two different "Reichelt modules" and the following time as a reference: https://www.uhrzeit.org/atomuhr.php

1st test: has_stable_ambient_temperature = false (as you suggested)
I powered-up the boards and waited for the best possible signal quality (clock_state==synced). Then I wrapped the antennas in aluminium foil to disable dcf77-reception. I verified this by the non-blinking dcf77_monitor_led.
Result after about 24h
Board 1: clock deviation ~ 0 seconds, correct date, clock_state=2 (free)
Board 2: clock deviation ~ +4 seconds, correct date, clock_state=2 (free)

2nd test: has_stable_ambient_temperature = true
Same procedure as before but changed software.
Result after about 24h
Board 1: clock deviation ~ 0 seconds, correct date, clock_state=2 (free)
Board 2: clock deviation ~ +4 seconds, correct date, clock_state=2 (free)

3rd test: has_stable_ambient_temperature = true
I powered-up the boards and waited for the best possible signal quality (clock_state==synced). Then I put the antennas in an upright position to simulate worse dcf77-reception.
Result after about 22h
Board 1: clock deviation ~ +94 minutes, correct date, clock_state=3 (unlocked)
Board 2: clock deviation ~ -4 seconds, wrong date (14.7.2053), clock_state=4 (locked); exactly 62304s after power-on the date changes from 14.5.2018 to 14.7.2053

My modest opinion: The first two tests show similar behaviour. Although the clock deviation on the second board is quite large it should be okay. According to your github page a drift of up to 100 ppm is within the design margins of the library (4s in 24h is only about 46 ppm). I think one problem is that the library stays in clock_state 2 (free) even if there's no reception for 24 hours. One possible solution is to add a watchdog: if the input pin doesn't toggle for a specified amount of time, it resets the library.
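The watchdog idea above could be sketched roughly as follows. InputWatchdog is a hypothetical helper, not part of the library; it only decides when a reset is due and leaves the actual reset call to the caller. The timestamps are assumed to come from a free-running millisecond counter such as Arduino's millis().

```cpp
#include <cstdint>

// Hypothetical watchdog: reports a timeout when the sampled input level has not
// changed for longer than timeout_ms. Unsigned subtraction keeps the comparison
// correct across a wrap of the millisecond counter.
struct InputWatchdog {
    uint32_t timeout_ms;
    uint32_t last_change_ms;
    bool     last_level;

    InputWatchdog(uint32_t timeout, uint32_t now_ms, bool level)
        : timeout_ms(timeout), last_change_ms(now_ms), last_level(level) {}

    // Feed one sample; returns true if the library should be reset.
    bool update(uint32_t now_ms, bool level) {
        if (level != last_level) {
            last_level = level;
            last_change_ms = now_ms;
            return false;
        }
        if (now_ms - last_change_ms > timeout_ms) {
            last_change_ms = now_ms;   // re-arm so the reset fires once per timeout
            return true;
        }
        return false;
    }
};
```

Such a watchdog only catches a completely dead input, as in the aluminium foil tests. A noisy input still toggles and would not trigger it.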

Test three shows the typical behaviour I mean. Board 1 shows a large clock deviation and board 2 a wrong date. This test is not so meaningful as the result changes a bit every time. For more reliable results more testing is needed. But it should be quite easy to reproduce by simulating bad dcf77-reception (put the antenna near metal objects or partly wrap it in aluminium foil).

Maybe this way it's possible for you to reproduce the problem !

To finish with something different: Further up you wrote resetting the library is done by calling DCF77_Clock::setup(). This does NOT work for me, I also have to call DCF77_Local_Clock::setup().

Thanks and best regards, lakeroe

udoklein commented 6 years ago

Can you record some trace with the "Swiss Army Debug Helper" in mode Dm ? At least enough to reproduce the issue + some more hours?

With regards to the simulation of bad reception: I tried several things in the past, e.g. the standalone debug helper. So far I found no way to reliably simulate real world noise. I know that in theory a lot of the noise can be modelled, but reality is still different. Without real world data I have no chance. The only thing that I can say for sure is that I can not reproduce the issue with my setup. And I tried quite a lot of things (e.g. putting the antenna on top of magnetic loudspeakers or on top of motors). Sometimes I just pulled the antenna for some hours. The issue is that this approach is very time consuming and I can not reproduce the behaviour so far.

lakeroe commented 6 years ago

Okay, I'll try to post a log to analyze the behaviour.

Regarding to reset the whole library: Does it make sense to you to also call DCF77_Local_Clock::setup() ?

udoklein commented 6 years ago

Good point. Yes, this is a good idea. Maybe I should add it as already suggested in issue #30. Well, maybe if it hits the third time ;)

lakeroe commented 6 years ago

Please find a first LOG attached (this is just to give you a start). The date is correct but the time is off by 195 seconds after about 22 hours.

BTW the procedure for this log (and the following ones) is always the same. First I put the antenna in an optimal position, then I wait for the best clock_state possible (synced) and then I put the antenna in a "non-optimal" position ...

I'm currently running further tests (both on AVR and ARM board), so hopefully I can provide a more significant LOG soon ...

dcf_2018-05-15_AVR.zip

lakeroe commented 6 years ago

Just a short intermediate result: So far I was not able to reproduce the wrong date behaviour using the "Swiss Army Debug Helper". But I'll keep trying ...

Furthermore there are two more questions:

1) In order to compile your library for an ARM STM32F103C8T6 running at 72MHz I added one "#if defined(STM32F1) ... #endif" section at the end of dcf77.cpp. Could you just have a quick look to see if there's any obvious mistake? Is there anything to change for the different CPU frequency?

2) In the meantime did you have a chance to look at my LOG above and do you have an explanation for the big clock deviation?

udoklein commented 6 years ago

I do not have an STM board to verify. However the code looks sound.

With regards to the log file I created a little analysis helper here. It detected no anomaly.

Then I looked at it with the standalone debug helper

Clock State Statistics
  0 useless : 511
  1 dirty   : 38
  2 free    : 77918
  3 unlocked: 0
  4 locked  : 1201
  5 synced  : 6

Clock State Transition Statistics
  useless  => useless : 511
  useless  => dirty   : 1
  dirty    => dirty   : 37
  dirty    => synced  : 1
  free     => free    : 77917
  locked   => free    : 1
  locked   => locked  : 1200
  synced   => locked  : 1
  synced   => synced  : 5

Quality Factor Statistics
  0: 79630
  1: 38
  2: 6
  3: 0
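Statistics of this kind can be tallied with a few lines of code. The following is only a sketch of the idea, not the standalone debug helper itself: it assumes one clock state sample per second has already been extracted from the log, and the names StateStats and tally are hypothetical. It counts a transition only between two consecutive samples, so its transition totals may differ by one from the helper's output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kStates = 6;  // useless, dirty, free, unlocked, locked, synced

struct StateStats {
    std::array<uint32_t, kStates> count{};                            // samples per state
    std::array<std::array<uint32_t, kStates>, kStates> transition{};  // [from][to]
};

// Tally per-state counts and from->to transitions for one state sample per second.
StateStats tally(const std::vector<uint8_t>& states) {
    StateStats s;
    for (std::size_t i = 0; i < states.size(); ++i) {
        assert(states[i] < kStates);   // precondition: valid state numbers only
        ++s.count[states[i]];
        if (i > 0) {
            ++s.transition[states[i - 1]][states[i]];
        }
    }
    return s;
}
```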

Conclusion: it was synced for just 6 seconds and then the signal degraded. I have two possible explanations.

1) The local clock is unstable or has a large drift.
2) The phase lock "locked" to the noise and thus started to drift away.

That is, for (2) there must have been some correlation of the noise with the filter kernel. This could be "easily" fixed by more advanced signal processing. The issue is that my main target platform (Atmega328 aka Arduino) has so little memory and CPU. In other words: I know how to do this on bigger and better machines but given the tiny resources I have not yet found any really good solution for this challenge.

Is the noise that you pick up real world relevant or is it just artificially introduced fading by turning the antenna? Where are you located? Is this issue really relevant in practice?

What I am aiming at: my decoder is of statistical nature. As the noise level increases I have only two options:
(1) reject the signal
(2) try to decode anyway, accepting that there is a risk that it will pick up unreasonable data

I opted for (2) because I did not want to implement an arbitrary limit. If you figure out that this is unacceptable in your situation then (1) would also fail to deliver any reasonable time. Now if you want to detect if (2) picks up questionable data I suggest implementing a simple check whether time starts to jump backwards or whether the decoded time drifts way too much relative to the local clock. I suggest double checking beforehand that the local clock is definitely within the 100 ppm design limit.
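A check along these lines might look like the sketch below. PlausibilityCheck is a hypothetical helper, not library code. It assumes both the decoded time and the local clock are available as second counts, takes one trusted moment as reference, and uses the 100 ppm design limit plus a small fixed slack as tolerance.

```cpp
#include <cstdint>

// Hypothetical plausibility check for decoded time against a local second counter.
struct PlausibilityCheck {
    int64_t ref_decoded_s;   // decoded time (seconds since some epoch) at the reference point
    int64_t ref_local_s;     // local clock reading at the same moment
    int64_t last_decoded_s;  // previous decoded time, to detect backward jumps
    int64_t max_ppm;         // tolerated local clock error
    int64_t slack_s;         // fixed allowance for rounding

    PlausibilityCheck(int64_t decoded_s, int64_t local_s, int64_t ppm = 100, int64_t slack = 2)
        : ref_decoded_s(decoded_s), ref_local_s(local_s),
          last_decoded_s(decoded_s), max_ppm(ppm), slack_s(slack) {}

    // Returns true if the new decoded time is consistent with the local clock.
    bool plausible(int64_t decoded_s, int64_t local_s) {
        const bool backwards = decoded_s < last_decoded_s;
        last_decoded_s = decoded_s;
        const int64_t elapsed   = local_s - ref_local_s;
        const int64_t deviation = (decoded_s - ref_decoded_s) - elapsed;
        const int64_t magnitude = deviation < 0 ? -deviation : deviation;
        const int64_t allowed   = slack_s + (elapsed * max_ppm) / 1000000;
        return !backwards && magnitude <= allowed;
    }
};
```

With these defaults a deviation of 8 seconds after 24 hours still passes, while an offset of 94 minutes or a jump back in time is flagged. What to do on a failed check (for example resetting the library) is left to the caller.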

lakeroe commented 6 years ago

Thanks for your analysis. As I already wrote, I waited for a complete sync and then artificially introduced the signal fading by turning the antenna. My location is near here. For me this issue is not relevant at all, because I only need a reliable time once a day and this can be easily achieved by resetting the whole library as mentioned above.

Regarding the double check of the local clock's 100 ppm design limit: I don't have the possibility to accurately measure the quartz crystal frequency, but how about the following idea: I use the 1000 Hz Systick Timer to create a clock and compare it to a reference (www.uhrzeit.org/atomuhr.php). If it deviates less than 8.6s in 24h (=100 ppm) then I assume it's okay?
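The arithmetic behind this check is a simple ratio: deviation divided by elapsed time, scaled to parts per million. A small sketch with hypothetical helper names:

```cpp
// Drift in parts per million for a deviation observed over an elapsed time.
// Both arguments are in seconds; elapsed_s must be greater than zero.
double drift_ppm(double deviation_s, double elapsed_s) {
    return deviation_s / elapsed_s * 1e6;
}

// True if the magnitude of the drift stays within limit_ppm (default: 100 ppm).
bool within_limit(double deviation_s, double elapsed_s, double limit_ppm = 100.0) {
    const double ppm = drift_ppm(deviation_s, elapsed_s);
    return (ppm < 0 ? -ppm : ppm) <= limit_ppm;
}
```

For example, drift_ppm(4.0, 86400.0) gives about 46 ppm, and 8.6 s in 24 h is just under 100 ppm. Note that reading a reference clock by eye resolves about one second, which over 24 hours corresponds to an uncertainty of roughly 12 ppm, so a longer observation period gives a more meaningful number.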

nameoftherose commented 6 years ago

No we are not close at all. Please ignore this comment @lakeroe We are just a few kilometers apart ... What dcf module are you using? How long does it take to sync? How long does it stay in sync per day? (in my case about 8h/d)

udoklein commented 6 years ago

Just send me your phone number by private mail. Maybe we can have a call during the weekend. With regard to the issue, you say you use systick. Thus I assume you are running on ARM. On Saturday I will set up a test with an ARM board for 24h and see if there are any regressions. With my board I never had any issues so far, as the signal quality around here is way above the limit for my library. So either there is some regression that did not catch my attention or there is something with your setup. 100 ppm should be absolutely OK, but there may be other issues. I suggest discussing this by phone (or in person).

udoklein commented 6 years ago

So far my tests show nothing unusual. Maybe you could run the debug helper in mode DA to see if your clock is really stable. Sometimes it is not the hardware but interrupts that mess up the timing.

udoklein commented 6 years ago

After 48 hours on Arduino Due (ARM) with a Pollin Module everything looks perfectly well.

   168310, +------XXXXXXXXXXXXXXXXXXXX85-+---------+---------+2X9--7---+---------+3--------+---------X---------
confirmed_precision ?? adjustment, deviation, elapsed
0.0625 ppm, @+ , 24.8750 ppm, -1 ticks, 412 min + 38007 ticks mod 60000
57/100~25/600

Decoded time: 18-05-27 7 21:31:43 CEST ..
  18.05.27(7,0)21:32:42 MEZ 0,0 26 p(67457-40:255) s(191-0:26) m(244-5:33) h(241-0:33) wd(249-1:34) D(246-3:34) M(245-1:34) Y(250-4:34) 127,32,32,50

I have to assume that it is something with your setup. The question is what?

lakeroe commented 6 years ago

Sorry for my late reply. I was not at home the last days ...

I could observe the drifting clock on both the AVR and the ARM based board. One thing I noticed is the importance of a clean and stable power supply. When powering the whole setup with a very cheap "usb china power supply" I don't get any reception at all.

As suggested, I will run the debug helper in mode DA and report back. Furthermore I can offer to send you a complete ARM based affected hardware for your investigations.

I'm quite busy until sunday but next week we can have a call if you like.

@nameoftherose If you're interested you can send me your exact location by private mail ...

nameoftherose commented 6 years ago

@lakeroe I live in Heraclion Crete (35.332832, 25.121835), 2000km from the DCF77 transmitter. I do not have your email. I am using this library on an UNO on breadboard. My problems were due to signal fading and too much noise (from power supply and the antenna). Since December 17 it is working reasonably reliably.

lakeroe commented 6 years ago

Please find another LOG attached (in DA mode). Interestingly the date is wrong but the time is accurate.

dcf_2018-05-28_ARM.zip

nameoftherose commented 6 years ago

What is the length of your antenna?

lakeroe commented 6 years ago

I'm using this DCF77 module and the antenna is about 55mm long and 9mm diameter. The reception quality depends very much of the orientation of the antenna ...

nameoftherose commented 6 years ago

This is a graph of the various performance metrics produced by swiss_army_debug_helper.

[graph: qf]

The phase_lock (marked p) is erratic. Assuming the library has been properly ported, I would say it is noise.

Filter the power supply (I am using LC filters), or even use a battery or feed from a laptop operating from its battery.

Reset the clock whenever the calculated week_day differs from the decoded one.

For my location (2000km from the transmitter) a 55mm antenna is too short, but you are much closer.

lakeroe commented 6 years ago

Thanks for your analysis, you can clearly see the dependency on the antenna orientation. At the beginning the phase is locked for some time (antenna is horizontal and pointing towards the dcf77 sender). Then it loses the lock (antenna was turned vertical) and at the end it's locked again (because the antenna is horizontal and pointing towards the dcf77 sender again).

What does your antenna look like?

nameoftherose commented 6 years ago

I am using the Conrad module. Its antenna is 50mm.

udoklein commented 6 years ago

Looking at the analysis of nameoftherose there is very interesting behaviour. First of all the phase lock fluctuates a lot. Then during the period with the poorest phase the month and day decoder "lock" to the noise. Once the signal gets really good (before 17:00) the other decoders increase in quality. However the mistakenly locked decoders start to converge to the proper value. Hence their quality decreases.

The easiest way to fix this mess is to not allow the "slower" decoders to decode before the fast decoders are ready. That is: require that locks can only be acquired in the order phase, seconds, minutes, hours, days, months, years.
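The ordering rule can be expressed as a simple chain: a stage may acquire a lock only if every faster stage before it is at least marginally ready. The sketch below illustrates only that rule. The function may_lock, the quality array and the min_quality parameter are hypothetical and do not mirror the library's internals.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Decoder stages from fastest to slowest, in the order named above.
enum Stage : std::size_t { phase, seconds, minutes, hours, days, months, years, stage_count };

// Hypothetical: for each stage, decide whether it may currently acquire a lock.
// A stage is allowed only if every faster stage has at least min_quality.
std::array<bool, stage_count> may_lock(const std::array<uint8_t, stage_count>& quality,
                                       uint8_t min_quality) {
    std::array<bool, stage_count> allowed{};
    bool faster_ready = true;           // the phase stage has no prerequisite
    for (std::size_t i = 0; i < stage_count; ++i) {
        allowed[i] = faster_ready;
        faster_ready = faster_ready && quality[i] >= min_quality;
    }
    return allowed;
}
```

If the minute stage has quality 0, the hour, day, month and year stages are all held back regardless of how confident they claim to be, which is exactly the situation where they would otherwise lock to biased noise.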

I think this would fix it. There is a catch though. The current approach allows for significantly faster startup. I think I will fix it by introducing another configuration flag. What do you think about this proposal?

With regards to using the weekday as some kind of checksum: this will unfortunately not fix it. The reason is that if your signal quality is such that this issue happens, then introducing this as a checksum will fix it for 6 out of 7 days of the week. For the remaining day the issue will still be there.
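For reference, a weekday comparison needs only a few lines. This sketch uses Sakamoto's method for the Gregorian calendar and the DCF77 numbering (1 = Monday ... 7 = Sunday); the function names are hypothetical.

```cpp
// Day of week for a Gregorian date, DCF77 convention: 1 = Monday ... 7 = Sunday.
// Uses Sakamoto's method; month is 1..12, day is 1..31.
int dcf77_weekday(int year, int month, int day) {
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3) {
        year -= 1;
    }
    const int dow = (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
    return dow == 0 ? 7 : dow;   // Sakamoto yields 0 for Sunday
}

// True if the weekday decoded from the signal agrees with the decoded date.
bool weekday_matches(int year, int month, int day, int decoded_weekday) {
    return dcf77_weekday(year, month, day) == decoded_weekday;
}
```

The limitation is easy to demonstrate with the dates from the third test in this thread: 14.5.2018 and 14.7.2053 both fall on a Monday, so assuming the weekday itself was still decoded correctly, this comparison alone would not have flagged that particular jump.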

There is one more thing that I could do: I could introduce additional "flat out detection bins". The price to pay is 1-2 additional bytes of sram consumption per decoder, thus about 12 additional bytes. This would also help to relieve this issue.

Thus the action plan would be as follows:

1) Add a flag "optimistic decoder" which toggles the way the decoders are working. TRUE = as currently implemented, FALSE = do not allow the slow decoders to decode before the fast ones are at least marginally ready.

2) If the flag is TRUE use the weekday information for faster transition to "synced", otherwise require improved consistency.

3) Add a flag "flat out detector". If FALSE go ahead as today. If TRUE add additional detection bins for flat out 0 and flat out 1. (but not for minutes or hours, for those only for flat out 1). If the decoder locks to the flat out signal then void all successive decoder stages and do not allow to sync.

What do you think about this approach? In particular about the choice of the default values? Should I default to maximum robustness (paying with more memory and slower initial sync) or should I default to the optimistic setup and require that people with extra poor reception need to adapt the configuration?

udoklein commented 6 years ago

@nameoftherose By the way: what did you use to plot the log statistics?

udoklein commented 6 years ago

@lakeroe If you send me your hardware then I also might be able to gain more insights. Did I understand it right that you are within <300 km from Frankfurt?

One more question: why are you putting the antenna intentionally into a poor reception position? This is a very interesting approach but I would assume that the antenna is mounted in an optimal orientation and that the library will only have to tackle "other noise". What you are doing is basically raising the noise floor by more than 10 dB, maybe even 20 dB. This is great for testing but close to Frankfurt it is somewhat pointless. So I am wondering why you do this?

BTW: together with the analysis of nameoftherose I think this finally shed some light on why some people reported issues with my library that I was not able to reproduce. Obviously your test approach is a good idea :)

udoklein commented 6 years ago

I analyzed the log and with the hints of the picture by nameoftherose the plot thickens. Your decoder module is biased to all 1. Thus during periods of bad reception it shows a different behaviour than mine. Mine biases to 0. This also explains why I was never able to figure this out. My module is biased differently. I will fix the library. However I am very busy right now. I can not promise that I will be finished in June. This might take some weeks.

nameoftherose commented 6 years ago

The graph is created as follows:

  1. minicom creates a timestamped log
  2. the log is processed with awk -f qf.awk to produce qf.csv
  3. qf.csv is opened in gnumeric.
  4. qf.gnumeric contains links to qf.csv, when opened the graph is shown.
  5. the dates in cells A1, A2 are the graph limits and have to be adjusted.

This could be done in any spreadsheet.

nameoftherose commented 6 years ago

@udoklein Yes I think your action plan is very promising. Take your time. Thank you.

lakeroe commented 6 years ago

@udoklein Regarding your questions

*) I think your plan sounds promising and we should give it a try. Once the new software is available for sure I can run further tests and post logs.

*) Default values: I would go for maximum robustness because I think it's more important to have a reliable time than to save some bytes of RAM and get a faster sync.

*) My hardware: Given the newest findings, do you think it's still necessary to send my hardware to you? If yes, another possibility is to buy your own hardware. It's just so cheap and probably the same price as sending mine to you and back. The hardware I use is:
Pollin module 5,45€ + 4,95€ (shipping)
STM32F103C8 blue pill board 1,85€ from China
Is that okay for you?

*) Distance from Frankfurt: According to Google Maps I'm 407km away (in Austria near Salzburg).

*) Antenna orientation: I put the antenna in a poor reception position to get a quicker result. If it's in an optimal position I might never reproduce the drifting time problem ...

*) Time schedule: No hurry and take your time. It's finished when it's finished!

udoklein commented 6 years ago

Well, I used a Pollin module for my tests --> in my location it is differently biased. As you say most probably I will gain no insights from your hardware. However you could do me a favour. Now that we know what is most likely the issue it would be nice if we can create a log file which will reproduce the issue with my standalone debug helper. Unfortunately the log file with the issue does not. The reason is that the log does not capture the full information prior to the first sync.

There would be two options:
1) I synthesize this information.
2) Hopefully easier: you create another log just the way you did when you reproduced the error. But this time instead of turning the antenna after the first sync give it another 20-30 minutes to acquire a better lock, then turn the antenna and wait till the issue reoccurs.

According to the theory (2) should still reproduce the error but it should also capture enough signal such that the standalone debug helper will also recreate the issue. This would help me a lot during testing.

Would this be possible for you? I am not in a hurry. If this takes 2 or 3 weeks it would still be in time for testing.

lakeroe commented 6 years ago

Sure, I will create another log and report back ...

lakeroe commented 6 years ago

Please find another LOG attached (in Dm mode). This time both date and time were wrong ... Maybe @nameoftherose is kind enough to generate another nice looking graph ...

@udoklein Could you just have a quick look if it fits your need ?

dcf_2018-06-05_ARM.zip

nameoftherose commented 6 years ago

[graph: qf]

I would not say the clock is drifting; the algorithm decodes the wrong time (line 638370) and syncs to it. Synchronization is then lost and reacquired, this time correctly (line 658922). This log confirms @udoklein's analysis.

udoklein commented 6 years ago

@lakeroe - Excellent. The file does reproduce the issue with the standalone debug helper. As it now turns out even the first one did, but I mistakenly used a wrong parameter setup. Sorry for causing duplicate work. However on the upside I now have more test data :)

With regards to the plots I added a log_2_csv converter in the latest commit here: https://github.com/udoklein/dcf77/blob/master/extras/standalone_debug_helper/log_2_csv.py
The tool converts logs to csv files. Then any plot program should be able to consume them. Thanks to @nameoftherose for bringing up this idea. I still feel somewhat humbled that this did not occur to me earlier.

Attention: I also changed the log format slightly. Thus it does not work with versions of the library older than v3.2.8. In case of logs written with previous versions of the library all occurrences of "MEZ" must be replaced by "CET" and all occurrences of "MESZ" must be replaced by "CEST" before converting log files.
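The replacement described above is a plain text substitution and can be done with any editor or script. As a sketch in C++ (both function names are hypothetical and not part of the repository's tooling):

```cpp
#include <cstddef>
#include <string>

// Replace every occurrence of `from` with `to` in `s`.
std::string replace_all(std::string s, const std::string& from, const std::string& to) {
    if (from.empty()) {
        return s;
    }
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();   // continue after the inserted text
    }
    return s;
}

// Convert the time zone labels in one log line written by a library version before v3.2.8.
std::string modernize_log_line(const std::string& line) {
    return replace_all(replace_all(line, "MESZ", "CEST"), "MEZ", "CET");
}
```

This is a blind substitution: it assumes that "MEZ" and "MESZ" appear in the log only as time zone labels.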

udoklein commented 6 years ago

By now I think I will patch

static void process_single_tick_data(const DCF77::tick_t tick_data)

The point is that in

static void sync_lost_event_handler()

I wipe the slower decoders once a sync is lost. But then I do not prevent them from gaining "quality" even if the fast ones fail to do so. The point is that this situation is with exceedingly high probability caused by biased noise. If the noise is random, then this can not really happen.

As a consequence I would not need an additional flat out detector at all. The fast stages already provide this information.

I am now pondering what happens if the fast stages have very high quality and the slow stages are not yet there. Now if biased noise enters the clock then I can not detect it and it will again pull the slow stages into this undesired state.

Of course I could add a check for "jumps" in time. But then what to do? Maybe reset the clock.

I am still thinking. The implementation of the fix will be much easier than finding it.

Probably a good point in time to "reset" the slow decoders would be second 3. This is buried in the weather data and no other processing is going on during that second. It is also far enough away from anything else to avoid any kind of interrupt congestion.

The idea would be to look at the "minute decoder quality" before any of the successive controllers gets hold of any data. If the minute controller is too poor then the successors should not get any data at all. Instead they should be reset.
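A rough sketch of this gate, under stated assumptions: SlowStageGate, its threshold and the reset counter are hypothetical stand-ins, and the real controller would reset the actual hour, day, month and year decoders where this sketch only counts.

```cpp
#include <cstdint>

// Hypothetical gate evaluated once per second. Only second 3 triggers a decision.
struct SlowStageGate {
    uint8_t min_minute_quality;        // hypothetical threshold for "good enough"
    bool    feed_slow_stages = false;  // may hour/day/month/year decoders receive data?
    int     reset_count = 0;           // stands in for resetting the slow decoders

    explicit SlowStageGate(uint8_t threshold) : min_minute_quality(threshold) {}

    void on_second(uint8_t second, uint8_t minute_quality) {
        if (second != 3) {
            return;                    // keep the previous decision for the rest of the minute
        }
        if (minute_quality < min_minute_quality) {
            feed_slow_stages = false;  // starve the slow stages ...
            ++reset_count;             // ... and wipe whatever they accumulated
        } else {
            feed_slow_stages = true;
        }
    }
};
```

Taking the decision once per minute keeps the extra work out of the seconds where the decoders are busy processing bits.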

To optimize this from a performance point of view the clock controller needs to understand when the decoders are actually processing bits. Fortunately

    `template <typename signal_t, signal_t signal_max, uint8_t signal_bitno_offset, uint8_t significant_bits, bool with_parity>`

allows to infer this already at compile time. Unfortunately it is not available outside of the class. Seems I need an additional constant or enum definition to make this accessible to the controller. Then the controller can dispatch the resets as needed.

udoklein commented 6 years ago

On the other hand maybe a flat out detector could leverage the timezone bits + the 1 minute digit bit. Thus every two minutes it should see 3 ones and 3 zeroes. If this is not the case then the signal might be biased. Now I need to figure out how to prevent it from a "random walk". Probably an exponential filter approach would be good. This raises the question for a suitable filter constant.

udoklein commented 6 years ago

With regards to the flat out detector it might be actually better to not look into any of the decoded bits. Probably it would be best to look into the raw signal. E.g. it might be a good idea to look at the first 100 ms and the 500-600 ms interval. The pattern might be

0 0 --> flat 0
1 1 --> flat 1
0 1 --> drowned by noise
1 0 --> everything the way it should be

This would have the significant advantage of a very short latency for detecting flat outs. This leaves the question what would be a good filter constant. --> Need to analyze the data of the examples.
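The four patterns and an exponential filter on top of them could be sketched as follows. Everything here is hypothetical: the names, the fixed point scaling and in particular the filter constant, which is exactly the open question mentioned above.

```cpp
#include <cstdint>

enum class SecondShape { flat_0, flat_1, noise, normal };

// early: signal level sampled in the first 100 ms of the second
// late:  signal level sampled in the 500-600 ms interval
SecondShape classify(bool early, bool late) {
    if (early && !late) return SecondShape::normal;   // 1 0
    if (!early && late) return SecondShape::noise;    // 0 1
    return early ? SecondShape::flat_1 : SecondShape::flat_0;
}

// Integer exponential filter over "was this second flat?".
// level moves towards 65535 while the signal is flat and towards 0 otherwise.
struct FlatOutFilter {
    int32_t level = 0;
    int32_t divisor;   // filter constant is 1/divisor (arbitrary assumption)

    explicit FlatOutFilter(int32_t d) : divisor(d) {}

    void update(SecondShape s) {
        const bool flat = (s == SecondShape::flat_0 || s == SecondShape::flat_1);
        const int32_t target = flat ? 65535 : 0;
        level += (target - level) / divisor;
    }

    bool flat_out() const { return level > 32767; }
};
```

With a divisor of 8 a single flat second is ignored, while a run of several flat seconds in a row raises the flag. A larger divisor trades detection latency for fewer false alarms.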

udoklein commented 6 years ago

Release v3.2.9 introduces some slight change in the way the decoder templates are parametrized. This "optimization" is part of the preparation for the fix of the current issue.

udoklein commented 6 years ago

I am right now testing v3.2.10. Here are some results when I feed lakeroe's logs into it.

[Picture 1: 01_dcf_2018-05-15_avr_v_3_2_10]
[Picture 2: 02_dcf_2018-05-18_arm_v3_2_10]
[Picture 3: 03_dcf_2018-06-05_arm_v3_2_8]
[Picture 4: 03_dcf_2018-06-05_arm_v3_2_10]
[Picture 5: 03_dcf_2018-06-05_arm_v3_2_10_seconds]

udoklein commented 6 years ago

Picture 1 and 2 show the new library for the older two files. Picture 3 and 4 show library 3.2.8 vs. 3.2.10 for the newest log file. As it turns out the library still struggles with this file. Picture 5 shows why. I took only every 60th sample. Thus the second should be stable. However it is not. The library acquired a phase lock but the second is drifting. What ???????

I double checked the original log file. This is not a bug in the library. This is the hardware which is causing the issue. Actually at one point between 86000 and 88000 seconds it drifts by almost 1 second in 500 seconds, or 2000 ppm. Way out of spec.

I would like to point out two things:
1) In the last case there is either some hardware flaw with the local oscillator or there is some other code running that messes with the interrupts.
2) Because of the fix the new library takes longer to actually lock to a signal. Thus if there are only very short periods with a reasonably good signal, then it may not lock to the signal at all. There is not a lot that I can do about this. The library is statistical in nature. Everything that I do to reject bad signals will decrease the chances to accept correct but noisy signals. I hope the change is an overall improvement.

Please test and provide feedback.

udoklein commented 6 years ago

If you are testing either use v3.2.10 or, if you use v3.2.11 make sure that you set https://github.com/udoklein/dcf77/blob/v3.2.11/dcf77.h#L110 to the value used in v3.2.10. E.g.

enum controller_minute_quality_threshold_t : uint8_t {
    aggressive_minute_quality   = 0,
    standard_minute_quality     = 2,
    conservative_minute_quality = 4,
    paranoid_minute_quality     = 6
};
static const uint8_t unacceptable_minute_decoder_quality = controller_minute_quality_threshold_t::conservative_minute_quality;

brettoliver commented 6 years ago

Hi Udo. Thanks for the update. I am in the south of England. This time of year once the temperature rises I get very bad signal reception in the afternoon into the evening.
I have 9 DCF77 clocks running off 1 aerial via a digital repeater so they all get the same signal. One clock runs library version 2 and never has any issues; the other clocks run the old version 3 library. One of the clocks has a problem when the signal gets bad over a long period: it loses sync and shows Lock on the display. Eventually the time and date go way out and the clock never returns to sync even with a good signal.

I have now loaded this clock with your new library 3.2.10 and will see how it goes. The DCF77 signal is very bad at the moment so I expect a long wait for sync but I don't mind this if it fixes the sync issues.

I'll let you know how I get on.

Regards. Brett.

udoklein commented 6 years ago

Hi Brett, if you could record one or two days with "Swiss Army Debug Helper" in mode Dm, then I could analyze what happens. This is the only way for me to get real world noise. As it becomes more and more obvious my simulator does not simulate the real world situation. Hence I require these logs to improve the library further. Regards, Udo P.S. Which exact version 2 are you using? Maybe I can spot the difference in the code and port the different behaviour to version 3 again.

brettoliver commented 6 years ago

Hi Udo. I will be moving the clock that has sync issues in the next couple of weeks. I will set it up near my PC and record the data for you. The version 2 code I use on my Master Clock is v2.0.4. I think this has been running for over 4 years now with no errors. Regards. Brett

brettoliver commented 6 years ago

Hi Udo. My clock has now synchronised after 8 hours of intermittent bad signal. See my signal here (1 = 100%, 0 = <100%): http://home.btconnect.com/brettoliver1/Geiger_Counter/Charts.htm

The only problem is the clock has synchronised to the wrong time. The seconds are correct but the time shows 06:09:45 20/04/2000 not 20:09:45 30/06/2018.

I would normally run an EEPROM erase program at this stage to get the clock to sync correctly. http://home.btconnect.com/brettoliver1/Arduino_EEPROM_Erase.htm

What do you think? Brett.

brettoliver commented 6 years ago

I have run the EEPROM erase as the clock went to Lock. The clock has now synced correctly after only a few minutes.

udoklein commented 6 years ago

Hi Brett, I have no idea why erasing the EEPROM would help. Maybe this is just coincidence, maybe not. I definitely need log files to gain more insight. I pushed a new version of the library (v3.2.12) with slightly enhanced logging. If you could use this and create some log files maybe I can gain more insights. Regards, Udo

brettoliver commented 6 years ago

Hi Udo. I nearly always have to erase the EEPROM once any of my clocks have lost sync. I presumed the learned "settings" were stored in EEPROM and got reloaded when the Arduino was restarted. Once these "settings" are incorrect I find my clocks won't sync to the correct time even after reset. Once I do an EEPROM erase the clocks always re-sync correctly.

I will be able to use your "Swiss Army Debug Helper" now as I have found a long USB cable to reach my clock (it's a Longcase Clock so I can't move it easily).

I have loaded the "Swiss Army Debug Helper" into my clock and sent "Dm" to the clock. I am showing a "scope" type output on the serial monitor.

How do I save a log file? Brett.

nameoftherose commented 6 years ago

@brettoliver «How do I save a log file?» It depends on the terminal program you are using, have a look at the command line option or the menu.

I had similar experiences in the past; it was actually Herr Klein himself who suggested erasing the EEPROM. His explanation was that due to the long period of no signal the frequency correction had reached its 400ppm limit. @udoklein may I suggest that static const int16_t max_total_adjust = 6400; (in dcf77.h) is decreased to a more reasonable value? In my case reducing the constant to 1600, corresponding to 100ppm crystal accuracy, worked.
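The two pairs of numbers quoted above (6400 for 400 ppm, 1600 for 100 ppm) suggest that the constant is expressed in steps of 1/16 ppm. Under that assumption, which is inferred from this thread and not checked against dcf77.h, a limit for another crystal tolerance could be derived like this (the helper name is hypothetical):

```cpp
#include <cstdint>

// Assumption inferred from the numbers in this thread: 6400 <-> 400 ppm and
// 1600 <-> 100 ppm, i.e. 16 adjustment units per ppm (1 unit = 0.0625 ppm).
constexpr int16_t adjust_units_per_ppm = 16;

// Hypothetical helper: value for max_total_adjust given a tolerance in ppm.
constexpr int16_t max_total_adjust_for(int16_t ppm) {
    return static_cast<int16_t>(ppm * adjust_units_per_ppm);
}

static_assert(max_total_adjust_for(400) == 6400, "matches the default quoted above");
static_assert(max_total_adjust_for(100) == 1600, "matches the reduced value quoted above");
```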

lakeroe commented 6 years ago

Hi Udo,

I tried v3.2.11 (with default parameters) and ran my "usual" test (put the antenna in an optimal position, wait for the best clock_state possible and then put the antenna in a non-optimal position). After 33229 seconds the library decodes the wrong time ...

Could this be due to the local clock drift you suspected above ?

Best Regards, lakeroe

LOG-File: dcf_2018-07-02_ARM.zip