colinl opened this issue 1 year ago.
I have done some more testing, and determined that the symptom remains absolutely consistent. In node-red-mcu/nodes/rpi-ds18b20/rpi-ds18b20.js the relevant code is
const items = this.#bus.search();
for (let i = 0; i < items.length; i++) {
    const id = items[i];
    this.#sensors.push(new DS18X20({bus: this.#bus, id}));
}
} // closes the enclosing block (not shown)
I can see that it is getting to this code, but the search is not finding any sensors. If I add a line of code after the for loop, so that it now reads:
const items = this.#bus.search();
for (let i = 0; i < items.length; i++) {
    const id = items[i];
    this.#sensors.push(new DS18X20({bus: this.#bus, id}));
}
trace(`18b20 items.length: ${items.length}\n`);
} // closes the enclosing block (not shown)
then the search does correctly find the sensors. Moving that line up to before the for loop causes it to fail again, and it shows the length as zero. I hesitate to suggest that this is a build/compile issue, but I am having difficulty thinking of what else it might be.
Is anyone able to confirm what I am seeing?
Surely mysterious. I did a bunch of work recently with the DS18X20 on ESP32 without a problem. The I/O is completely different on ESP8266, and much more sensitive to timing variations. But that doesn't explain your observations. Stupid questions:
I only have one sensor connected at the moment, it always returns 1 when it is working and 0 when it isn't. I think I have another sensor somewhere; I will dig it out and wire it up.
Disabling Wi-Fi does not appear to make any difference either way.
Disabling Wi-Fi does not appear to make any difference
Thanks for trying.
I only have one sensor connected at the moment, it always returns 1 when it is working and 0 when it isn't
I wasn't clear. I was suggesting to try something like this:
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
Bingo! With the code looking like this:
const items = this.#bus.search();
trace(`search: ${items.length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
trace(`search: ${this.#bus.search().length}\n`);
then, if I remove all the wires in the flow, I consistently get:
Wi-Fi connected to "RedKite"
IP address 192.168.49.245
search: 0
search: 1
search: 1
search: 1
search: 1
search: 1
or, if I add back the first wire, I get all 1s. If I then add back the next wire, I get:
Wi-Fi connected to "RedKite"
IP address 192.168.49.245
search: 1
search: 1
search: 1
search: 0
search: 0
search: 0
and if I put all the wires back, then:
Wi-Fi connected to "RedKite"
IP address 192.168.49.245
search: 1
search: 0
search: 0
search: 0
search: 0
search: 0
So it is looking as if maybe there is a time window during which it works, and small changes move the window about. Whatever the wiring, if I step through that code one line at a time they always all work.
One thing I have just realised is that, since the ds18b20 node is not the first one started, parts of the flow are already running while this code is running, so that may be part of the reason for the variation in the symptom.
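A minimal diagnostic/workaround sketch in plain JavaScript. It assumes only that `bus.search()` returns an array of device IDs, as in the rpi-ds18b20.js excerpt above; the helper name `searchWithRetry`, the fake bus and the ID string are hypothetical. It collects the per-attempt counts and prints nothing until all searching is finished, so tracing cannot perturb the bus timing. Retrying would only mask a timing problem, not fix it.

```javascript
// Hypothetical helper: call bus.search() up to `attempts` times and return the
// first non-empty result, plus the per-attempt counts for later tracing.
// Assumes bus.search() returns an array of device IDs.
function searchWithRetry(bus, attempts = 5) {
  const counts = [];
  for (let i = 0; i < attempts; i++) {
    const items = bus.search();
    counts.push(items.length);
    if (items.length > 0) return { items, counts };
  }
  return { items: [], counts };
}

// Usage with a fake bus whose first search comes back empty,
// mimicking the "search: 0, search: 1, ..." pattern reported above.
const fakeResults = [[], ["28-0000aabbccdd"]]; // hypothetical device ID
let call = 0;
const fakeBus = { search: () => fakeResults[Math.min(call++, fakeResults.length - 1)] };

const { items, counts } = searchWithRetry(fakeBus);
// Report only after all bus activity is done.
console.log(`found ${items.length}, per-attempt counts: ${counts.join(",")}`);
```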
So it is looking as if maybe there is a time window during which it works, and small changes move the window about
Wild. I guess the question is, "what closes the window?"
When the stuff going on in the background in the other nodes finishes? Or starts? No idea really.
When the stuff going on in the background....
The ESP8266 is cooperatively multitasked -- there's really just one thing running at a time and it cannot be swapped out. And there are just two tasks -- the project and Wi-Fi. An interrupt can take time, of course, but there's not much running there either (Wi-Fi is the main one, but you said turning that off didn't make a difference).
This is just ridiculous. I went back to the situation with just the first two wires in the flow connected, and I get (as I did yesterday):
search: 1
search: 1
search: 1
search: 0
search: 0
search: 0
If I comment out the last two trace(`search: ${this.#bus.search().length}\n`); lines, then obviously I should get 1,1,1,0, but actually I get:
search: 0
search: 1
search: 1
search: 1
So commenting out the last two lines affects what the earlier lines do. There must be some explanation for how that can be. Again, this is absolutely repeatable.
Could this be something to do with the trace() calls, or the comms associated with them? I replaced the code with:
const items = this.#bus.search();
const results = [items.length];
for (let i = 0; i < 5; i++) {
    results.push(this.#bus.search().length);
}
// trace only after all the searches have finished
results.forEach(e => trace(`search: ${e}\n`));
and now I get 100% success, whatever I do with the flow (so far, anyway). I don't understand how, though: the trace calls will affect the timing of the second and later searches, but I don't see how they could affect the result of the first search. Also, I think that takes us back to the unmodified file (except for the forEach line), which itself failed consistently on the first (and only) search.
with a 3k3 pullup resistor
Hi @colinl! Could you please confirm this value? According to the spec, a single DS18B20 requires a 4k7. A 3k3 might be too small and create issues on the bus... or is it a typo derived from the 3,3V?
Hi @ralphwetzel, yes, I am using a 3k3 resistor. It is a long time since I looked at the spec, but if I remember correctly, one issue is that they are designed for 5V operation, and when you look at the worst-case situation they are not guaranteed to work at all with 3.3V. In practice they do, however, but I have always used a slightly lower resistor to make sure the bus pulls high enough, particularly if there are long wires and several sensors. With only one sensor and short wires I suspect almost anything would do. I will certainly swap it for a 4k7, however, just to remove any uncertainty. If this were part of the problem I would expect it to give variable results, but I am finding complete repeatability here.
I've been using low resistor values with DS18B20s for over a decade, though always with 5V. They certainly work down to 2.7K or even 2.2K pull-up resistors IIRC, and the low values are beneficial for long lines. The 3.3V could produce issues, though, because it means the internal capacitors that power the chips get much less charge. The resulting issue could also very well be 100% repeatable, because certain bus transactions require more power for the DS18B20 to send its data to the bus master, so it could consistently run out of power when it reaches a specific transaction. If I were you, I would connect the bus pull-up to 5V and see whether it changes the situation. IIRC the esp8266 are de-facto 5V tolerant, and you have a current-limiting resistor there, so the chip's pad driver doesn't actually have to shunt that much current.
IIRC the esp8266 are de-facto 5V tolerant
Are you sure about that? Opinion appears to be divided over the issue. I don't want to blow the chip up.
I am having great difficulty seeing how that could give the sort of issues I am seeing here, where changing the code in the lines following the bus access changes the result of the search.
I'm sure the Moddable folks would be happy to send you a couple of spare ESP8266s; that's less money than 1-2 minutes of their troubleshooting time in salary ;-). Using a 3.3K pull-up, you're only putting about 0.5mA into the pad, which is pretty much in the noise.
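A quick Ohm's-law check of the 0.5mA figure, as a sketch. It assumes the ESP8266 pad's protection clamps it at about 3.3V when the pull-up goes to 5V, and it ignores the low-level output voltage of whoever pulls the bus down, so the sink currents are slightly overstated.

```javascript
// Current in milliamps through a resistor with `volts` across it.
function milliamps(volts, ohms) {
  return (volts / ohms) * 1000;
}

// Assumption: with a 5 V pull-up the pad is clamped at roughly 3.3 V,
// so the 3k3 resistor drops about 5 - 3.3 = 1.7 V into the pad.
const padCurrent = milliamps(5 - 3.3, 3300);

// Current that must be sunk to pull the bus low (ignoring V_OL).
const sink = {
  "3k3 @ 3.3V": milliamps(3.3, 3300),
  "4k7 @ 3.3V": milliamps(3.3, 4700),
  "3k3 @ 5V": milliamps(5, 3300),
};

console.log(`pad current, 5V pull-up via 3k3: ${padCurrent.toFixed(2)} mA`);
for (const [label, mA] of Object.entries(sink)) {
  console.log(`sink current ${label}: ${mA.toFixed(2)} mA`);
}
```

At 3.3V, going from 4k7 to 3k3 raises the current needed to pull the bus low from about 0.7mA to about 1.0mA.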
WRT how later lines could influence earlier ones... The code gets executed from flash, which comes in pages/lines that get cached. The processor only really executes from cache. So changes in code size can cause "distant" code to be split across cache lines where it formerly fit into one. That can introduce delays into bus transactions, or cause timings to be missed. The one-wire bus distinguishes 1's and 0's using the timing of an edge, and that timing generation/measurement could be off due to cache misses, depending on how it's implemented in the SDK. I'm not saying that any of this is the real reason for what you observe; I'm just providing some ideas for how changes in later code can affect earlier code...
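A toy model of the cache-line argument, purely illustrative: `LINE_SIZE` (32 bytes), the 28-byte code length and the addresses are hypothetical, not the real ESP8266 flash-cache geometry or memory map. It only shows that the same stretch of code can touch one or two cache lines depending on where earlier code happens to end.

```javascript
// Hypothetical cache line size in bytes (illustration only).
const LINE_SIZE = 32;

// Number of cache lines touched by the byte range [start, start + length).
function linesTouched(start, length, lineSize = LINE_SIZE) {
  const first = Math.floor(start / lineSize);
  const last = Math.floor((start + length - 1) / lineSize);
  return last - first + 1;
}

// The same hypothetical 28-byte timing loop, shifted because unrelated
// code elsewhere in the image grew or shrank.
for (const start of [0x1000, 0x1004, 0x1010]) {
  console.log(`start 0x${start.toString(16)}: ${linesTouched(start, 28)} line(s)`);
}
```

A loop that straddles two lines can take a cache miss part-way through a timed section, which is the kind of "distant" effect described above.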
Happy troubleshooting ;-)
OK, I have tried it with 5V and it doesn't appear to have made any difference.
I think this may need someone with an oscilloscope to see what is going on. Sadly I don't have one. I can see with a voltmeter that it is kicking the bus when it does the search, even if it fails, but that is all I can deduce.
I have updated node-red-mcu and Moddable to pick up the stack handling improvements described in #90, and the symptoms have changed somewhat. Now not only does the search succeed or fail depending on unrelated changes to the flow or code, but whether the temperature read succeeds also depends on the flow configuration. Again it is repeatable: a flow that works always seems to work, and one that doesn't work never works.
Thanks for trying the latest. I didn't expect it would help here, but this problem is mysterious enough that nothing would surprise me.
My best guess at the moment is that this is a timing-related bug. The OneWire stuff is very timing-sensitive. On ESP32, the driver uses the RMT hardware block to achieve very precise timing. That block doesn't exist on ESP8266, so it uses synchronous GPIO with delays. The implementation disables interrupts while doing that to avoid interference. But there could be a bug, or some timing that's close enough to the edge that some subtle environmental factor is tipping it. Comparing successful and unsuccessful transitions with an oscilloscope, as you propose, would confirm that.
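To put "very timing-sensitive" into numbers, here is a back-of-envelope sketch using the standard-speed delays A-J from Maxim application note AN126 (I'm assuming these are the "Maxim" timings mentioned later in the thread). It ignores the time spent in the GPIO calls themselves, and it approximates the direction-write slot in a search with the longer of the two write slots. I haven't checked at what granularity owb_gpio.c disables interrupts, so the search total is only an upper bound on how long they could be off in one go.

```javascript
// Standard-speed 1-Wire delays in microseconds, named A-J as in Maxim AN126.
const T = { A: 6, B: 64, C: 60, D: 10, E: 9, F: 55, G: 0, H: 480, I: 70, J: 410 };

const slot = {
  write1: T.A + T.B,            // drive low A, release, wait B
  write0: T.C + T.D,            // drive low C, release, wait D
  read: T.A + T.E + T.F,        // drive low A, release, sample after E, wait F
  reset: T.G + T.H + T.I + T.J, // wait G, drive low H, release, sample presence after I, wait J
};

// Time to write one byte, LSB first (the order doesn't change the total).
function writeByteMicros(value) {
  let total = 0;
  for (let bit = 0; bit < 8; bit++) {
    total += (value >> bit) & 1 ? slot.write1 : slot.write0;
  }
  return total;
}

// One Search ROM pass: reset, the 0xF0 command, then for each of the 64 ROM
// bits read the bit, read its complement, and write the chosen direction.
const SEARCH_ROM = 0xf0;
const directionSlot = Math.max(slot.write0, slot.write1);
const searchPassMicros =
  slot.reset + writeByteMicros(SEARCH_ROM) + 64 * (2 * slot.read + directionSlot);

console.log(slot);
console.log(`one search pass ≈ ${(searchPassMicros / 1000).toFixed(2)} ms`);
```

Every bit slot is about 70 µs, so a single device search is roughly 15 ms of bit-banging in which a few microseconds of error per slot is all the margin there is.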
I didn't write the OneWire stuff, so I'm not familiar with the details. I can take a look. My DS18B20 hardware is at the office (and connected to an ESP32 from our FOSDEM demos!). It'll be a day or so before I can get it to try.
FWIW – I'm assuming that this isn't a consequence of some bug in the XS JavaScript engine. I can understand why you might draw that conclusion from some of the evidence. My experience is that XS is (very) deterministic and conformant. There's a lot of testing and real-world experience to back that up. There could always be an undiscovered bug, of course, but it feels unlikely here to me.
I'm assuming that this isn't a consequence of some bug in the XS JavaScript engine
I am inclined to agree with you. The fact that it is manifesting itself in several different ways makes it much more likely it is a bus timing issue as you suggest.
When you find time to look at this, could you check it with more than one sensor too please? I find that when I add a second sensor it only finds one of them. They both work individually. With one sensor, with a flow that does work, it works very well. I am polling every 10 seconds and have had one running for several days. It is getting about 1 in 10000 read failures.
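For scale, the arithmetic behind "1 in 10000" at a 10-second poll, per sensor (a simple expectation, assuming failures are independent):

```javascript
const pollSeconds = 10;
const failureRate = 1 / 10000;

// Reads per sensor per day, and the expected number of failed reads.
const readsPerDay = (24 * 60 * 60) / pollSeconds;
const expectedFailuresPerDay = readsPerDay * failureRate;

console.log(`${readsPerDay} reads/day, ~${expectedFailuresPerDay.toFixed(2)} expected failures/day`);
```

That is a little under one failed read per sensor per day.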
That's an interesting addition. Good to know that it is 99.99% reliable with one sensor. I only have one sensor connected in my set-up at the moment, so I'll probably start there. But, understood.
I have a couple of ideas about what might not be optimal, based on reading some other 1-Wire implementations (the most relevant is the Arduino implementation). Even with a single sensor, I can try to match their behavior more closely and confirm it still works.
I finally had a chance to look into this. You'll be pleased to know it failed for me even with a single sensor. ;) I believe the reason for your intermittent failures was a good, old-fashioned uninitialized variable. That said, the DS18B20 sensors I ordered recently didn't work without some other changes. They do work on ESP32, which is where I tested this originally. I made changes to our ESP8266 implementation to match the Arduino OneWire library more closely and everything is working smoothly for me with two sensors connected.
All the changes are local to owb_gpio.c. I don't have a clean solution yet to commit, but if you'd be willing to give it a try to see if it is in the right direction, I'll post it.
Yes, please post the file and I will give it a try.
At my first attempt it isn't working, but I have had a busy day and brain fade has set in, so I may well have messed it up. I will have another go tomorrow.
Sure, thanks. Long week here too.
There is something different between our set-ups. Based on what I saw, I'm a little surprised it ever works for you. There are some parameters we can adjust if it doesn't work for you as-is.
FWIW – I updated the gist to revert to using the original timings. They are what Maxim recommends and are notably different from the Arduino implementation. No idea why. Since the Maxim values worked for you, maybe that will help. With the Maxim values, my two sensor set-up still works.
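One concrete place where the two timing sets differ is the read slot. The DS18B20 datasheet says its output data is valid for only 15 µs after the falling edge that starts a read slot. The sketch below compares where each implementation samples; the Maxim values are AN126 standard speed (A = 6, E = 9), while the Arduino values are as I read `read_bit()` in the Arduino OneWire library (about 3 µs low, sample about 10 µs after release) and should be checked against the library source. `sampleMarginUs` and the "extra latency" parameter are hypothetical, modelling something like a cache miss between the edge and the sample.

```javascript
// Read-slot timing in microseconds: how long the master holds the line low,
// and how long after releasing it the master waits before sampling.
const readSlot = {
  maxim: { lowUs: 6, beforeSampleUs: 9 },    // AN126 standard speed: A = 6, E = 9
  arduino: { lowUs: 3, beforeSampleUs: 10 }, // as read from OneWire.cpp read_bit(); verify
};

// DS18B20 output data is valid for 15 µs after the slot's falling edge.
const VALID_WINDOW_US = 15;

// Microseconds left before the window closes, given extra latency between
// the falling edge and the actual sample. Negative means sampling too late.
function sampleMarginUs({ lowUs, beforeSampleUs }, extraLatencyUs = 0) {
  return VALID_WINDOW_US - (lowUs + beforeSampleUs + extraLatencyUs);
}

for (const [name, timing] of Object.entries(readSlot)) {
  console.log(
    `${name}: margin ${sampleMarginUs(timing)} µs, ` +
      `with 1 µs extra latency ${sampleMarginUs(timing, 1)} µs`
  );
}
```

This is only a hypothesis, not a diagnosis: sampling earlier leaves room for software latency, but it also gives the pull-up less time to raise the line for a 1 bit on a long or heavily loaded bus, so neither choice is free.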
Bad news, I am afraid. I went back to the latest moddable checkout (Fri Mar 3 14:31, 746d1378311d), rebuilt moddable, did an mcconfig clean make and rebuild, and with just one sensor confirmed that I have a working flow that finds the sensor. I then replaced owb_gpio.c with the version from the gist, rebuilt moddable (though in fact I don't think that is necessary for changes to this file is it?), mcconfig clean and rebuild and it doesn't find the sensor. I even removed everything else from the flow except the ds18 node, including the mqtt config node, and built it without wifi, and it still doesn't find anything. I have about a metre of cable on my sensor, but that should be nothing for a DS18B20.
@colinl – that is very mysterious. It is extremely reliable in finding both sensors here.
Let us simplify a bit. Would you try running the OneWire example outside of Node-RED? That would make experimenting a little easier.
cd $MODDABLE/examples/drivers/onewire
mcconfig -d -m -p esp
You may need to change the pin number in the manifest to match your set-up:
"config": {
"onewire": {
"pin": "4"
}
},
I then replaced owb_gpio.c with the version from the gist, rebuilt moddable (though in fact I don't think that is necessary for changes to this file is it?), mcconfig clean and rebuild
Correct. You should only need to replace owb_gpio.c and build with mcconfig (no clean needed before).
I expect that the example will give the same OneWire result as running the flow.
I updated the gist to add a few traces (using modLog) in the reset code, which is a first-order check to see if any devices are visible on the bus.
Once you have that result, try changing the call to modGPIOInit at the end of the file from an output to an input...
modGPIOInit(driver_info->bus.config, NULL, pin, kModGPIOInput);
...and re-running. Does that change the traces?
Thank you.
With the new gist it says "no device on bus found 0".
Changing the Init line to an Input appears to fix the problem, it works both with one and two sensors. [Edit] And it works in the node-red environment too :)
Excellent! Thanks for running those tests. We're circling in on a solution. I'll probably have a change or two more to ask you to try in coming days to try to wrap this up.
I modified modules/pins/digital/esp/modGPIO.c to match the Arduino ESP8266 settings more closely. The main difference is the initialization of the open-drain flag on inputs, which is noteworthy because 1-Wire requires open drain. So... owb_gpio.c stays as you have it, initializing the pin to kModGPIOInput. Try the modGPIO.c, please. In theory everything should just work the same for you. It works reliably here. But... there's clearly something different about our set-ups, so anything is possible. Thank you.
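A toy model of why open drain matters on a 1-Wire bus. This is a sketch: the "release"/"low"/"high" driver states and `resolveBus` are hypothetical, and I'm not claiming this is exactly what the old kModGPIOOutput initialization did to the pin. On the bus every participant should only pull the line low or let go, with the pull-up resistor providing the high level; if the master is left actively driving high, a sensor trying to answer with a 0 ends up fighting it.

```javascript
// Each participant either releases the line or drives it "low" / "high".
// Returns the bus level (0 or 1), or "contention" if one side drives high
// while another drives low.
function resolveBus(drivers) {
  const anyLow = drivers.includes("low");
  const anyHigh = drivers.includes("high");
  if (anyLow && anyHigh) return "contention";
  if (anyLow) return 0;
  return 1; // nobody pulling low: the pull-up (or a high driver) wins
}

// Open-drain master has released the line; the sensor answers with a 0.
console.log(resolveBus(["release", "low"]));
// Nobody driving: the pull-up resistor gives a 1.
console.log(resolveBus(["release", "release"]));
// Push-pull master still driving high while the sensor pulls low.
console.log(resolveBus(["high", "low"]));
```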
Working well still with new modGPIO.c
OK! One last change to try. This backs out the direct register modifications in owb_gpio.c. While slightly more efficient and precise on timing, it is also not at all portable. If this still works for you (fingers crossed), I'll clear out the commented-out code and commit. Thanks again.
This backs out the direct register modifications in owb_gpio.c.
I am confused. The gist linked to appears to be modGPIO.c, not owb_gpio.c.
Apologies. owb_gpio.c is further down the page. Here's the direct link for that.
Oh yes, I didn't look down there. All looking good so far. I will exercise it further with some combinations of hardware in the morning. It is late here now.
Actually I think we have gone backwards with this one. I am seeing regular mis-reads of the sensor - the family and id are correct coming out of the node but the temperature is null. Also one of my D1 Minis, with its own sensor, appears to be intermittent finding the sensors. Can you put back the previous version of that file (sorry, I should have kept a copy) so I can check that it really is that file causing the problem, and not the phase of the moon or something similar.
The Gist maintains all the old versions. Just click "revisions" in the top-left corner to see them. Use the "..." menu on the right of each file to select "View File" to get the full copy.
Sorry about the delay, I have had to do some lengthy testing to try to work out what is going on. I have two D1 Minis wired up, one with one DS18B20 and one with two. With the latest gist versions one of my sensors is fine (the one wired on its own), one occasionally gives a null reading for the temperature, and one gives a null reading about 50% of the time, though sometimes it runs OK for tens of minutes at a time, which has made it rather tedious to analyse. With the previous gist of owb_gpio.c, with the init line at the end changed from output to input, they all appear fine. I am just in the process of re-running that for a few hours to make sure it is still OK.
Scrub my previous comment, I think the latest gist is ok, at least at the moment. I had a screw loose, literally, though at some points I was beginning to wonder whether it was metaphorically. Ludicrously it did appear that with the loose wire it still worked reliably with one version of the file and not with another. I have to do further extended tests, but so far it is looking ok.
Aargh! Perhaps I have got a metaphorical screw loose. It is failing again. There must be a pattern here, but I haven't convinced myself what it is yet.
An update before I give up for the night. I don't think there is any noticeable difference in performance between the previous owb_gpio.c (with the modified line at the end: output to input) and the latest gist. Two of my sensors work perfectly as far as I can tell. The third one seems to have marginal timing when measuring. It is always found OK in the initial search, but sometimes the read returns temp: null. The thing that has confused me so much, and took me some time to realise, is that it works almost 100% when run in xsbug, but if I close down xsbug and power cycle the D1 Mini then it fails a lot, about 50% on average, sometimes for minutes at a time. Does that make any sense?
It could be a dodgy sensor, there are apparently a lot of clone devices out there, but it has been working for years, in parasitic mode, attached to a 1-wire dongle with several other sensors.
The drama!
One question (you may have answered already, apologies). You mention parasitic power. I don't believe I'm doing that. Would you share your wiring diagram so I can try to reproduce that?
Regarding xsbug, if anything I would expect the environment to be more friendly without xsbug as it eliminates serial traffic.
I haven't been trying to use parasitic mode on the ESP device; I don't know whether one would expect to be able to use parasitic on 3.3V, I suspect not. I use it in a system where I have a 1-Wire dongle, which has a 1-Wire chip in it. I didn't have much choice but to use parasitic, as I wanted to re-use existing TV aerial coax distributed through the house as a 1-Wire bus. I was amazed when it all worked perfectly. With parasitic, one has a pullup from the signal to Vcc as normal, but only the signal and 0V are fed to the sensors. At the sensor, the 0V and Vcc connections are both connected to ground, which tells the device to power itself from the signal lead when it is quiescent, I think. I am not sure where to go now. I have one device that is apparently timing critical (or faulty) and two that work well. All my other sensors are wired in, so I can't test with them. Other than buying some more sensors, which I don't need, in order to test it, I don't know what I can do.
Probably I should try and build an arduino sketch reading the sensors and see how well that works. I haven't done that before though, so will need to do some research first.
I found an old sketch I had used with the sensors, I had forgotten that I had done that. I ran it all night and it did not fail once! So there must be a difference still. When I was working I could have borrowed a scope to try and see what was going on, but I haven't got access to one any more.
OK! Let's try to close the loop on some of this. I've committed the GPIO and 1-Wire changes. While perhaps imperfect, they are an overall improvement. This will propagate out to the Moddable SDK on GitHub in the next day.
Having read several 1-Wire implementations, including the Arduino code, and the Maxim app note, I think the updated implementation is consistent with the requirements there.
Regarding the failures when not using xsbug, I would like to understand that scenario more completely. There should be no physical coupling between the serial pins and the 1-Wire pin. When you say you run without xsbug, could you explain the differences between the two runs? For example, is it still a debug build or is it an instrumented or release build? What is connected to the serial TX and RX pins? Is the device powered in a different way? Thank you.
I have two sensors connected to the D1 Mini, connected to a USB port on the laptop, running the flow below. I built and ran it with -d so that it opens xsbug and the debug is shown there, and I can see that sometimes one of the sensors returns an object with temp set to null.
Every 10 seconds it sends a value for each sensor to mqtt. I am catching those and sending them to Influxdb and charting with Grafana. If a temperature value of null is received then I convert that to 0 for sending to influxdb. The attached screenshot shows the result. There are two lines, blue and yellow. The yellow is the good sensor and by hiding the blue line I can see that it never missed a beat.
It can be seen that up to 19:30 the blue line shows the correct value, with intermittent null values.
At 19:30 I shut down xsbug, but carefully did not touch anything else, physically or s/w. It can be seen that thereafter the blue line is mostly zero, only occasionally showing the correct value. This effect is consistent, it works better when xsbug is connected.
During the time of the chart the laptop was not being used for anything else, in fact there was no-one in the room except when I went in to shut down xsbug.
[{"id":"66719b6a9b015085","type":"tab","label":"From mcu-ds18b20","disabled":false,"info":"","env":[]},{"id":"6eadb6ca2404b095","type":"mqtt out","z":"66719b6a9b015085","name":"","topic":"","qos":"0","retain":"false","respTopic":"","contentType":"","userProps":"","correl":"","expiry":"","broker":"75fb29423f8a0770","x":690,"y":240,"wires":[]},{"id":"1d5615f8f44c5fef","type":"debug","z":"66719b6a9b015085","name":"debug 86","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":460,"y":320,"wires":[]},{"id":"d0245f367ce16581","type":"inject","z":"66719b6a9b015085","name":"Measure","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"10","crontab":"","once":true,"onceDelay":"3","topic":"","payload":"","payloadType":"date","x":100,"y":240,"wires":[["8eb62c664cc43647"]]},{"id":"8eb62c664cc43647","type":"rpi-ds18b20","z":"66719b6a9b015085","topic":"","array":false,"name":"","x":250,"y":240,"wires":[["ed6f27cbf7a83105","1d5615f8f44c5fef"]]},{"id":"ed6f27cbf7a83105","type":"function","z":"66719b6a9b015085","name":"Extract id and temperature","func":"if (msg.payload.id && msg.payload.id.length > 0) {\n msg.topic = global.get(\"ds18b20Topics\")[msg.payload.id]\n if (!msg.topic) {\n msg.topic = `tydwr/ds18b20/${msg.payload.id}`\n }\n return msg\n}","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":480,"y":240,"wires":[["6d589b80ee3b076c","6eadb6ca2404b095"]]},{"id":"725379ce0bebbac2","type":"mqtt in","z":"66719b6a9b015085","name":"","topic":"tydwr/ds18b20/topics","qos":"1","datatype":"json","broker":"75fb29423f8a0770","nl":false,"rap":true,"rh":0,"inputs":0,"x":120,"y":60,"wires":[["19c6e135a76911cf"]]},{"id":"6d589b80ee3b076c","type":"debug","z":"66719b6a9b015085","name":"debug 87","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":440,"y":120,"wires":[]},{"id":"19c6e135a76911cf","type":"change","z":"66719b6a9b015085","name":"","rules":[{"t":"set","p":"ds18b20Topics","pt":"global","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":390,"y":60,"wires":[["6d589b80ee3b076c"]]},{"id":"75fb29423f8a0770","type":"mqtt-broker","name":"Owl2 for mcu","broker":"192.168.49.83","port":"1883","clientid":"","autoConnect":true,"usetls":false,"protocolVersion":"4","keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","birthMsg":{},"closeTopic":"","closeQos":"0","closePayload":"","closeMsg":{},"willTopic":"","willQos":"0","willPayload":"","willMsg":{},"userProps":"","sessionExpiry":""}]
In addition, there is the fact that it runs 100% solid using the Arduino code on the same hardware, so presumably there must be a timing difference there, or an issue with pin mode or something similar. Are you able to compare them with a scope, or do you not have access to one either?
The attached simple flow, with an inject node driving a DS18B20 node and on to a function node and MQTT Out, works correctly. The Wemos D1 Mini has a DS18B20 connected in the normal manner, 0V, 3,3V and GPIO4, with a 3k3 pullup resistor.
However, if I delete the wire to the MQTT node then the sensor is not detected. Using xsbug I can see that the bus search in rpi-ds18b20.js does not find anything. With the wire connected I can see that it does find the device, so I am looking at the right code. Making other trivial changes to the flow can also change whether it is found or not. I have done enough experimenting to be confident that it is not an intermittent hardware problem. The attached flow works every time and with a flow that does not work then it never works.
This makes absolutely no sense to me, and it has taken a lot of playing to convince myself that it is really happening, but I am convinced.