jfitter / MLX90614

MLX90614 IR Thermometer Driver Library for Arduino
GNU General Public License v3.0

SetFIRCoeff/SetIIRCoeff clears config register #2

Open wormyrocks opened 8 years ago

wormyrocks commented 8 years ago

Hi,

I realized too late that using SetFIRCoeff or SetIIRCoeff in this library with an Arduino Uno zeroed register 0x05, which meant that I could still communicate with the device but I could no longer write to the EEPROM; it's effectively bricked. Unfortunately, this happened twice :( I'm guessing some differences in our host microcontroller BSP led to the config register bits not being properly cleared somehow, meaning that rather than only setting the three bits corresponding to the FIRCoeff all of the bits got cleared?

Also, if you have any suggestions on how to unbrick the sensor, I'm all ears. I already tried using the writeEEPROM function and write16 functions independently, but the EEPROM dumps continued to reveal that register's contents as 0x0000. (It read the other ones fine!) Oddly enough, address 0x00 was also set to 0x0000, but I don't think that is the one impeding my write access.

jfitter commented 8 years ago

I have studied the code very carefully and can find no errors. I have been using this code for a considerable time and have not found any misbehaviour, so your email came as a surprise to me.

The code that sets the IIR and FIR coefficients is a simple read/modify/write, as per the datasheet recommendations. It checks that the new value is in range, reads the old value from eeprom, masks out the relevant bits, ORs in the new bits, and writes the revised value to eeprom.
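A minimal sketch of that mask-and-OR step, for illustration only. It is not the library's code: the helper names are hypothetical, and the field positions (IIR in bits 2..0, FIR in bits 10..8 of the config register) are an assumption taken from a reading of the MLX90614 datasheet and should be checked against it.

```cpp
#include <cstdint>

// ASSUMED layout of the config register (EEPROM address 0x05):
// IIR coefficient in bits 2..0, FIR coefficient in bits 10..8.
constexpr uint16_t kIirMask  = 0x0007;
constexpr uint16_t kFirMask  = 0x0700;
constexpr unsigned kFirShift = 8;

// Hypothetical helper: returns the config word with only the FIR field changed.
constexpr uint16_t withFir(uint16_t config, uint8_t fir) {
    return static_cast<uint16_t>(
        (config & ~kFirMask) |                                   // mask out old FIR bits
        ((static_cast<uint16_t>(fir) << kFirShift) & kFirMask)); // OR in the new ones
}

// Hypothetical helper: returns the config word with only the IIR field changed.
constexpr uint16_t withIir(uint16_t config, uint8_t iir) {
    return static_cast<uint16_t>((config & ~kIirMask) | (iir & kIirMask));
}
```

Every bit outside the three-bit field is carried over from the value that was read, so a correct read/modify/write should never zero the whole register by itself.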

The only suggestion I can offer you is to check your code to ensure it has not accidentally written to the eeprom in a loop and exceeded the eeprom endurance. The manufacturer quoted endurance is 10,000 write cycles which is relatively low but quite sufficient for a device such as this. With a 5ms minimum write cycle time it is possible to exceed the eeprom endurance in just 50 seconds.
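The 50-second figure follows directly from the two numbers quoted above. As a small worked check (the function name is only illustrative):

```cpp
#include <cstdint>

// Figures quoted above: 10,000 write cycles of endurance, 5 ms per write.
constexpr uint32_t kEnduranceCycles = 10000;
constexpr uint32_t kWriteCycleMs    = 5;

// Shortest time in which back-to-back writes could use up the endurance.
constexpr uint32_t msToExhaustEndurance(uint32_t cycles, uint32_t writeMs) {
    return cycles * writeMs;
}

static_assert(msToExhaustEndurance(kEnduranceCycles, kWriteCycleMs) == 50000,
              "10,000 writes x 5 ms = 50,000 ms = 50 s");
```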

One way this may occur is if there is an address problem on the bus. Do you have other devices on the bus? Are you addressing the other device and inadvertently getting writes to the Melexis?

It does surprise me that your eeprom registers are zeroed. Random writes would leave garbage. Repeated random writes to the endurance limit could leave zeroes but you would need to do a lot of writes.

Do you have another bus device that is taking writes to addresses 0 and 5?

If you can read the bus address register 0x0E, does it contain the value that you expect?

The default is 0x5A.

Does it conflict with another device?

Is your bus working properly?

Test it with another device of a different kind, such as a RTC (or anything cheaper than the Melexis).

The manufacturer explicitly warns against putting more than one device on the bus having the same address.

Right now I can't think of any other suggestions because it has worked flawlessly for me. Even if the config register is zeroed, provided it can be written to it should be possible to get the chip working. Of course its calibration will be off, but it will work. You would need to ensure it is the only device on the bus and address it with a slave address of zero, the broadcast address. Write a default value to the config register. If it works, good. If the register remains at zero then it is broken. A reasonable starting value for the config register would be 0xBF99, with the slave address (SA) set to 0x5A.

Kind regards,

JohnF


wormyrocks commented 8 years ago

Hey,

I am only communicating with one device at a time. I have confirmed that it remains at 0x5a.

I hadn't thought of the read/write cycles limitation; I don't think that was part of the issue but I'll double check my code and make sure there's no way an eeprom operation could have made its way into the main loop. I copied the eeprom dump from before clearing the config register, so I have the exact value that needs to be copied back in. I've also tried modifying your code to overwrite only that register to its initial value. I haven't tried to touch any other registers yet; I'll test out some unimportant one to make sure that they're really all locked. When the EEPROM dump runs, I get a NACK error message after each line, but it still reads out every default EEPROM value (except for 0x00 and 0x05, which appear as 0x0000.)

So my Arduino can clearly still read from the device, as it prints out temperature data and configuration register values. However, the readings are of no use, since overwriting 0x05 threw the sensor calibration out the window.

When I get home from work I will try overwriting that register with a slave address of zero, something I haven't yet tried. I've read through the datasheet, but could you possibly give me the one-sentence explanation of why that's necessary over using the default SMBus address?

Thanks very much for this thoughtful reply and taking the time to look into your code.

Edit: I also remembered that the only other thing I did that could possibly be considered out of the ordinary was to change the read16() method to public in MLX90614.h so that I could read the raw bits in my sketch. I doubted this would break anything, but perhaps it prevented your program from reading the original value of the configuration register. I will also change that back before my next attempt.

wormyrocks commented 8 years ago

I reverted to the version of this library on your repository and I've tried with two different Arduinos. Both times, using the writeEEPROM function on a register with something in it will zero that register. I have my registers all backed up so I can restore everything once I figure out what's going on. I'll try more stuff after work today. I suspect that either something is wrong with my sensor hardware, or the Wire library has been updated in the past year in some way that breaks compatibility. I'll try and reprogram the registers with a Bus Pirate and let you know if I find anything interesting.

jfitter commented 8 years ago

Hi

When the driver writes to eeprom it first reads the current value to determine if it needs to do anything.

If the value is the same or there is a r/w error then it does nothing.

Otherwise it zeroes the eeprom then writes the new value.

If the zeroing write returns a r/w error then it does nothing more, except to return the r/w error code to indicate that the eeprom is now probably corrupted.

Each time a write is done the code delays for Terase (5ms) to allow for the eeprom write to take place.

It is very possible that you are not getting past this point. The read is successful so the code progresses to the write and writes zero to the eeprom. If the write routine then returns a r/w error then nothing more will be done and you will be left with zero in the eeprom.
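The sequence described above can be sketched against a stand-in device. This is an illustration under stated assumptions, not the driver's code: `FakeSensor` and `writeEepromChecked` are hypothetical names, the "read" is modelled as a direct look at the stored value, and the 5ms delays are left out.

```cpp
#include <cstdint>

// Stand-in for the sensor (hypothetical): every write is stored, but the
// bus can be made to report an error code afterwards.
struct FakeSensor {
    uint16_t cell;           // one EEPROM register
    uint8_t  statusOnWrite;  // error code returned by every write, 0 = OK

    uint8_t write(uint16_t value) {
        cell = value;          // the data lands in the register...
        return statusOnWrite;  // ...but the caller may still see an error
    }
};

// Sketch of the sequence described above: do nothing if the value is
// unchanged, zero the register, and write the new value only if the
// zeroing write reported success.
uint8_t writeEepromChecked(FakeSensor& dev, uint16_t value) {
    if (dev.cell == value) return 0;  // same value, nothing to do
    uint8_t err = dev.write(0x0000);  // erase first
    if (err != 0) return err;         // stops here: register is left at zero
    return dev.write(value);          // normal path
}
```

With `statusOnWrite` set to a non-zero code, the call returns that code and the register stays at 0x0000, which matches the symptom reported in this thread.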

You need to write a test program that returns the value of the r/w error and prints it out. It must always be zero for the code to succeed.

There is a property getter for the r/w error called rwError. You just treat it like a variable, so Print(rwError) will print the error code, which should always be zero.

The r/w error is persistent: it keeps its last value until the next r/w operation, and it is zeroed just before each read and write operation.

The header file lists the r/w error bits. Check to see which bits are set. If bit 7 is set then the write failed because some other r/w error was returned by the Wire.endTransmission function. I need to know what that other error bit is.

Wire.endTransmission is used in all of the read/write functions, so for any of these functions to work at all then Wire.endTransmission must be working and returning r/w error code of zero, as expected. You can read successfully but cannot write, so I am keen to know what the errors are that are being returned.

* Have you got pullup resistors on the I2C bus lines?
* There must be some explanation as to why your writes do not succeed but the reads do work.

Note also the 5ms delay. This is Terase per the manufacturer's documentation. It happens after every write. If insufficient time is allowed after a write then an error will be returned. Check that your clock speed is as expected and that your code is compiled for the proper clock speed, ie. make certain that delay(5) really is 5ms. For example, if you have selected an 8MHz board and your clock is really 16MHz then delay(5) will really only be 2.5ms and all of your writes will fail while the reads will work correctly.
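The clock-mismatch arithmetic can be written out as a one-line calculation. The helper below is hypothetical and only restates the example above: a delay compiled for one clock frequency scales by the ratio of the compiled frequency to the real one.

```cpp
// Hypothetical helper: real duration of delay(requestedMs) when the sketch
// was compiled for compiledHz but the board actually runs at actualHz.
constexpr double actualDelayMs(double requestedMs, double compiledHz, double actualHz) {
    return requestedMs * compiledHz / actualHz;
}

// Compiled for 8MHz, running at 16MHz: delay(5) lasts only 2.5ms.
static_assert(actualDelayMs(5.0, 8e6, 16e6) == 2.5, "half the requested delay");
```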

You could also try commenting out the rwError test lines - ie. forcing the code to run regardless of the errors that are returned.

I hope this helps.


wormyrocks commented 8 years ago

Thank you!

I managed to fix the afflicted registers by writing to them with the Bus Pirate. But now that I know that mistakes of that nature are recoverable I will throw in some debug printouts and see what is causing that error, if there is one.

_rwError gives me 0x42 -- TXADDRNACK and EECORRUPT.
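For anyone repeating this check, a small decoder makes the set bits easier to read. This is a sketch only: `describeRwError` is a hypothetical helper, and the two named constants are the only values this thread confirms (0x42 being TXADDRNACK plus EECORRUPT). The header file has the full list.

```cpp
#include <cstdint>
#include <string>

// Only these two names and values are confirmed by the 0x42 reading above;
// see the library header for the remaining error bits.
constexpr uint8_t kTxAddrNack = 0x02;
constexpr uint8_t kEeCorrupt  = 0x40;

// Hypothetical helper: lists which bits are set in an r/w error byte,
// e.g. for a debug printout.
std::string describeRwError(uint8_t err) {
    if (err == 0) return "no error";
    std::string out;
    for (int bit = 0; bit < 8; ++bit) {
        if (err & (1u << bit)) {
            if (!out.empty()) out += ", ";
            out += "bit " + std::to_string(bit);
        }
    }
    return out;
}
```

For 0x42 this gives "bit 1, bit 6", that is, the 0x02 and 0x40 flags.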

I think I see the problem. The attached output is the result of the statement: "mlx.writeEEProm(0x01, 0x62e3);"

[Screenshot: logic sniffer output, 2016-08-11, 10:34:33 pm]

This is still the initial read, but I believe that the first byte in the logic sniffer output ought to be 0xb4 (0x5a << 1) rather than 0x00 - or at least I have had success with that in the past. Perhaps using 0x00 as the broadcast address works for reads but not writes. Doesn't explain why write16() is able to zero out the EEPROM in the first place, though... watch this space, I'll do some more digging.
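The expected first byte follows from how an I2C/SMBus address byte is formed: the 7-bit address is shifted left by one and the read/write flag goes in bit 0. A short sketch, with an illustrative function name:

```cpp
#include <cstdint>

// First byte on the wire: 7-bit address in bits 7..1, R/W flag in bit 0
// (0 = write, 1 = read).
constexpr uint8_t addressByte(uint8_t addr7, bool read) {
    return static_cast<uint8_t>((addr7 << 1) | (read ? 1 : 0));
}

static_assert(addressByte(0x5A, false) == 0xB4, "default address, write");
static_assert(addressByte(0x5A, true)  == 0xB5, "default address, read");
static_assert(addressByte(0x00, false) == 0x00, "broadcast address, write");
```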

wormyrocks commented 8 years ago

Ah, I see what happened now.

Setting up the MLX object with MLX90614_BROADCASTADDR allows reads to work without an error, which is why most of the functions work. When writeEEProm zeros out the register, it successfully does so, but Wire.endTransmission() throws an error, since it should be writing to the I2C device address rather than the broadcast address. This means that the function never gets to the write data step, so the register is never properly overwritten.

Setting up the class with I2CDEFAULTADDR rather than BROADCASTADDR fixes this for me.

bulentperktas commented 7 years ago

This fixes it for me, in the write16 function:

rwError |= (1 << Wire.endTransmission(true)) >> 1;

instead of:

rwError |= (1 << Wire.endTransmission(false)) >> 1;
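For context, the shift expression in both lines turns the numeric result of Wire.endTransmission() into a single error bit. The sketch below isolates that expression in a hypothetical helper, with the return value passed in as a plain number. The listed codes are those given in the Arduino Wire documentation.

```cpp
#include <cstdint>

// Same expression as in write16, with the Wire.endTransmission() result
// passed in. Documented return codes: 0 success, 1 data too long,
// 2 NACK on address, 3 NACK on data, 4 other error.
constexpr uint8_t errorBit(uint8_t status) {
    return static_cast<uint8_t>((1u << status) >> 1);
}

static_assert(errorBit(0) == 0x00, "success sets no bit");
static_assert(errorBit(2) == 0x02, "address NACK sets bit 1");
```

The argument to endTransmission (true or false) only controls whether a stop condition is sent afterwards. It does not change this mapping.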

romik2206 commented 7 years ago

Yes, this correction is really, really important. This mistake in the library put my sensor into a shitty state. The setIIR or setFIR command clears config register 0x05, and I did not know what had been in it. Really stupid mistake. I spent three hours recovering the original value, and I am still not sure of the meaning of two bits in the config register:

* Sign of Ks: 0 = positive, 1 = negative
* Sign of Kt2: 0 = positive, 1 = negative

Does anybody know what they mean?

Thank you.

jfitter commented 7 years ago

romik2206 Emotional and downright offensive responses contribute nothing to this technical discussion. When you get your Nobel Prize for programming you can call everyone stupid. Until then control yourself and get some perspective. This is a $5 gizmo, not your life savings.

jfitter commented 7 years ago

Just fixed the problem. Using the broadcast address will still throw an error, which can be safely ignored, but it will no longer prevent the data from being written to an eeprom register. Setting Wire.endTransmission(true) will also fix the problem for valid addresses.

To write to an eeprom register it is necessary to clear the register first by writing zero to it, and then write the new data. This leaves a short period in which the register is effectively corrupt. The previous code tested the success of the clearing write and only wrote the data on success. Under some circumstances the clearing write reported a failure and the new data was never written. The new code writes zero and then writes the new data without testing the success of the first write. Only then does it return the success of the operation.
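A minimal sketch of the revised sequence, using a stand-in device rather than the real driver. `FakeSensor` and `writeEepromUnchecked` are hypothetical names, the delays are omitted, and returning the status of the final write is an assumption about what "the success of the operation" means here.

```cpp
#include <cstdint>

// Stand-in for the sensor (hypothetical): every write is stored, but the
// bus can be made to report an error code afterwards, as happens with the
// broadcast address.
struct FakeSensor {
    uint16_t cell;           // one EEPROM register
    uint8_t  statusOnWrite;  // error code returned by every write, 0 = OK

    uint8_t write(uint16_t value) {
        cell = value;
        return statusOnWrite;
    }
};

// Sketch of the revised sequence: zero the register, then write the new
// data whatever the zeroing write reported, and only then report a status.
uint8_t writeEepromUnchecked(FakeSensor& dev, uint16_t value) {
    dev.write(0x0000);        // erase; the result is deliberately not tested
    return dev.write(value);  // ASSUMPTION: status of the final write is returned
}
```

Even when every write reports an error, the register ends up holding the new value instead of being left at zero.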