Open hartytp opened 4 years ago
Hi Tom,
could you please clarify what is wrong with temperature conversion. I looked at the code and it seems fine for me.
i.e. with the current code 0xC0 (0.75C) gives 1.5C. So, a 0.25C change from 0.75C to 1C would cause the temperature to decrease by 0.5C. Similarly, a change from 0C to 0.25C would change the calculated temperature from 0C to 1C. That likely explains the crazy statistics I saw...
Datasheet says:
Extended Temperature Registers (05h and 06h) These registers contain the extended temperature data from channels 1 and 2. Bits D[7:5] contain the 3 LSBs of the temperature data. The bit values are 0.5°C, 0.25°C, and 0.125°C. When bit 0 is set to 1, a diode fault has been detected.
As you can notice it stands that the bit values are as follows 0.5, 0.25, 0.125. So in fact if you have i.e. 0xC0 it means that you should add 0.5 and 0.25 C which is 0.75C in total.
As you can notice it stands that the bit values are as follows 0.5, 0.25, 0.125. So in fact if you have i.e. 0xC0 it means that you should add 0.5 and 0.25 C which is 0.75C in total.
Yes, if you have 0XC0 you should add 0.5C+0.25C=0.75C
.
However, that's now what the current code does....
total = (float) temp;
if (temp_ext & 0x40) total += 0.250;
if (temp_ext & 0x80) total += 0.500;
if (temp_ext & 0xC0) total += 0.750;
if temp_ext
is 0XC0
then total=0.25+0.5+0.75=1.5C
which is not correct.
i.e. the current code is not the same as
total = (float) temp;
if ((temp_ext & 0x40) == 0x40) total += 0.250;
if ((temp_ext & 0x80) == 0X80) total += 0.500;
if ((temp_ext & 0xC0) == 0xC0) total += 0.750;
or, more succinctly,
total += (float)(temp_ext >> 6)*0.25;
Extended Temperature Registers (05h and 06h) These registers contain the extended temperature data from channels 1 and 2. Bits D[7:5] contain the 3 LSBs of the temperature data. The bit values are 0.5°C, 0.25°C, and 0.125°C. When bit 0 is set to 1, a diode fault has been detected.
Where is that. In the datasheet I see
Where did you find the code that you posted?
Actually function looks different:
float max6639_get_temperature(uint8_t addr, uint8_t ch)
{
float total = 0.0;
uint8_t temp, temp_ext = 0;
vPortEnterCritical();
i2c_read(MAX6639_I2C, addr, MAX6639_REG_TEMP(ch), &temp);
i2c_read(MAX6639_I2C, addr, MAX6639_REG_TEMP_EXT(ch), &temp_ext);
vPortExitCritical();
total = (float) temp;
if (temp_ext & 0x80) total += 0.50;
if (temp_ext & 0x40) total += 0.25;
if (temp_ext & 0x20) total += 0.125;
return total;
}
The temperature sensors on each channel are a 6642, not a 6639
Ok, my bad ;) Give me a minute i will look at the code.
Searching the sources quickly, I can't see anywhere that max6639_get_temperature
is called, so maybe this is dead code?
@gkasprow do we use a 6639 anywhere?
@bjuszczyk the 6639 is used on the main board
@bjuszczyk AFAICT the MAX6639.c
is dead code that is no longer used. I think (from a quick look, but i might be wrong) that the temp_mgt
code is actually what controls the MAX6639.
Based on this code, it looks like we never actually read out the temperature of the chassis (MAX6639), only the channels. Not sure if this is an oversight or deliberate...
While you're fixing the 6642 conversion, it might be worth tidying up/documenting (and deleting dead code) the rest of the temperature management code because right now it's pretty hard to understand how it is supposed to function (this should ultimately also be documented in the Wiki)
Why is RF channel enable in the temp management code?? That seems like pretty unexpected behavior (functionality exceeds what one would reasonably expect from the function name)
The other thing I would comment here is that the moving average should probably be removed.
The noise on the temperature sensor without averaging should only be +-0.25C, which is fine. I suspect that the averaging was implemented because the noise was much larger than expected because of other bugs. It's better to fix those bugs than to try to hide the symptoms with averaging...
Ok, i have spotted similar conversion at the code that was implemented for MAX6639. That's why i was a bit confused. The section that you are pointing has a bug. I can implement it i a different way and that should solve the issue:
if (temp_ext & 0xC0) total += 0.750;
else if (temp_ext & 0x80) total += 0.500;
else if (temp_ext & 0x40) total += 0.250;
I don't know the reason for enabling the channels in this section.
I think that the average value is calculated for fan speed control. It doesn't affect the readouts.
I can implement it i a different way and that should solve the issue:
Yes, that would work. I still think this way is easier to follow though :)
total += (float)(temp_ext >> 6)*0.25;
I think that the average value is calculated for fan speed control. It doesn't affect the readouts.
Aah, I see. Yes, for the fans I can see why that would be helpful. Thanks for explaining!
I don't know the reason for enabling the channels in this section.
I assume it's just historical. It's not a major issue, but I think that overall the code could use a restructure. I suspect that the process of tidying up/restructuring would fix quite a few bugs. Some of the current code paths are quite convoluted and hard to understand making debugging tricky..
Yes, it could fix some issues. But still its very complex and extensive code, thus it requires massive amount of work to do this.
Ok, so i will commit new changes in a minute. Please verify it it helps for the temperatures issue.
Ok, so i will commit new changes in a minute. Please verify it it helps for the temperatures issue.
Thanks! Can you publish a firmware RC as well (in the releases section)?
Yes i can add RC. What number should i put for this version? Is there any algorithm or sth?
Good question. I'm not sure, you'd have to ask @wizath NB we should also update the fw version string to match.
We're currently on v1.4.1. i think there has been enough work done to warrant moving to v1.5rc1
or something like that
https://github.com/sinara-hw/Booster/issues/338#issuecomment-592215237