Closed megax closed 1 year ago
Thanks for the report. Do you think you could create a minimal program and steps to reproduce the problem that I could duplicate? I don't even have the ELF file to look at so I can't even begin to diagnose this directly.
I'm not seeing how anything could be NULL there (assuming there's no multithreading or concurrent things happening with the library, because that most definitely is not supported (if actually accessing the library concurrently)).
watchPendingConnect()
depends on conn_
not containing NULL, and that's checked just above.
I'm not 100% certain that line is where the error is.
Another thing I can think of is if this
is NULL. Are you trying to call something via a pointer to an EthernetClient
instance?
Update: Oh, hmm... then the check for conn_ == nullptr
above would also fail...
As far as I know, I don't use threads. I only use this callback code that I sent before: https://github.com/ssilverman/QNEthernet/issues/37#issuecomment-1552697604
This part of the description is incorrect:
if(_connecting == 2) {
setConnecting(-1);
callback(this, static_cast<int>(ConnectReturns::INVALID_SERVER));
}
Here I changed this to:
this->_callback(this, static_cast<int>(ConnectReturns::INVALID_SERVER));
I will try to make a small test program with this code. And to attach and describe the steps of the test.
I have so many questions, can I help you with .elf file debug? I have the file, can I run anything on it that might help? (until I finish the test program)
That link shows you’re using the Teensy41_AsyncTCP library. That doesn’t use QNEthernet (well, only the initialization code). Instead, it uses lwIP directly. I suggest asking the author of that library for help. The only reason that library claims to use QNEthernet is because it just uses the internal lwIP distribution.
Also, if you’re mixing QNEthernet objects (such asEthernetClient
or whatever) with use of that library, I can’t support that.
https://github.com/ssilverman/QNEthernet/issues/37#issuecomment-1550905882
First, sorry but I will be rude because it is offense me too. Please don't play this game over and over again. You helped me send a callback code, there is also an email about it. I had a real time problem, you helped me with it. That dns check also has a timeout. Read the whole conversation again. I'm seriously frustrated that you always write me off for using that crap. When I use your code because it's the BEST for Teensy. We also spoke online. Do you think I would use it?
If you are not believing me, online I can show you the source code.
Excuse me? I've spent a very very large amount of time creating this library for free. It's MIT licensed. You don't get to demand anything from me. I'm not writing you off; I don't support use of the Teensy41_AsyncTCP library in conjunction with any of the QNEthernet classes. If you'd like to try asking more nicely and if you paste a complete example program that's as short as you can make it without being too big, and if you provide easy-to-follow instructions then I'll try to look at it if I have time. I won't put up with demands; you're getting A LOT for free. I even spent time with you by video chat to try to help. I'm sorry that you're frustrated, but you need to try again. I'll give you one redo.
I wrote this library to meet my own needs and decided to give it to the Teensy community for free. My priority is my time and my projects, not yours, unless you pay me for my time, in which case, I'll prioritize working with you.
Yesterday I was pissed off because I had a feeling that you have not read my whole mail, only just the first 2 sentences. We already discussed many times before that I Am not using that crap Async. However, lately I got this answer from you. As I told you before, in a particle code there is the name of the async class. Which tries to set up the tcp connection to a different route. This is why the name stayed to keep the layer compatible. What I showed in that link uses your code so that I can use connectNoWait(). So, it does not hang on to the connection so my code can run continuously. This is not enough, because the getHostByName query would also wait. And this is the class based on your code which I discussed with you earlier. The name is what I use only, and it has nothing to do with anything else. I did not rewrite the names of the classes.
https://github.com/rickkas7/asynctcpclient/blob/master/AsyncTCPClient.cpp Only the structure is the same, but I don't have a Thread. Again, I do not have one! Only the callback you helped me write.
I'm sorry if I was rude or jerky, I don't think the specifics of my language are reflected in the translation. In my language, simple words do not in and of themselves cause sarcastic or sarcastic words. And I also used a translator and I don't think it understands the difference. In such cases, I was always pushed by the bad guys, that I can't give them what they want, and I might cause trouble, which is legitimate, because it's written, but I don't mean it. :( I wrote that at the beginning, sorry if I'm rude. I don't ask for anything that would take up your time or your help. I agree with this. Only that, although this is not an obligation either, because at least I'll give my truth anyway, just that I'm not using that crappy Async code. I'm sending these errors to help you and the community. If I get an error message, I don't wait for help, I look for the error. They never ask for help if I'm the one at fault.
Believe me, I thank you for your time. However, this is mutual, because I intended 1 whole week of working hours at the company to help your project and to try to find some clues to the error I found before. Not because of myself, because my boss suggested I look for something else. But because I thought your library was good and worth helping to find the problem. And because someone else has already run into that phenomenon. I dedicated my working time and free time to help. I'm an idiot in this field. I have already helped with many free projects, and I am not even willing to accept money for it. Not even if others need help. If nothing else, by reporting bugs, hoping I can explain it. And I help others with it. Believe me, I understand what it means to give something away for free. Man gives fingers and hands are torn off. However, in this case, the main problem is that I am unable to formulate it meaningfully for you. I am sorry for this :(
The way I read it, and it may have been because of the translator, was that you "didn't want to get that crap from me" or that "I was giving you crap" or something, but your latest response clarifies that you're referring to that library and not to me. I'm very busy and I didn't remember the exact contents of our last thread or discussion.
The most helpful thing you can do is, when you file an issue, pretend like there's no context and fill me in. I see now that you're referring to some other library (your own).
I do appreciate that you file bugs and participate in open/free software. I do the same thing. I have always tried my best to solve any issues in my own repos (including QNEthernet), and I'm more than willing to help you out. Anything you find improves the software for you, for me, and for everyone else. Again, maybe this was due to the translator, but you came across as demanding and insulting. Your latest response does clarify things a lot more. Thanks for that.
Can we start this issue again? I really need a desciption with complete context and a sample program. I'm just not seeing crashes at that line. I need to be able to reproduce it. I don't yet have enough information to even understand the problem so that I can tackle it. Does that make sense?
A thought: One of my latest projects was crashing all the time. It turned out I was just running out of dynamic memory. Might this thought be related?
Thanks again for the follow-up and clarification.
Again, sorry for the misunderstanding :(
I'll post code and test steps as soon as I can. I am still testing it. Do you mean RAM2 or the pre-allocated RAM_SIZE?
In principle, I don't run out of RAM2, how can I test the other option? how "full" it is?
On a Teensy 4.0 or 4.1, this should do it:
extern unsigned long _heap_end;
extern "C" void *_sbrk(int incr);
// Returns the amount of free RAM.
int freeRAM() {
ptrdiff_t end = reinterpret_cast<ptrdiff_t>(&_heap_end);
return end - reinterpret_cast<ptrdiff_t>(_sbrk(0));
}
I got the technique from browsing the Teensy forum. It's a mix of a couple of ideas from there. (Use of ptrdiff_t
is my own approach.)
Are you, by any chance, also using UDP? If so, are you specifying a queue size in the EthernetUDP constructor?
The answer to the other issue I am writing is yes. One of our projects requires that, so the code works in UDP. It also has TCP, but they are inactive (only TCP as EthernetServer). The EthernetClient (TCP) parts do not run due to settings. What I have experienced here in this issue is TCP based only. And I only saw the error occur in TCP. However, the answer to your question is no, I do not set it. I use the default one.
I am sending these only as information. Maybe it will also help you debug your project. If I can, I will help you.
This is how I start it:
_udp.begin(5000);
And this is how I send data (then it will be byte-based, now in $...*00 format):
char sbuff[80] = {0};
sprintf(sbuff, "$%lu;100;%lu;%d;%d;%d;%d;%d;%d;%d;%d;%d;%d", millis(), msg.id, msg.extended, msg.len, msg.data[0], msg.data[1], msg.data[2], msg.data[3], msg.data[4], msg.data[5], msg.data[6], msg.data[7]);
int bkey = calc_xor_crc(sbuff, strlen(sbuff));
char tn[8] = {0};
sprintf(tn, "*%02x", bkey);
strncat(sbuff, tn, 80 - strlen(sbuff) - 1);
_udp.send(_server_ip, _server_port, reinterpret_cast<const uint8_t *>(sbuff), strlen(sbuff));
How big is the UDP queue? I recently fixed a memory issue where it shouldn't have been preallocating each packet in the queue. That just sucked up memory like crazy.
I don't know where I have to see it. I describe how many messages I send. Maybe that will clear it up for you. 20-100ms average message sending. It actually transmits CANBus messages (it means 4-5 packets in 100ms). Between the two Teensies. Actually, I tried it as a 1ms timer, so I constantly run the send message what I sent earlier in the loop. Almost 100% of about 10,000 packages got there. The average was 9980 pcs. However, I think this is quite extreme which I tried it for.
I have no idea what your code looks like. It's whatever you pass into the EthernetUDP constructor.
Debugging this by text description just isn't working. Could you please create the simplest possible demo app that I can run here and debug? If not, I don't think I can help further because I don't have all the information. All I'm doing is guessing.
Did you print the freeRAM()
results every so often? You haven't answered my question about whether there's any RAM left.
yes, there is enough ram. i work on a test code what i can give you later. i try to reproducting the steps of the method of issue so i can tell you properly. i would ask for a couple days so i can finish the test code.
yes, there is enough ram. i work on a test code what i can give you later. i try to reproducting the steps of the method of issue so i can tell you properly. i would ask for a couple days so i can finish the test code.
Thanks for checking. I'm curious, how much is left? Is it kilobytes or tens of kilobytes or hundres of kilobytes free?
The best way to help me find these issues in general is to create a small test program (as small as you can possibly make it) that I can use to trace for the problem. Without that, it's just guessing and it'll almost always take much longer to get to a resolution. Although it's possible, It's rare that these things can be solved with a quick discussion without any code. Believe me, I wish that weren't the case.
Looks like you'll be right. Memory leak. I just didn't know exactly what I was doing until now. The error did not appear. I'm sorry. Reproduction: For one connection, the wrong port must be used. On which there is no live connection. (For me, it may happen if someone rewrites it in settings.) Invalid port: 2201 I may have messed up my own code, and in the case of a faulty connection, close() would be necessary. Although, it should close when trying to connect, right?
Additional information: I tried restarting the router. Then you won't run out of RAM. I hope it shows up in the log.
I am attaching the crash and memory leak log. Elf file as well (elf.txt please replace elf. Github... :(). And platformio.ini and test.cpp
Code info: I am attaching the sample code. There is a testClass class to model how I use the std::bind function as a callback. Not to cause an error, I will write it more simply in the static global namespace. I'll continue debugging. I hope you can see from the code how I use EthernetClient to avoid "delay", including dns queries. If you can come up with your own solution for this in the future, I will switch to it. At the moment, this is how I can do it, so that there is no "loop delay".
firmware.elf.txt crash log.txt memory leak log.txt test.cpp.txt platformio.ini.txt
Could you rebuild the project with the -g
build flag? Then the 'addr2line' program should print a valid value for your NULL memory access.
I just ran the code after changing the connection IP and port to something that's listening on my local network. I don't see a memory leak. In other words, I can't reproduce the problem. Once the connections connect and disconnect, the free RAM stabilizes about 16k less than when I started, probably due to the library's internal vectors allocating memory. Then it never goes lower.
I tried connecting to both an IP address and to a hostname, to rule out the DNS client losing RAM.
Could make sure you're using the latest QNEthernet? PlatformIO won't refresh to the latest unless you delete what's in .pio/libdeps/teensy41/QNEthernet. You can also verify the commit it's using by looking at the build output. You should see a line that look like this:
Dependency Graph
|-- QNEthernet @ 0.24.0-snapshot+sha.0825ded
That's the latest version.
Yes, latest:
Scanning dependencies...
Dependency Graph
|-- QNEthernet @ 0.24.0-snapshot+sha.0825ded
Building in release mode
Compiling .pio\build\teensy41\src\main.cpp.o
Linking .pio\build\teensy41\firmware.elf
I will test it today based on your request. And I'll report back.
I have a tip. I have 1.59 beta teensy core. Is it possible that there is something causing it for me? It's interesting, because with my normal code, I don't always experience this memory loss (but sometimes yes). But my test code runs out.
CrashReport:
A problem occurred at (system time) 19:51:57
Code was executing from address 0x2462
CFSR: 82
(DACCVIOL) Data Access Violation
(MMARVALID) Accessed Address: 0x0 (nullptr)
Check code at 0x2462 - very likely a bug!
Run "addr2line -e mysketch.ino.elf 0x2462" for filename & line number.
Temperature inside the chip was 54.22 °C
Startup CPU clock speed is 528MHz
Reboot was caused by auto reboot after fault or bad interrupt detected
Breadcrumb #1 was 296241892 (0x11A84AE4)
Breadcrumb #2 was 3679715279 (0xDB53FFCF)
Breadcrumb #3 was 134766600 (0x8086008)
Breadcrumb #5 was 1893073724 (0x70D6033C)
Breadcrumb #6 was 1655764204 (0x62B0F4EC)
PS C:\Users\JakosaCsaba\.platformio\packages\toolchain-gccarmnoneeabi-teensy\bin> .\arm-none-eabi-addr2line.exe -e C:\Users\JakosaCsaba\Documents\HariTech\Teensy\robot-tcp-test6\.pio\build\teensy41\firmware.elf 0x2462
C:\Users\JakosaCsaba\Documents\HariTech\Teensy\robot-tcp-test6/.pio\libdeps\teensy41\QNEthernet\src\internal/ConnectionState.h:28
I did a full clean. Teensy core 1.59 left. He's still doing it. Debug is active. Elf file attached.
Does it still crash with Teensyduino 1.58?
Line 28 of ConnectionState
is its constructor. There’s some other problem here.
I’m just not able to reproduce this crash with the code you provided. (I made a change to connect to a local machine and a change that writes a byte when it connected, so that the HTTP server I have running will close the connection.)
I’m also using Teensyduino 1.58.
I know I keep asking this, but I still don’t have code I don’t have to modify to reproduce the problem. I just can’t debug this without a working example program and I don’t have one yet. For example, I had to modify the code you provided so it could connect to a server. I want code that I don’t have to modify that reproduces the problem.
It wasn’t connecting to those “.hu” servers in the program.
I take it back. I wasn't able to connect to those servers before, but now I seem to be able to.
I think I found what's going on. First, connections don't look like they're ever getting closed in your code. This makes them accumulate under the covers. Second, I need to dive into why proper cleanup at the start of EthernetClient::connectXXX()
isn't being done.
I understand the time limit is 2 minutes. Until then, the server assumes that the connection is live. I don't know if that has any effect. All NTRIP servers work this way. This server I'm trying is not unique.
NTRIP: https://support.pix4d.com/hc/en-us/articles/4633640503709-What-is-NTRIP
I've been trying to understand what's been going on and I think I have an idea. If a connection isn't closed and you try to connect, it first calls close()
before the connection attempt. That means that any stray connections that haven't yet been closed by the system or network will hang around until they are. This will eventually exhaust memory if it takes a while for connections to be closed, for whatever reason.
Connections remove themselves from the pool when the stack thinks they're finished closing. This may or may not happen within the 2-minute window. I have more exploring to do for when things actually get closed.
The solution for now is to call abort()
before trying to reconnect with, in your case, connectNoWait()
.
Try the fix here: 9f33933 (latest commit as of now)
(No need to call abort()
with that fix.)
I tested it, and it works properly. Thank you very much. And sorry that the conversation went astray at the beginning. Thank you for your help!
It’s okay. We worked it out. :)
Thanks for reporting it and working through my questions.
Hello, I found a crash. Tested with a RUT950 router, there was no sim card in it for 3 hours, so there was no mobile internet or internet. It wasn't there by default, I tried to connect to a server that way (with TCP). Then I turned off the router and inserted the sim card. It restarted and then all of a sudden I got the following error. Would I be unlucky? Or some null pointer error? Can I have some help with that? Tested with latest master branch code.
CrashReport: A problem occurred at (system time) 21:10:53 Code was executing from address 0x1B1D0 CFSR: 82 (DACCVIOL) Data Access Violation (MMARVALID) Accessed Address: 0x1 (nullptr) Check code at 0x1B1D0 - very likely a bug! Run "addr2line -e mysketch.ino.elf 0x1B1D0" for filename & line number. Temperature inside the chip was 54.22 °C Startup CPU clock speed is 528MHz Reboot was caused by auto reboot after fault or bad interrupt detected Breadcrumb #2 was 3411279823 (0xCB53FFCF) Breadcrumb #3 was 134799361 (0x808E001) Breadcrumb #5 was 1893073725 (0x70D6033D) Breadcrumb #6 was 1655731180 (0x62B073EC)
C:\Users\JakosaCsaba\Documents\HariTech\gitrepos\robot/.pio\libdeps\normal\QNEthernet\src/QNEthernetClient.cpp:143 (discriminator 1) https://github.com/ssilverman/QNEthernet/blob/8d7614ab64a901d9ad7d4b34d3cac41f17908be0/src/QNEthernetClient.cpp#L143