sparkfun / SparkFun_LTE_Shield_Arduino_Library

Arduino Library for the SparkFun LTE Cat M1/NB-IoT Shield.
Other
27 stars 14 forks source link

Multiple sockets seem to cause crashing. #7

Open MatthewMenze opened 4 years ago

MatthewMenze commented 4 years ago

The socket close handler reports the incorrect socket closing, and seems to be related so some sort of device crash.

I am using the LTE shield with a Metro M4 Airlift Express board, using the hardware serial interface.

Based on experimentation, I believe this is an issue of incorrectly tracking and handling socket events, causing resources to be incorrectly de-allocated as sockets are created and destroyed. Maybe this is related to the memory issues I see mentioned in other issues and commits?

MatthewMenze commented 4 years ago

After additional digging into the library source code, the issue appears to be due to how incoming URC's are processed in this library.

In the poll() function incoming serial data is parsed until the first newline character, then from there it is checked for various URCs, and processed accordingly.

When processing a socket read, the library then sends a command to read the indicated socket data, then waits for a response from the LTE module with the socket data. While waiting for that, if additional packets come in, and cause the LTE module to transmit another +UUSOLI URC message, it will get eaten by the loop in socketRead that is waiting for a response from the USORD command sent.

This will cause data to exist on the LTE socket without the library knowing it is there, effectively rendering that socket unusable. Eventually if this happens enough, all available sockets are filled with data, but the library doesn't know to request it, and the whole thing locks up.

I think there is a few pieces required to fix this.

  1. Polling needs to be buffered. Ideally you would have an interrupt service routine handle any time there was data available on the line, and then push that into a buffer for the poll() function to iterate through and handle any commands stored within.
  2. There needs to be logic that handles unexpected data in the socketRead() function, and does something reasonable. Maybe flush all sockets data may be sitting on? (If it's caught here, it is probably fragmented/to late)
  3. Waiting for a response for up to 1 second in the socketRead() is probably going to cause headaches, especially if you are not using an interrupt or similar to grab and buffer data...A lot of new events could come down the pipe in a second.

I'm working implementing some sort of work around for this, but I'm not super familiar with the library, and may need to ditch this LTE Module entirely for the sake of my clients project.

MatthewMenze commented 4 years ago

I've modified the read and write code paths in the library quite a bit now, and resolved much of the issues. Once I am sure it is working right I will upload a updated copy of the library files for you to review. My changes are not ready to integrate into the project out of the box, as I am not familiar enough with the library to tell if I have broken other parts of it, and I have not fixed the GPS functionality as I am not using it.

The major changes I had to implement are:

nseidle commented 4 years ago

Thank you for all your work on this! Please keep it coming, I really appreciate it.

If you're able to get to a finishing point, a PR would be ideal. If you have any examples you'd like to add to the library we'd gladly accept them as well.

ROSW6341 commented 4 years ago

This took a lot of time to resolve so I'm passing it along here. I have several R410 modules and found it impossible to connect to a TCP server. They would open the port but refused to connect. Running the ATI19 command in pass-thru mode returned L0.0.00.00.05.06.02.00. After a lot of searching I found the current version of the firmware is L0_0_00_00_05_08 which can be updated via the USB connector on the SparkFun LTE Shield. This version is suppose to correct the problem.

MatthewMenze commented 4 years ago

I'm getting closer now, but still running into new issues I have found, I'll look at checking this repo out and making a pull request at some point...So far I have just been writing against the files directly:

My fixes will probably not work for the general case, for one I am not handling buffers being filled super gracefully, I just check to make sure I am not writing out of bounds. If a user is only sending data, they will eventually fill. (If they are not calling my poll function regularly to handle events)

Realistically, they probably should always call some sort of polling function, as that is how you would detect and process errors.

You may want to consider the structure of the sendCommandWithResponse() function as well. At a minimum I think it may need a "expected error" as a argument, but more realistically, I think it might be worth restructuring the function to just read until new data isn't coming on the pipe (I use a ~1ms delay from the last received character at the default baud rate, and will try to reduce that more), then rely on the calling function to parse and act accordingly. Maybe not though, that is just my thoughts coming in. It would probably make handling error responses cleaner, as you wouldn't need to guess the length coming in.

I am using a board with a fair bit of memory on hand, so I have just been reallocating the char*s to 128 characters, I also run both the RX and backlog buffers as 2056 characters. This might not be reasonable on other devices.

You may also consider setting up read and write interrupts that buffer all this, and then parsing those with a regularly called device service routine, it seems like that is how this device wants to work, but I could be wrong.

MatthewMenze commented 4 years ago

@ROSW6341 thank you for the pointer! I contacted support at uBlox and they provided me the easyflash tool and updated firmware file. What did you need to do to connect via USB to the shield? So far I have just soldered the jumper required to use the USB interface, did you need to do anything else?

I notice my computer only recognizes it is connected if I hold the power button down on the shield.

MatthewMenze commented 4 years ago

@nseidle I've created a pull request of my working-for-me batch of fixes. It will need some more work to integrate into something usable for everyone using this product. Please feel free to reach out to me with any questions.

MatthewMenze commented 4 years ago

I've updated with a second commit code to listen and receive UDP packets. It's a bit rough, particularly in the parsing of the read data (I plan to clean that up).

The library as it was did not have any code for parsing the URCs related to reading UDP packets as specified in the ublox documentation. Although you still use USOLI to set up a listening socket, the URCs that come in from a received UDP packet are different. Additionally, while you can read UDP data with USORD, the uBox spec says it is preferred to us USORF for UDP data, so I implemented that as well.

MatthewMenze commented 3 years ago

@nseidle just wanted to see if there has been any updates on this issue on your end? I seem to have it working decent now, but I have really tore up the codebase, even beyond the pull request I made a while back, and I'm not sure if it impacts the usability for things outside my specific use case.

nseidle commented 3 years ago

Hi Matthew - Sorry for the radio silence. We had to put this product to the side for a bit. We have tentative plans for a SARA 5 based product. I plan to either fork or otherwise modify this repo to be more generic (just a ublox SARA library rather than a SparkFun LTE shield product library). If we can convince you to bear with us, we'd love to work with you.

MatthewMenze commented 3 years ago

No worries. Feel free to reach out whenever. I've changed around a few more things since this pull request was made, but haven't had a chance to really polish anything. I am using this circuit in a proof of concept prototype, but the project is nearing the point where we will develop custom hardware and firmware so there hasn't been much oppurtunity to refine stuff beyond the proof of concept point.

I'd love to help out refining this product though, the SARA line is great, and I think there is a lot of room to make this product preform better. Especially with some of the integration options that seem to be on the horizon between the SARA units and uBlox ZED chips.

Is there a way to reach out to the products team at SparkFun? There is actually a couple projects at my company floating around we wanted to see about working with Sparkfun on. Not sure the best point of contact for that.

nseidle commented 3 years ago

Neat! Start with me nathan @ and I'll get you connected with the right people.

MatthewMenze commented 3 years ago

Awesome! I sent an email to the address listed on your github. Thank you!

MatthewMenze commented 3 years ago

@nseidle has there been any updates getting the buffered polling working in this library? I'm still using my hackish fork of it, somewhat updated from this issue, but I don't trust it nearly as much as I would like.

I notice some buffered polling is implemented in your SARA R5 library, I am considering doing some hardware prototyping with the R5 boards you offer, but I'm just not sure what the state of it is. I'm between sticking with my hackish version or migrating.

Thanks!

nseidle commented 3 years ago

We are dedicating a lot of resources towards the SARA R5 library these days. It is our focus from now on but we will support major issues in the R4 library as well. We haven't had the chance to address this issue yet. If you'd like hardware to test out the R5 I would be happy to provide it to you.

MatthewMenze commented 3 years ago

Thank you for the update! I would actually be interesting in testing out the R5, it is our target chipset for production. I went with the R4 shield as it fit into a proof of concept prototype I was working on well physically.

I was actually looking at testing with one of these from your product catalog: https://www.sparkfun.com/products/17272 for a project a bit further down my pipeline.