u-blox / ubxlib

Portable C libraries which provide APIs to build applications with u-blox products and services. Delivered as add-on to existing microcontroller and RTOS SDKs.
Apache License 2.0

uSockGetHostByName() fails if we use stubs for wifi. #36

Closed eeFLis closed 1 year ago

eeFLis commented 2 years ago

Hello all, we are using cell sockets to resolve the address of a host. However, with the current master branch this fails. The reason seems to be the call to uWifiSockInit() at line 474 in u_sock.c. This function is also called when only cell sockets are used, and in that case the call fails with error code -5.

If I comment out the line, everything works as before. Is there any other way I can work around this problem?

RobMeades commented 2 years ago

Hi there! -5 is -U_SOCK_EIO; a quick look at the code suggests that uWifiSockInit() is being called, which does a lock on uShortRangeLock() but uShortRangeInit() has not been called at this point and so the lock fails. I guess that the u_sock code should also call uShortRangeInit() if it is going to call uWifiSockInit(): @antevir?
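As a rough illustration of the failure described above, the following sketch shows a lock that can only succeed once its module has been initialised, and a socket-layer init that passes the lock failure back to the caller. All `demo...` names and the error macros are hypothetical stand-ins, not the ubxlib implementation; only the value -5 is taken from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical error codes for this sketch only.
#define DEMO_SUCCESS   0
#define DEMO_ERROR_IO -5  // same numeric value as reported in this issue

static bool gDemoInitialised = false; // set by demoShortRangeInit()
static int32_t gDemoLockCount = 0;    // number of locks currently held

// Stand-in for a module init function that would create the mutex.
int32_t demoShortRangeInit(void)
{
    gDemoInitialised = true;
    return DEMO_SUCCESS;
}

// Stand-in for the lock: there is nothing to lock before init.
int32_t demoShortRangeLock(void)
{
    if (!gDemoInitialised) {
        return DEMO_ERROR_IO;
    }
    gDemoLockCount++;
    return DEMO_SUCCESS;
}

void demoShortRangeUnlock(void)
{
    assert(gDemoLockCount > 0);
    gDemoLockCount--;
}

// Stand-in for a socket-layer init that depends on the lock.
int32_t demoWifiSockInit(void)
{
    int32_t errorCode = demoShortRangeLock();
    if (errorCode == DEMO_SUCCESS) {
        // ... socket set-up for this module would go here ...
        demoShortRangeUnlock();
    }
    return errorCode;
}
```

Calling `demoWifiSockInit()` before `demoShortRangeInit()` returns the I/O error; calling it afterwards succeeds.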

antevir commented 2 years ago

Yes, it sounds like uShortRangeInit() has not been called. However, this should happen when you call uNetworkInit(). @eeFLis could you maybe share the init code you are using?

eeFLis commented 2 years ago

Hi, the error also occurs in the socket example provided in ubxlib/example/sockets/. uShortRangeInit() is not called by uNetworkInit() because we are using wifi stubs.

antevir commented 2 years ago

Ahh.. just saw that in the title now. Unfortunately there is no proper way of excluding wifi at the moment. We are planning to address this in the next quarter: UPCOMING_CHANGES.md. So I am afraid that if you want to remove wifi by mocking you will also need to mock u_wifi_sock.c.

RobMeades commented 2 years ago

Apologies for this @eeFLis: we had thought the stub mechanism would work originally but it is really not scalable, hence we are planning a different solution as Andreas describes. Are you able to work around the issue as Andreas suggests until we have a better solution in place?

eeFLis commented 2 years ago

@RobMeades Yes we can work around it until there is a better solution. Thank you.

eeFLis commented 2 years ago

Hi Rob, is there a way in release 1.0.0 to exclude unused APIs such as wifi, ble and gnss? I thought it was planned for this version but I can't see any way of doing it.

RobMeades commented 2 years ago

Hi again, and apologies: this was something we wanted to get in for this release but re-jigging all the underlying things to introduce the device/network API, which paves the way for doing what you want, took longer than we thought. So the hooks are there, we need to get down and do the implementation now.

In fact, just a few hours ago we had a meeting about next priorities and this was highlighted. We are shooting for October; basically it comes after LARA-R6 support and I2C support, both of which are happening now.

Apologies again: what I might do, as soon as we have something, is push a preview branch of it here so that you can see if it works for you.

eeFLis commented 2 years ago

Hi Rob

That would be great. This will help us to reduce the memory footprint. Thanks. If there is a preview we will test it.

RobMeades commented 2 years ago

@eeFLis: just an update that this is being worked on but didn't make it into 1.1.0. We will let you know as soon as there is something to try.

eeFLis commented 1 year ago

Hi Rob

Do you know when we will have something to try? We are slowly running out of memory.

RobMeades commented 1 year ago

Hi there: unfortunately the guy who was working on this hasn't. That said, as a side-effect of doing the CMUX work, we've ended up creating the bits of code needed to make the jump-tables that are required for this kind of "link-time" separation to work, so I can probably start looking at this myself from the start of next week.

That would suggest probably not this year but, maybe, just maybe...

Apologies again for the extreme delay on this, don't like having issues open for an entire year, though it is not the record breaker :-(.

eeFLis commented 1 year ago

Hi Rob

Since the lib is growing constantly (which is great), in most cases not all components (gnss, wifi, ble, cell) are needed. Therefore it would be useful to have this feature.

Many thanks already for your work.

RobMeades commented 1 year ago

Understood: the growth worries me a little actually, it might be that some of the things we are adding even within, for example, cellular, are not of general interest and the code size just becomes an overhead. Anyway, will try to at least allow you to remove the things that you are definitely not interested in.

RobMeades commented 1 year ago

Had a bit of a revelation yesterday and realised that fixing this problem is a lot easier than I thought. I have pushed a preview branch of the solution here:

https://github.com/u-blox/ubxlib/tree/preview_separation_rmea

On this preview branch you should be able to change the UBXLIB_FEATURES make/CMake variable that you pass to the common ubxlib.mk and ubxlib.cmake files to, for instance, "cell" instead of "cell short_range gnss" and it should automatically stub-out the not-needed calls.

FYI, the preview branch is arranged such that it includes the preview-fix we did for your issue #75. Please let me know if it does what you want.

Also FYI, this is a preview, we've not actually reviewed this change internally yet, though I anticipate no problems. Once we merge the change to master and push it here I will delete the preview branch.

RobMeades commented 1 year ago

Actually, there's still a bug in that branch, the one you raised here originally, let me fix that and I'll update this issue when I've done it.

RobMeades commented 1 year ago

Hopefully fixed now, branch updated.

eeFLis commented 1 year ago

We use the STM32 Cube IDE, which does not have CMake integrated, but I think it should be possible to stub-out the not-needed calls in the same way.

Is this only a temporary solution? I thought you mentioned that the stub mechanism is not really scalable and that you are planning another solution?

RobMeades commented 1 year ago

That's what I had originally thought but it is only not scalable because of having to swap in and out the stub files for the N cases; the revelation I had last night was that each common module which calls down into a ble/cell/gnss/short_range/wifi thing (which is where the cross-linkage occurs) simply has to provide its own stub versions of those calls that are weakly-linked, and that return "not supported" or whatever, then we can leave the stub files always in place and remove the real implementations as we wish, leaving the stub to take over. A nice simple rule: you call it, you stub it, very little to go wrong and easy on the brain.

I assume you're using the full Eclipse system? If you were just using a Makefile project we support that through the ubxlib.mk file but, anyway, all you should need to do is to add all of the stub files that have been introduced in the branch into your Eclipse-based build and then you can leave out all of the ble/gnss/short_range/wifi or whatever it is that you don't want and it should all link and work. I haven't yet been able to run this myself and won't be able to do so for a while so just let me know if I've missed anything.

The other approach, what I was preparing for, was to create interface types: for instance there would be one for things that MQTT needed for services from cell/wifi, but then we'd need to define/create structures of jump-tables and populate them at some point, etc., all of which [I realised] is unnecessary overhead when weak and the GCC linker marches in to the rescue :-).
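The "you call it, you stub it" rule can be sketched with the GCC/Clang `weak` attribute as below. This is a minimal illustration under stated assumptions: the `demo...` function names and the error values are hypothetical, not the names or values used in the preview branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical error codes for this sketch; not the ubxlib values.
#define DEMO_SUCCESS              0
#define DEMO_ERROR_NOT_SUPPORTED -4

// Weak default, provided by the module that makes the call. If a
// strong definition of demoWifiSockCreate() is linked in from the
// real wifi code, the linker uses that one instead of this stub.
__attribute__((weak)) int32_t demoWifiSockCreate(int32_t *pHandle)
{
    (void) pHandle;
    return DEMO_ERROR_NOT_SUPPORTED;
}

// The common code calls down as usual; it does not need to know
// whether the real implementation or the stub ends up answering.
int32_t demoSockCreate(int32_t *pHandle)
{
    assert(pHandle != NULL);
    return demoWifiSockCreate(pHandle);
}
```

With only this file in the build, `demoSockCreate()` returns the "not supported" code; adding a source file with a non-weak `demoWifiSockCreate()` changes the behaviour without touching the caller.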

RobMeades commented 1 year ago

Actually, let me push to the preview branch again: I've just changed some of the file names during review and so if you're manually adding them it is better to get that all right. Will comment back here when pushed...

RobMeades commented 1 year ago

Right, please use this branch: https://github.com/u-blox/ubxlib/tree/preview_separation_use_this_one_rmea

I will delete the other one shortly.

eeFLis commented 1 year ago

In u_device.c you check for U_ERROR_COMMON_NOT_IMPLEMENTED but the stub functions return U_ERROR_COMMON_NOT_SUPPORTED. After changing this it works for us (we use cell only).
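Why the mismatch matters can be shown with a hypothetical sketch: a caller that skips features left out of the build must test for the same code that the stubs return. The structure below is an assumption made for illustration and does not reproduce u_device.c; the `demo...` names and numeric values are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical error codes for this sketch; not the ubxlib values.
#define DEMO_SUCCESS                0
#define DEMO_ERROR_NOT_IMPLEMENTED -3
#define DEMO_ERROR_NOT_SUPPORTED   -4

typedef int32_t (*demoInitFn_t)(void);

// A feature that is present in the build.
int32_t demoCellInit(void)
{
    return DEMO_SUCCESS;
}

// A feature that has been replaced by its stub.
int32_t demoGnssInitStub(void)
{
    return DEMO_ERROR_NOT_SUPPORTED;
}

// Initialise every feature in turn; "absentCode" is the error code
// that is taken to mean "this feature was left out of the build".
int32_t demoDeviceInit(const demoInitFn_t *pInit, size_t count,
                       int32_t absentCode)
{
    assert(pInit != NULL);
    for (size_t x = 0; x < count; x++) {
        int32_t errorCode = pInit[x]();
        if ((errorCode != DEMO_SUCCESS) && (errorCode != absentCode)) {
            return errorCode; // a genuine failure, or a mismatched check
        }
    }
    return DEMO_SUCCESS;
}
```

If `absentCode` is the "not implemented" value the stubbed feature makes the whole init fail; with the "not supported" value it is skipped as intended.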

RobMeades commented 1 year ago

Ah, great, thanks for that, we will fix it on the version we merge to master. I will leave this issue open until the final version ends up here.

eeFLis commented 1 year ago

I think it's not related to this change, but if uSockCreate() is the first function called after PSM, it returns the error code U_SOCK_ENOBUFS. This is because uAtClientUnlock() in uCellSockCreate() returns U_ERROR_COMMON_DEVICE_ERROR.

If we call uCellPwrIsAlive() before uSockCreate() everything works fine. But as I understand it, this should not be necessary, right?

RobMeades commented 1 year ago

Interesting: are you able to see what AT sequence causes the AT parser to get upset?

RobMeades commented 1 year ago

I mean, I guess it is that AT+USOCR is failing in some way; uCellPwrIsAlive() is going to bounce an AT off the module, just to make sure it is there, but you're right, that should make no difference at all. Just out of interest, are you using UART sleep as well as PSM?

EDIT: you're on R5 so you must be, it wouldn't go into "real" PSM otherwise.

RobMeades commented 1 year ago

It might be interesting to see if you called something like uCellInfoGetManufacturerStr() at that same point, does it fail also, i.e. is this specifically sockets related or is it just that any AT command that does not retry, if called just after return from PSM, fails in this way?

eeFLis commented 1 year ago

uCellInfoGetManufacturerStr() works at the same point but uCellInfoGetIccidStr() doesn't. It seems that the problem only affects functions that wait for a specific response (uAtClientResponseStart). In the debug print it looks as though the command is sent before the module is awake.

`AT+CCID
U_CELL_INFO: unable to read ICCID.
AT
AT
OK
ATE0
ATE0
OK
AT+CMEE=2
OK
AT+UDCONF=1,0
OK
ATI9
03.15,A00.01
OK
AT&C1
+UUPSMR: 0
OK
AT&D0
OK
AT&K3
OK
AT+UPSV=3
OK
AT+UPSMR=1
OK
AT+CPSMS?
+CPSMS: 1,,,"01000011","00001000"
OK
AT+UMNOPROF?
+UMNOPROF: 90
OK
AT+UPSD=0,0,0
OK
AT+UPSD=0,100,1
OK
AT+UPSDA=0,3
OK
+UUPSDA: 0,"IP"
AT+USOCR=17,PORT
+USOCR: 0
OK
U_SOCK: socket created, descriptor 2, network handle 0x2000cf88, socket handle 2.
U_SOCK: connecting socket to "IP:PORT"...
AT+USOCO=0,"IP",PORT
OK
U_SOCK: socket with descriptor 2, network handle 0x2000cf88, socket handle 2, is connected to address "IP:PORT".`

RobMeades commented 1 year ago

Very interesting, thanks for that, there is definitely something going wrong here. Let me just get my head straight on some things:

What should happen is that, before sending the AT command, the AT client will call uCellPrivateWakeupCallback() which will call uCellPrivateIsDeepSleepActive() and, if power saving has been agreed with the network and VINT has gone low, deepSleepWakeUp() will be called, which will reconfigure the module: you can see that happening with the ATE0 etc. in your AT log.

But, somehow or other, deepSleepWakeUp() is not returning that the module is in deep sleep. Hmph.
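The intended sequence described above (check for deep sleep before sending an AT command, and wake the module only if needed) can be sketched as follows. This is an illustration of the decision only; the structure, its fields and the `demo...` functions are hypothetical and are not the ubxlib internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical snapshot of the state that the decision depends on.
typedef struct {
    bool sleepAgreedWithNetwork; // power saving was granted by the network
    bool vintIsHigh;             // VINT level: high means the module is up
    int wakeUpCount;             // how many times the wake-up has run
} demoCellState_t;

// Deep sleep is only considered active if power saving was agreed
// with the network AND VINT has gone low.
bool demoIsDeepSleepActive(const demoCellState_t *pState)
{
    assert(pState != NULL);
    return pState->sleepAgreedWithNetwork && !pState->vintIsHigh;
}

// Stand-in for the wake-up and reconfiguration of the module (the
// ATE0, AT+CMEE=2, ... sequence that appears in the AT log).
static void demoDeepSleepWakeUp(demoCellState_t *pState)
{
    pState->vintIsHigh = true;
    pState->wakeUpCount++;
}

// Would be called by the AT client before an AT command is sent.
void demoWakeUpCallback(demoCellState_t *pState)
{
    assert(pState != NULL);
    if (demoIsDeepSleepActive(pState)) {
        demoDeepSleepWakeUp(pState);
    }
}
```

In this sketch a second call to `demoWakeUpCallback()` does nothing, because VINT is high again after the first wake-up.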

eeFLis commented 1 year ago

Yes, we have VINT connected to the MCU. We use uCellPwrGetDeepSleepActive to check whether the module has entered deep sleep. We can also see from the power consumption that the module is in deep sleep. During deep sleep the MCU wakes up to transmit some data, and for this a UDP socket is opened. This is what you see in the AT sequence.

Strangely enough, the command order for uCellInfoGetManufacturerStr() is correct.

AT
AT
OK
ATE0
ATE0
OK
AT+CMEE=2
OK
AT+UDCONF=1,0
OK
ATI9
03.15,A00.01
OK
AT&C1
OK
AT&D0
+UUPSMR: 0
OK
AT&K3
OK
AT+UPSV=3
OK
AT+UPSMR=1
OK
AT+CPSMS?
+CPSMS: 1,,,"01000011","00001000"
OK
AT+UMNOPROF?
+UMNOPROF: 90
OK
AT+UPSD=0,0,0
OK
AT+UPSD=0,100,1
OK
AT+UPSDA=0,3
OK
+UUPSDA: 0,"IP"
AT+CGMI
u-blox
OK
U_CELL_INFO: ID string, length 6 character(s), returned by AT+CGMI is "u-blox".
AT+USOCR=17,5684
+USOCR: 0
OK
U_SOCK: socket created, descriptor 2, network handle 0x2000cf88, socket handle 2.
U_SOCK: connecting socket to "IP:PORT"...
AT+USOCO=0,"IP",PORT
OK
U_SOCK: socket with descriptor 2, network handle 0x2000cf88, socket handle 2, is connected to address "IP:PORT".

RobMeades commented 1 year ago

How weird! Would you be able to put some debug prints into uCellPrivateWakeupCallback() and uCellPrivateIsDeepSleepActive() to determine the route the code is following?

RobMeades commented 1 year ago

...maybe in deepSleepWakeUp() also.

eeFLis commented 1 year ago

Yes, I can do that. I will get back to you when I have first results.

RobMeades commented 1 year ago

One possibility, while it is in my mind, knowing that you are using STM32F4 and that power saving is very important to you: do you happen to be running FreeRTOS in tickless mode? The reason I ask is because, in our default port to STM32F4, we do not switch on tickless mode so that we can implement uPortTaskGetTickTimeMs() by incrementing a counter in the SysTick interrupt; if you are using FreeRTOS in tickless mode you would need to implement uPortTaskGetTickTimeMs() in some other way, or get FreeRTOS to correct gTickTimerRtosCount on return from MCU sleep.

The AT client will only call uCellPrivateWakeupCallback() if it believes that more than 6 seconds have passed since it was last active [this being the minimum time for any sort of sleep, UART sleep included, to take effect]; if, for some reason, uPortTaskGetTickTimeMs() were returning the wrong answer (e.g. because SysTick had stopped while the MCU was also sleeping) then it would not know that time has passed and so wouldn't know to do the waking-up bit.
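The interaction between the tick counter and the 6 second inactivity check can be sketched as below. This is a simplified model under stated assumptions: `gDemoTickCountMs` and the `demo...` functions are hypothetical stand-ins for the port's tick counter and the AT client's check, and the counter is unsigned here so that wrap-around arithmetic is well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// From the explanation above: the minimum time for sleep to take effect.
#define DEMO_WAKE_UP_INACTIVITY_MS 6000

// Hypothetical tick counter, incremented once per millisecond.
static volatile uint32_t gDemoTickCountMs = 0;

// Would be called from the 1 ms SysTick interrupt in a real port.
void demoSysTickHandler(void)
{
    gDemoTickCountMs++;
}

// With tickless idle the SysTick interrupt stops while the MCU
// sleeps, so the time slept has to be added back on wake-up.
void demoCorrectTicksAfterSleep(int32_t sleptMs)
{
    assert(sleptMs >= 0);
    gDemoTickCountMs += (uint32_t) sleptMs;
}

uint32_t demoGetTickTimeMs(void)
{
    return gDemoTickCountMs;
}

// True if enough time has passed since the last AT activity for the
// module to possibly be asleep. The unsigned subtraction remains
// correct when the counter wraps.
bool demoWakeUpCheckNeeded(uint32_t lastActivityMs)
{
    return (demoGetTickTimeMs() - lastActivityMs) > DEMO_WAKE_UP_INACTIVITY_MS;
}
```

If the MCU sleeps for 10 seconds and the correction is skipped, the counter has barely moved and `demoWakeUpCheckNeeded()` returns false, which matches the symptom of the command being sent before the module was awake.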

eeFLis commented 1 year ago

Wow, that was exactly the problem. Yes, we are using the tickless idle mode. We now correct gTickTimerRtosCount and the problem doesn't seem to occur anymore. Many thanks for your help.

RobMeades commented 1 year ago

Phew, glad that did it for you.

eeFLis commented 1 year ago

Hi, we have encountered another problem related to power saving. If we configure the DTR pin to control power saving, everything works fine for some time. But at irregular intervals the module keeps CTS high and we get stuck in uPortUartWrite(). The module can only communicate again after a reset.

Have you ever observed this behavior?

We use SARA-R510S-01B-00, modem version 03.15, application version A00.01.

image

RobMeades commented 1 year ago

Ah, yes, this looks like an issue that I have seen with SARA-R5: basically what happens is that if you toggle the DTR line at the wrong time, just as the module is going into sleep, it may miss the edge and not wake-up. While the module is asleep the CTS line floats high and so you cannot send anything to it, hence you will end up stuck in uPortUartWrite(); in the STM32F4 UART driver, for other reasons (it just seems to get stuck on very rare occasions), we added a 30 second timeout on UART writes (668e93b023dc6975412544b5ac01832038e52737), so it should eventually return to you and then the next command you send to the module should wake it up again, 'cos you'll be toggling DTR again and will have another chance. Are you sure that the module remains unresponsive, i.e. the only way out is reset, in your case?

There are a few ways forward:

  1. Don't use DTR (i.e. set this pin to -1 in the configuration you pass to ubxlib), instead just let the initial UART activity wake the module up, which ubxlib will do for you automatically.

  2. Carry on as you are for now, maybe reducing the guard timer in the STM32F4 UART uPortUartWrite() function (how much you can reduce it by will depend on how much data you ever send to the module in one go) and accept that commands will fail every so often.
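The guard timer mentioned in option 2 can be pictured with the sketch below: a write loop that gives up once flow control has blocked it for longer than a timeout. This is not the STM32F4 driver code; the hook structure, the fake module used to exercise it and all `demo...` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical hooks, so the sketch needs no real UART driver.
typedef struct {
    bool (*pCtsAllowsSend)(void *pCtx);   // true if the module accepts data
    void (*pSendByte)(void *pCtx, uint8_t byte);
    uint32_t (*pTimeMs)(void *pCtx);      // current time in milliseconds
    void *pCtx;
} demoUart_t;

// Returns the number of bytes written; fewer than sizeBytes means
// that the guard timer expired while flow control was blocking.
size_t demoUartWrite(const demoUart_t *pUart, const uint8_t *pData,
                     size_t sizeBytes, uint32_t guardMs)
{
    assert((pUart != NULL) && (pData != NULL));
    uint32_t startMs = pUart->pTimeMs(pUart->pCtx);
    size_t sent = 0;
    while (sent < sizeBytes) {
        if (pUart->pCtsAllowsSend(pUart->pCtx)) {
            pUart->pSendByte(pUart->pCtx, pData[sent]);
            sent++;
        } else if ((pUart->pTimeMs(pUart->pCtx) - startMs) > guardMs) {
            break; // give up rather than block forever
        }
    }
    return sent;
}

// Fake module for trying the sketch: it accepts "acceptBytes" bytes,
// then blocks, and its clock advances 1 ms every time it is read.
typedef struct {
    size_t acceptBytes;
    size_t received;
    uint32_t nowMs;
} demoFakeModule_t;

static bool demoFakeCts(void *pCtx)
{
    const demoFakeModule_t *pModule = (const demoFakeModule_t *) pCtx;
    return pModule->received < pModule->acceptBytes;
}

static void demoFakeSend(void *pCtx, uint8_t byte)
{
    (void) byte;
    ((demoFakeModule_t *) pCtx)->received++;
}

static uint32_t demoFakeTimeMs(void *pCtx)
{
    return ((demoFakeModule_t *) pCtx)->nowMs++;
}

demoUart_t demoFakeUart(demoFakeModule_t *pModule)
{
    demoUart_t uart = {demoFakeCts, demoFakeSend, demoFakeTimeMs, pModule};
    return uart;
}
```

A shorter guard time returns control to the caller sooner when the module has missed the wake-up, at the cost of cutting off legitimately slow writes, which is why the usable minimum depends on how much data is sent in one go.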

It is possible (not yet confirmed) that there will be a maintenance release for SARA-R5 early next year which will include a fix for this problem; you will understand that going through all of the necessary approvals required for a cellular module means that such releases are rare and take quite some effort/time, hence I can't promise timescales, but there is an intention to make such a release. With that you could return to using DTR.

eeFLis commented 1 year ago

Yes, I think that is the issue we are seeing. It seems that there are further issues related to DTR. This is what is written in the section "Known bugs and limitations": [u-blox ID 6980] When AT&D0 is set, a DTR transition during packet switched data mode leads to a context deactivation instead of a no action as expected.

So we will not use DTR at the moment and hope that the bugs will be fixed soon.

eeFLis commented 1 year ago

Hi, we have disabled DTR (set pin to -1). Now we have the problem that the module temporarily does not switch to PSM (VINT remains high) although the URC +UUPSMR: 1 was received.

Have you ever observed this behavior?

image

RobMeades commented 1 year ago

You'll need @philwareublox for this, rather than me, but I do know [we might have talked of this already] that the +UUPSMR URC and entry to deep sleep are not necessarily related. +UUPSMR means that the protocol stack has gone into a "suspended" state but the module may not enter deep sleep.

The next thing you're going to ask is "why not and when will the module enter deep sleep?". In your trace above it seems like 10ish seconds have passed and the module has not entered deep sleep by that point; @philwareublox: what kind of things might keep the module awake for that long?

philwareublox commented 1 year ago

Just to be sure, +UPSV needs to be set to something other than 0; otherwise the module will go into 3GPP PSM mode but will not be able to go into a [hardware] Deep Sleep.

If the module can't go into deep sleep you should also see a parameter stating what is causing the module to stay awake at that point, like the UART is still enabled (+UPSV=0) .... But you are seeing +UUPSMR: 1 with no extra parameter.

Do you have a link to the Saleae trace we could look over?

philwareublox commented 1 year ago

Just a few more points:

If +UPSV is set to 0, you should not get +UUPSMR: 1.

If +UPSV is set to 2 or 3, you will only get +UUPSMR: 1, as the “not going to sleep” parameter doesn’t show the RTS/DTR reason for not going to sleep.

The only way I think you are seeing +UUPSMR: 1 but VINT is not dropping is because the module is using +UPSV 2 or 3 mode still and these lines are still held for keeping the module awake.

What +UPSV mode are you using?

eeFLis commented 1 year ago

We use +UPSV mode 1. We see that AT+UPSV=1,1300 command is sent during the wake-up procedure.

Is it possible that I send you the Saleae trace in a PM?

philwareublox commented 1 year ago

Please send it to phil.ware, using same format as Rob's email. Thanks.

eeFLis commented 1 year ago

OK, thanks. You should have just received an email.

RobMeades commented 1 year ago

The changes required to allow one or more of GNSS/Wifi/Ble/Cell to be left out of a build, the original question of this issue, are now pushed to master here, see commit bf8cd21ff293ba2e3cf2f750f682aff1cf335fc0. I will close this issue now and will delete the preview branch in a few weeks.