the-modem-distro / pinephone_modem_sdk

Pinephone Modem SDK: Tools to build your own bootloader, kernel and rootfs
GNU General Public License v3.0
595 stars 64 forks source link

Sometimes modem disappears partially [mobian] #9

Closed kkeijzer closed 3 years ago

kkeijzer commented 3 years ago

Sometimes the modem disappears partially. gnome-control-center will say "No wireless / QMI device found", ip a will still show a wwan0 device, but with no IP addresses or routing tables, and gnome-calls will state that there is no voice-capable modem detected.

This can happen after resuming from deep sleep, but sometimes also while the phone is awake. (I sometimes wake up finding the modem "broke" overnight, while the phone was charging and not going to deep sleep.)

The modem is still detected with lsusb in this state; no different from normal operation:

Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 008: ID 2c7c:0125 Quectel Wireless Solutions Co., Ltd. EC25 LTE modem
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

The only way to get the modem to work again, is by running adb shell reboot from the PinePhone. (adb reboot does not work.)

Attached are openqti.log and dmesg of both the modem and the PinePhone.

I can't see anything out of the ordinary. Do you have any suggestions of where I should look the next time this happens?

DylanVanAssche commented 3 years ago

@PsychoGame enabling ADB may reduce the ability to suspend the USB core (which increases the power consumption). This may be the reason why it seems more stable.

Biktorgj commented 3 years ago

Next time this happens (there's a lot of conditions the modem cannot handle, like something else removing it from the host like eg25-manager did), try to restart only ModemManager (systemctl restart ModemManager). At least that way we can know if OpenQTI can handle it (it should but as it is obvious, what works for me might not be what works for you :))

DylanVanAssche commented 3 years ago

Maybe the eg25-manager shouldn't do this reset anymore? If it is stable enough, it cannot be lost anymore...

kkeijzer commented 3 years ago

It just broke again, while plugged in and doing nothing. I tried restarting ModemManager, but that did not help.

dmesg.txt openqti.log modemmanager.log networkmanager.log

DylanVanAssche commented 3 years ago

@kkeijzer Do you have the eg25-manager log as well?

kkeijzer commented 3 years ago

Sure.

eg25-manager.log

DylanVanAssche commented 3 years ago

Hmmmm it is again the eg25-manager who kicks out the modem, maybe that reset logic of the eg25-manager must be revisited with the masking @Biktorgj did regarding disconnect events? Would be interesting if we can run the eg25-manager but with the reset logic disabled.

Biktorgj commented 3 years ago

Fastest test would be to boot the phone, let it ask for the PIN, then disable eg25manager service and kill it with -9 so it doesnt shut down the phone. Let the modem live only attached to modemmanager and see how it behaves

kkeijzer commented 3 years ago

The daemon just respawns when I try that.

QC8086 commented 3 years ago

Hello. I got curious and decided to try to help test this. I'm running biktor's 0.28 firmware with the updated ADSP on Arch Linux ARM Phosh with ModemManager 1.16.8.

I tried what biktor suggested, simply stopping the eg25-manager service and killing it once the system boots. However, as soon as eg25manager is dead the modem seems to stop communicating in a useful way with the system. It doesn't vanish, but stops being useful for data, texting, calls, etc. I don't have logs for this event, but I can reproduce it if they may be useful.

After that I tried getting the eg25manager source and commenting out the two modem_reset calls (I assume this is what I should be commenting out, I have no knowledge of the eg25manager code). One in udev.c and the other in suspend.c. Unfortunately not long after I rebooted with the newly compiled eg25manager, the modem simply vanished again. Logs for my test are attached.

2021-08-02-21:21-dmesg.log 2021-08-02-21:21-eg25-manager.log 2021-08-02-21:21-modemmanager.log 2021-08-02-21:21-networkmanager.log 2021-08-02-21:21-openqti.log

Let me know if there's anything else I can do.

Biktorgj commented 3 years ago

When you stop eg25manager it shuts down the modem, the next best thing that I can think of is, let it start, and once booted move the eg25manager binary from /usr/bin (talking from memory, maybe it's in another path), and then kill -9 the still running process. When it attempts to restart the service it won't be there but won't kill the modem

QC8086 commented 3 years ago

Got it. Instead I just removed the Restart=on-failure line from the eg25-manager serivce file and killed it. Behavior doesn't seem much different than normal. Lasted an hour or two and modem went away again. I disabled suspend so that's not a factor.

dmesg.log eg25-manager.log modemmanager.log networkmanager.log

Biktorgj commented 3 years ago

@ThePosichronic Question: have you flashed the newer ADSP firmware? Or is that with the default one from my recovery repo?

QC8086 commented 3 years ago

I flashed the one listed under the 0.28 release ("Latest ADSP firmware").

Although interestingly, I have an update. My modem died twice after my last comment, but the last time I didn't reboot my phone, I just ran adb reboot on the modem. And since then (35 hours ago), my modem has been perfectly stable which is definitely a record for me, stock firmware or otherwise.

When I rebooted my modem, it showed up in ModemManager as modem 2 and I didn't need to run eg25manager for it to start working. I haven't rebooted to try anything since I want to see how long it can go. Also I haven't let my PinePhone suspend either so I don't know how well that works. I thought it would be better for testing this specific issue.

dmesg.log eg25-manager.log modemmanager.log networkmanager.log

As you can see from my logs, eg25manager has been dead since August 3 16:52, but ModemManager hasn't had an issue since I rebooted the modem at 21:22 the same day. I think you guys have something. :)

Biktorgj commented 3 years ago

Hi again, all the latest patches make for a wonderfully working phone in postmarketOS (one charge cycle in between): ->adb shell uptime 07:49:33 up 2 days, 7:49, load average: 1.07, 2.02, 2.67

Anyone tried the latest release I posted with Mobian? Any thoughts?

DylanVanAssche commented 3 years ago

Hi again, all the latest patches make for a wonderfully working phone in postmarketOS (one charge cycle in between):

That's great news!

Anyone tried the latest release I posted with Mobian? Any thoughts?

I'm currently testing stock for the next MM release on postmarketOS, will try the latest on postmarketOS as well in the near future :)

Thanks a lot everyone for working on this!

kkeijzer commented 3 years ago

I'm running 0.2.9 and so far it's behaved well. The problems are probably related to eg25-manager and ModemManager in Mobian anyway. The only real problem I'm still having is IPv4/IPv6 dual stack being broken without re-registration (I also tested this on postmarketOS), but that's probably not related to the modem firmware but ModemManager.

If somebody could find out why that happens and how to fix it, everything would be working perfectly for me as well.

Alexmitter commented 3 years ago

@Biktorgj as asked, here the dmesg of the phone https://gist.github.com/Alexmitter/0b107d9fd74484308fb7457ffc8dac17 Sadly adb does not find the modem in the broken state, so no dmesg from it. Non the less, I did forcefully restart my phone to get the modem back and will fetch the logs then, maybe there is something of interest.

Logs from modem after bringing it back manually: dmesg: https://gist.github.com/Alexmitter/6ad2e7c07a2ec7989956a9f136d3a249 openqti.log: https://gist.github.com/Alexmitter/84ef6aed55cd1a7957632a883d26afa7

Biktorgj commented 3 years ago

Sadly, the logs don't persist on reboot by default to avoid writing to the NAND, so not much of interest there... But! (nothing like stating that it's working to find it panicking again)

[-180429773] W track_client_count: Assigned instance ID 0x03 to service 0x01 
[-173070363] E rmnet_proxy: Closed descriptor at the ADSP side 
[-173070360] E gps_proxy: Closing descriptor at the USB side 
[-173070360] E gps_proxy: One of the descriptors was closed, restarting the thread 
[-173070302] E gps_proxy: Initialize GPS proxy thread.
[-173070243] E Error opening /dev/smd7 
[-173070241] E One of the descriptors isn't ready
[-173070069] E rmnet_proxy: Closed descriptor at the ADSP side 
[-173069241] E gps_proxy: One of the descriptors was closed, restarting the thread 
[-173069240] E gps_proxy: Initialize GPS proxy thread.
[-173062138] W track_client_count: Request for service 0x1a with any instance 
[-173062136] W track_client_count: Assigned instance ID 0x01 to service 0x1a 
[-173062075] W track_client_count: Request for service 0x02 with any instance 
[-173062069] W track_client_count: QMI Client Release from HOST,S:1a I:01, AC:11 
[-173062068] W track_client_count: New client request when timed out, resetting... 
[-173062067] W force_close_qmi: Wiping any registered client to any instance
[-173058342] W track_client_count: Assigned instance ID 0x02 to service 0x02 

So what happened with this? Still unsure, but let me elaborate a bit. When this happened and ModemManager saw the modem vanished I did four things:

  1. Restarted OpenQTI: No effect.
  2. Killed (-9) eg25manager: No effect
  3. Restarted ModemManager: No effect, when trying to reconnect it would just get stuck there, just like @kkeijzer's:
    [20208]: <info>  ModemManager (version 1.16.6) starting in system bus...
    [20208]: [/dev/cdc-wdm0] Opening device with flags 'version-info, proxy'...
    [20208]: [/dev/cdc-wdm0] created endpoint
    [20208]: transaction 0x1 aborted, but message is not abortable
    [20208]: <info>  [device /sys/devices/platform/soc/1c1b000.usb/usb2/2-1] creating modem with plugin 'quectel' and '6' ports
    [20208]: <warn>  [plugin/quectel] could not grab port cdc-wdm0: Cannot add port 'usbmisc/cdc-wdm0', unhandled port type
    [20208]: <warn>  [plugin/quectel] could not grab port ttyUSB3: Cannot add port 'tty/ttyUSB3', unhandled port type
    [20208]: <warn>  [plugin/quectel] could not grab port ttyUSB2: Cannot add port 'tty/ttyUSB2', unhandled port type
    [20208]: <warn>  [base-manager] couldn't create modem for device '/sys/devices/platform/soc/1c1b000.usb/usb2/2-1': Failed to find primary AT port
    ^C[20208]: <info>  caught signal, shutting down...
    [20208]: <info>  ModemManager is shut down

Before just giving up and rebooting, just issued an AT+RESETUSB to force the Pinephone and the modem to unbind and rebind And just like that, ModemManager renegotiated and recovered.

[20273]: <debug> [1628531081.129270] [modem1] messaging capabilities supported
[20273]: <debug> [1628531081.129652] [modem1] supported storages loaded:
[20273]: <debug> [1628531081.129843] [modem1]   mem1 (list/read/delete) storages: 'sm, me'
[20273]: <debug> [1628531081.129918] [modem1]   mem2 (write/send) storages:       'sm, me'
[20273]: <debug> [1628531081.129991] [modem1]   mem3 (reception) storages:        'sm, me'
[20273]: <debug> [1628531081.130329] [modem1] couldn't initialize current storages: No AT port available to run command
[20273]: <debug> [1628531081.131255] [modem1] couldn't initialize interface: 'Time not supported'
[20273]: <debug> [1628531081.131866] [modem1] extended signal capabilities supported
[20273]: <debug> [1628531081.132703] [modem1] OMA capabilities not supported
[20273]: <debug> [1628531081.133141] [modem1] couldn't initialize interface: 'OMA not supported'
[20273]: <debug> [1628531081.134351] [modem1] voice support check failed: No AT port available to run command
[20273]: <debug> [1628531081.134507] [modem1] couldn't initialize interface: 'Voice not supported'
[20273]: <info>  [1628531081.136285] [modem1] state changed (unknown -> disabled)
[20273]: <debug> [1628531081.136636] [modem1] modem initialized
[20273]: <debug> [1628531081.142034] [device /sys/devices/platform/soc/1c1b000.usb/usb2/2-1]  exported modem at path '/org/freedesktop/ModemManager1/Modem/1'
[20273]: <debug> [1628531081.145144] [device /sys/devices/platform/soc/1c1b000.usb/usb2/2-1]     plugin:  quectel
[20273]: <debug> [1628531081.145283] [device /sys/devices/platform/soc/1c1b000.usb/usb2/2-1]     vid:pid: 0x2C7C:0x0125
[20273]: <debug> [1628531081.309864] [modem1] user request to connect modem
[20273]: <info>  [1628531081.322753] [modem1] state changed (disabled -> enabling)
[20273]: <debug> [1628531081.323752] [modem1] skipping initialization: device hotplugged
[20273]: <debug> [1628531081.324015] [modem1] couldn't start parent enabling: Couldn't open ports during modem enabling: Couldn't get primary port
[20273]: <debug> [1628531081.324109] [modem1] need to check whether autoconnect is disabled or not...

So, there's sometimes after suspend and resume where the ports just "get stuck". It may not happen in 3 days (my top runtime goes to 3 days and 6 hours), or as soon as you get a bad suspend. Where does it come from I don't know, but what I do know is when that happens OpenQTI gets an EOF in one of the device, and I think that leaves one of the devices in the kernel "stalled". That's why the cdc device doesn't get a response, smdcntl8 has closed and won't respond to anything (I THINK).

How to handle this then? If I get some time tomorrow morning, I'm going to try to handle that closing with resetting the entire rmnet proxy thread. If the device is closed this should allow to recover by closing the active descriptor and opening it back. Since ModemManager will be not really receptive to it suddenly become available again, I will issue a USB reset from OpenQTI, that will unbind and rebind from the host making ModemManager think there's a new device, and allow it to reattach.

If some of you face this too, please try just one thing: If ADB is still available, run adb shell /etc/init.d/init_openqti restart, then after a few seconds, send the AT command "AT+RESETUSB" to /dev/ttyS2, and let me know if that works

By the way, please flash this up so we can get some more usable log timestamps :)

030_testing.tar.gz

Alexmitter commented 3 years ago

@Biktorgj The modem disappeared again and I was able to get logs via ADB. Phone dmesg: https://gist.github.com/Alexmitter/ac03904f6d87b217288ba4a64cb225c1 Modem dmesg: https://gist.github.com/Alexmitter/2436ca017695e657319b223224b365a5 Openqti log: https://gist.github.com/Alexmitter/8e480a87ebbf2f0a971b83da40ac4fc6

The modem seems to have somewhat came back after the adb communication because instead of no symbol, the "no sim" symbol appeared.

adb shell /etc/init.d/init_openqti restart and sending AT+RESETUSB did bring the modem back fully.

kkeijzer commented 3 years ago

I also had the "no SIM" symbol yesterday, but out of nowhere. Here are my logs.

dmesg.txt eg25-manager.log modemmanager.log networkmanager.log openqti.log

Alexmitter commented 3 years ago

Quick update because I felt this is maybe interesting. Yesterday I wrote a quick script doing first adb shell /etc/init.d/init_openqti restart and then sending AT+RESETUSB just so that I can do it conveniently from the phone.

What I have noticed is that after a proper phone reboot, it won't take long for the modem to disappear, but executing my little script brings it back and after that its very stable, without any issue, the whole day now. Rebooting on purpose causes exactly the same behavior again, so its repeatable.

Biktorgj commented 3 years ago

Okay, I have one unresolved issue that still need more investigation, but otherwise, I can't manage to kill it either in pmOS phosh / mobian, with or without suspend hooks disabled. The "No SIM" error was my fault because I was looking at a byte at the wrong offset... sorry about that

030_test6.tar.gz

If you see something like this: https://paste.ubuntu.com/p/j59KcTdZzc/ (USB cable is bad) then you'll need to reboot the phone no matter what. It doesn't matter if the modem is rebooted, shut down via kill switch or powered back up, something, sometimes, breaks in the Pinephone kernel and I haven't found what makes it break. It seems it only happens when there's a lot of suspend-resume cycles non stop, but just in case

Let me know if this works for you!

kkeijzer commented 3 years ago

I had indeed seen that bad cable error every now and then, and as you said, rebooting the A64 processor was the only way to get it working again.

I'm testing 030_test6 now. I'll let you know if anything breaks.

Alexmitter commented 3 years ago

I tried to flash the two images from 030_test6, sending "boot" was working fine, but it now hangs on writing "boot". Its stuck since.

Edit: was able to recover it. Flashed it again with 030_test6. Now flashing did work.

Biktorgj commented 3 years ago

So, do you think one day I'll be able to close this issue? Just published 0.3.0 with some more fixes to what you already have: https://github.com/Biktorgj/pinephone_modem_sdk/releases/tag/0.3.0

I can't manage to kill it anymore in any way (there was one edge case with the previous build where it would try to write to a closed file descriptor when ADSP wasn't ready triggering a buffer overflow)

Tested in pmOS:

Tested in Mobian:

Sometimes it might take it a bit more to recover, sometimes a bit less, but it always ends up coming back again. Please let me know how this works for you

kkeijzer commented 3 years ago

I had another problem yesterday. My network name is "NL KPN", and my provider name is "Youfone". Normally, Phosh shows "NL KPN" under the top left icon. But yesterday that changed to "NL KPN Youfone", and the modem wasn't working any more. Like something was being parsed incorrectly somewhere. Rebooting the modem made it work again, and the icon in Phosh showed "NL KPN" again, as normally is the case.

I'll attach the logs:

dmesg.txt eg25-manager.log modemmanager.log networkmanager.log openqti.log

And I'll test 0.3.0 from now on.

Biktorgj commented 3 years ago

I had another problem yesterday. My network name is "NL KPN", and my provider name is "Youfone". Normally, Phosh shows "NL KPN" under the top left icon. But yesterday that changed to "NL KPN Youfone", and the modem wasn't working any more. Like something was being parsed incorrectly somewhere. Rebooting the modem made it work again, and the icon in Phosh showed "NL KPN" again, as normally is the case.

I'll attach the logs:

dmesg.txt eg25-manager.log modemmanager.log networkmanager.log openqti.log

And I'll test 0.3.0 from now on.

I think you hit just the bug where openqti tried to write to a closed descriptor and overflowed resetting itself without cycling the USB port. That shouldn't happen again :)

kkeijzer commented 3 years ago

By the way, do you recommend me to test with --test-no-suspend-resume or without?

Biktorgj commented 3 years ago

It should recover from errors in both scenarios.

No suspend-resume allows for a faster wakeup and is default in pmOS, so from a usability point of view I would leave it enabled (I'm testing it without it right now as it passed the tests for me with it already)

DylanVanAssche commented 3 years ago

--quick-suspend-resume is the default on pmOS edge. Mobian has this as well with modemmanager-git, if on stable only --no-suspend-resume is possible.

Biktorgj commented 3 years ago

--quick-suspend-resume is the default on pmOS edge. Mobian has this as well with modemmanager-git, if on stable only --no-suspend-resume is possible.

I wonder if no suspend resume would work normally in mobian on stock or if it eventually crashes, anyone tried that? If it works correctly it would be a sensible default to use... After all, instantly waking from suspend is better than having to wait 15 seconds until ready

DylanVanAssche commented 3 years ago

Without quick resume, ModemManager may be out of sync with the modem itself which may crash the modem ;)

On Friday, 13 August 2021, Biktor wrote:

--quick-suspend-resume is the default on pmOS edge. Mobian has this as well with modemmanager-git, if on stable only --no-suspend-resume is possible.

I wonder if no suspend resume would work normally in mobian on stock or if it eventually crashes, anyone tried that? If it works correctly it would be a sensible default to use... After all, instantly waking from suspend is better than having to wait 15 seconds until ready

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/Biktorgj/pinephone_modem_sdk/issues/9#issuecomment-89872069

-- Sent from my Sailfish device

Biktorgj commented 3 years ago

To follow up with the tests, I have updated Mobian to unstable, and currently checking out how things go with ModemManager 1.18rc1 (1.17.900). So far modem responds normally, data works, and hasn't crashed in the last half hour so looking promising. (with your patches enabled @DylanVanAssche )

kkeijzer commented 3 years ago

So far it's still going well for me.

kevin@pinephone:~$  adb shell uptime
 21:39:11 up 2 days, 23:33,  load average: 1.00, 1.01, 1.04

Admittedly, the phone has been almost constantly awake, so I have not tested suspend/resume very well yet.

By the way, would there be a way to sync the modem's clock with the host some way? Now the modem always lives in 1970.

Biktorgj commented 3 years ago

My yesterday's trials went good enough (tm), it survived network dropouts, constant moving (traveling by car for 600+km) and random wakeups and suspends without any issues until phoc crashed :( So far going nicely. If curious about the suspend events, a simple adb shell dmesg |grep SUSPEND (or DISCONNECT for mishaps when suspending) will give a rough idea if something happened that would have broken the modem in previous versions

kkeijzer commented 3 years ago

I've been testing it ever since, and so far the modem has not once broken when the phone was awake. The only thing that occasionally happens is that when the phone resumes from suspend, the modem's USB interface is gone. I'm still trying to figure out what would be the best way to deal with that.

But I guess we can say that the modem firmware itself is really stable now.

Thank you for all your work on this.

Biktorgj commented 3 years ago

...I'm still trying to figure out what would be the best way to deal with that.

I think the best way to deal with it is just... time, to polish things everywhere (Modem's kernel, Pinephone kernel, oFono, ModemManager, openQTI itself etc). USB reset is done in purpose when things get out of sync. I've gone 3 days with postmarketOS for example and openQTI has only done it once, and was for a fault in openqti triggered by the modem kernel, but maybe tomorrow it does it 5 times during the day ;)

I hope, as software matures in all fronts, all these auto recovery functions will be triggered less and less often.

Thank you for all your work on this.

My only goal with this is for everyone to have a functional modem that doesn't heat up like an oven and that you can trust at least a bit more, thank you for using it and helping me fix it!

kkeijzer commented 3 years ago

I'm afraid I have been having some issues again. The last two nights the modem broke while I was asleep. The USB device was gone, and the only remedy to get it working again was rebooting the phone.

The AT interpreter still appeared to be working but running AT+RESETUSB or AT+QFASTBOOT did not seem to help.

So I couldn't get the logs from the modem unfortunately.

eg25-manager.log modemmanager.log networkmanager.log

Alexmitter commented 3 years ago

Hello

I flashed the new firmware(3.3.2) when it was released and until yesterday I had no issue. Yesterday it started that the modem would disappear rather quickly after boot and without the usual USB errors in dmesg(at least for one time where I can be sure).

Yesterday I had to reboot 4 times over the evening while it was working fine before. Today it seems to be working stable again, though I don't know for how long.

The same as @kkeijzer explained, and the usual AT commands also did not help.

Biktorgj commented 3 years ago

Aaaand here we go again :)

Are you running latest release, the one with USB audio support? ( @Alexmitter yes, @kkeijzer ?) When it happens again, can you post pinephone's dmesg?

Edit to add: The question really is, is this from the firmware, or is it from the PinePhone kernel? If "AT+QFASTBOOT" or "AT+RESETUSB" still say OK (they don't hang in there with no reply) then the firmware is still working, but probably Pinephone's usb got stuck. AT+QFASTBOOT forces the modem into fastboot, so if after issuing that command still doesn't show up at least as a fastboot device it's probably the Pinephone side that messed up and you should see a message in the Pinephone's dmesg similar to "Cannot enumerate USB device, cable is bad?"

Another way to be really sure where is this coming from is this:

  1. Disable the modem via kill switch, and wait a few seconds
  2. stop eg25-manager (systemctl stop eg25-manager)
  3. Enable the modem again from the kill switch
  4. While monitoring dmesg output on another terminal, restart eg25-manager. This will issue the power gpio to turn the modem back on. After doing that, if you see the "usb cable is bad" message again, this comes from the pinephone itself and there's no remedy on fhe modem side, phone needs to be rebooted, if fastboot appears again and the modem comes back to life then it's my fault somewhere :) If you both are using Mobian I can build a test-kernel with Bhushan's patches (https://invent.kde.org/teams/plasma-mobile/issues/-/issues/3) so you can check if it fixes it. Since this comes and goes (hasn't happened to me even once in like a month) it's really hard to reproduce

Was the phone doing something? Just suspended? Actively used?

Alexmitter commented 3 years ago

@Biktorgj The weird thing yesterday was that after the modem disappeared on me, I checked dmesg and did not find the usual "Cannot enumerate USB device, cable is bad?", but the modem was still gone in my perspective. It seems to not have answered on the serial port and neither adb nor fastboot could find it.

I was just regularly using it, checking telegram every 15 minutes or so while I was on a walk so there were a few suspends.

kkeijzer commented 3 years ago

Yeah, I'm running 0.3.1. I don't think this was happening with 0.3.0. As far as I recall, the AT interpreter did not respond with OK, and I also did not see the USB faulty cable message in dmesg.

The phone was not doing anything interesting. I was asleep, the phone was charging (so not suspending), and only Geary and Telegram were opened. Looking at my eg25-manager log, the modem was "lost" quite often that night.

Biktorgj commented 3 years ago

Then I don't know where I might have introduced the problem... Here's an up to date image with all the latest changes: https://github.com/Biktorgj/pinephone_modem_sdk/releases/tag/0.3.3 It's what I'm currently running, maybe I had something broken in 0.3.1?

QC8086 commented 3 years ago

I can confirm my modem has died a few times in the past 2-3 weeks. Arch, Linux 5.13.12, ModemManager 1.16.8, firmware 0.3.0. It happened when I went into areas with little/no signal twice, and twice overnight while I was asleep. (There's not great signal where I put my PinePhone at night).

Overall still way better than before, but it's definitely happening on 0.3.0. I'm still running with eg25-manager killed shortly after boot if that matters. I will be sure to get logs for you next time it happens.

I was going to say I didn't have any logs, but ask and you shall receive! My modem died while I was writing this post, or thereabouts. It also shouldn't have been suspended during that time as it's been on a charger for a few hours. Hope this helps.

dmesg.log eg25-manager.log modemmanager.log networkmanager.log

I suppose I'll flash 0.3.3 tomorrow and see what happens. Also I'll try the killswitch thing if it happens again.

kop316 commented 3 years ago

I am also having issues too with the modem disappearing, sometimes it does not come back.

I am on Mobian Bookworm with Modem manager 18.1~rc1.

I'm on 3.3.0 of the modem firmware. From the looks of it, I should update to 3.3.3 and see if I still have the se issues?

Alexmitter commented 3 years ago

@kop316 Hey. Do you see the usual "Cannot enumerate USB device, cable is bad?" in dmesg?

Usual in terms of that it is a known issue on the phone kernel.

kop316 commented 3 years ago

@kop316 Hey. Do you see the usual "Cannot enumerate USB device, cable is bad?" in dmesg?

Usual in terms of that it is a known issue on the phone kernel.

@alexmitter I am coming up to speed on everything to check, so I didn't know about that. The next time it happens I will see if that's the issue.

kop316 commented 3 years ago

@Biktorgj @Alexmitter

The firmware just crashed on modem firmware 3.3.3:

[ 3756.220927] usb 2-1: USB disconnect, device number 3
[ 3756.235524] option1 ttyUSB0: GSM modem (1-port) converter now disconnected from ttyUSB0
[ 3756.251643] option 2-1:1.0: device disconnected
[ 3756.257251] option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
[ 3756.272416] option 2-1:1.1: device disconnected
[ 3756.282193] option1 ttyUSB2: GSM modem (1-port) converter now disconnected from ttyUSB2
[ 3756.294354] option 2-1:1.2: device disconnected
[ 3756.301093] option1 ttyUSB3: GSM modem (1-port) converter now disconnected from ttyUSB3
[ 3756.313564] option 2-1:1.3: device disconnected
[ 3756.318657] qmi_wwan 2-1:1.4 wwan0: unregister 'qmi_wwan' usb-1c1b000.usb-1, WWAN/QMI device
[ 3757.000562] usb 4-1: new full-speed USB device number 2 using ohci-platform
[ 3757.200542] usb 4-1: device descriptor read/64, error -62
[ 3757.504552] usb 4-1: device descriptor read/64, error -62
[ 3757.804528] usb 4-1: new full-speed USB device number 3 using ohci-platform
[ 3758.004528] usb 4-1: device descriptor read/64, error -62
[ 3758.308526] usb 4-1: device descriptor read/64, error -62
[ 3758.420772] usb usb4-port1: attempt power cycle
[ 3758.640503] usb 4-1: new full-speed USB device number 4 using ohci-platform
[ 3759.072466] usb 4-1: device not accepting address 4, error -62
[ 3759.268495] usb 4-1: new full-speed USB device number 5 using ohci-platform
[ 3759.696582] usb 4-1: device not accepting address 5, error -62
[ 3759.702615] usb usb4-port1: unable to enumerate USB device

After following the procedure described here: https://github.com/Biktorgj/pinephone_modem_sdk/issues/9#issuecomment-905223378

The modem came back and I got the following in dmesg:

[ 4231.435740] usb 2-1: new high-speed USB device number 5 using ehci-platform
[ 4231.606839] usb 2-1: New USB device found, idVendor=18d1, idProduct=d00d, bcdDevice= 1.00
[ 4231.615119] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 4231.622362] usb 2-1: Product: Android
[ 4231.626127] usb 2-1: Manufacturer: Google
[ 4231.630208] usb 2-1: SerialNumber: a5816185
[ 4236.454634] usb 2-1: USB disconnect, device number 5
[ 4250.335517] usb 2-1: new high-speed USB device number 6 using ehci-platform
[ 4250.514133] usb 2-1: New USB device found, idVendor=2c7c, idProduct=0125, bcdDevice= 3.18
[ 4250.522400] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 4250.529603] usb 2-1: Product: LTE Module
[ 4250.533598] usb 2-1: Manufacturer: Quectel, Incorporated
[ 4250.538970] usb 2-1: SerialNumber: a5816185
[ 4250.563632] option 2-1:1.0: GSM modem (1-port) converter detected
[ 4250.574402] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB0
[ 4250.582029] option 2-1:1.1: GSM modem (1-port) converter detected
[ 4250.588826] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB1
[ 4250.596624] option 2-1:1.2: GSM modem (1-port) converter detected
[ 4250.603403] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB2
[ 4250.614916] option 2-1:1.3: GSM modem (1-port) converter detected
[ 4250.625530] usb 2-1: GSM modem (1-port) converter now attached to ttyUSB3
[ 4250.638009] qmi_wwan 2-1:1.4: cdc-wdm0: USB WDM device
[ 4250.650714] qmi_wwan 2-1:1.4 wwan0: register 'qmi_wwan' at usb-1c1b000.usb-1, WWAN/QMI device, 1a:5a:f8:XX:XX:XX

It seems to crash on me regularly. Is there some way I can monitor inside the modem to see what is going on?

EDIT: It looks like i can adb in and use this command tail -n 50 -f /var/log/{messages,dmesg}

Biktorgj commented 3 years ago

These are all different errors to what we've seen in the past... @kop316 :

@ThePosichronic Yours is by far the strangest one. This message in particular: [115023.278778] usb usb2-port1: disabled by hub (EMI?), re-enabling... (besides all the disconnects and reconnects that are probably due to ModemManager version 1.16.8 and I suppose without disabled suspend-resume hooks)

Maybe the best way to analyze these issues is to get you flashable image with persistent storage enabled so all those logs end up safe in a filesystem after a reboot.