kitakar5525 / surface-ipu3-cameras

19 stars 8 forks source link

surface_camera: unload/re-load sensor drivers not working #6

Closed kitakar5525 closed 3 years ago

kitakar5525 commented 4 years ago

Copying my comment from linux-surface repo https://github.com/linux-surface/linux-surface/issues/91#issuecomment-683881473

[...] "unloading sensor driver, then (re-)loading the driver again" to make sensor driver developing easier without rebooting PC. If surface_camera driver isn't used, I can reload sensor drivers like the following:

# Sensor drivers are used by ipu3_* drivers. Unload them first
sudo modprobe -r ipu3_imgu
sudo modprobe -r ipu3_cio2
sudo rmmod ov5693
sudo rmmod ov7251
sudo rmmod ov8865

sudo modprobe ipu3_cio2
sudo modprobe ipu3_imgu
sudo insmod ov5693.ko
sudo insmod ov7251.ko
sudo insmod ov8865.ko

But when using with surface_camera driver, running the above commands won't re-probe sensor drivers again. No dmesg log shown. When I add sudo rmmod surface_camera at top of the commands, it causes null ptr deref on loading ipu3_cio2:

dmesg log

``` kern :info : [ 5917.885934] ipu3-cio2 0000:00:14.3: device 0x9d32 (rev: 0x1) kern :alert : [ 5917.887666] BUG: unable to handle page fault for address: fffffffffffffff5 kern :alert : [ 5917.887671] #PF: supervisor read access in kernel mode kern :alert : [ 5917.887673] #PF: error_code(0x0000) - not-present page kern :info : [ 5917.887674] PGD 11c013067 P4D 11c013067 PUD 11c015067 PMD 0 kern :warn : [ 5917.887679] Oops: 0000 [#1] PREEMPT SMP PTI kern :warn : [ 5917.887683] CPU: 2 PID: 63674 Comm: modprobe Tainted: G WC OE 5.9.0-rc2-1-surface-mainline #1 kern :warn : [ 5917.887685] Hardware name: Microsoft Corporation Surface Book/Surface Book, BIOS 92.3192.768 03.24.2020 kern :warn : [ 5917.887692] RIP: 0010:fwnode_graph_get_endpoint_by_id+0x72/0x240 kern :warn : [ 5917.887696] Code: 49 89 cc 48 89 fb 41 89 f5 45 31 f6 49 f7 d4 c7 44 24 14 00 00 00 00 48 c7 44 24 08 00 00 00 00 44 89 e0 83 e0 01 88 44 24 12 <48> 8b 43 08 48 85 c0 0f 84 68 01 00 00 48 8b 40 68 48 85 c0 0f 84 kern :warn : [ 5917.887698] RSP: 0018:ffffad90c05ab9d8 EFLAGS: 00010246 kern :warn : [ 5917.887701] RAX: 0000000000000000 RBX: ffffffffffffffed RCX: 0000000000000001 kern :warn : [ 5917.887703] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffed kern :warn : [ 5917.887705] RBP: 0000000000000000 R08: 0000000002000024 R09: ffff95f4a8570e40 kern :warn : [ 5917.887707] R10: 0000000000000000 R11: ffffffffbc51cd60 R12: fffffffffffffffe kern :warn : [ 5917.887709] R13: 0000000000000000 R14: 0000000000000000 R15: ffff95f5fd51c018 kern :warn : [ 5917.887712] FS: 00007f4c155a5740(0000) GS:ffff95f61f500000(0000) knlGS:0000000000000000 kern :warn : [ 5917.887714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kern :warn : [ 5917.887716] CR2: fffffffffffffff5 CR3: 0000000444ed8005 CR4: 00000000003706e0 kern :warn : [ 5917.887718] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kern :warn : [ 5917.887720] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kern :warn : [ 5917.887721] Call Trace: kern :warn : [ 5917.887732] cio2_pci_probe.cold+0x73f/0xa74 [ipu3_cio2] kern :warn : [ 5917.887740] local_pci_probe+0x42/0x80 kern :warn : [ 5917.887744] ? pci_match_device+0xd7/0x100 kern :warn : [ 5917.887751] pci_device_probe+0xfa/0x1b0 kern :warn : [ 5917.887757] really_probe+0x205/0x460 kern :warn : [ 5917.887761] driver_probe_device+0xe1/0x150 kern :warn : [ 5917.887764] device_driver_attach+0xa1/0xb0 kern :warn : [ 5917.887768] __driver_attach+0x8a/0x150 kern :warn : [ 5917.887771] ? device_driver_attach+0xb0/0xb0 kern :warn : [ 5917.887773] ? device_driver_attach+0xb0/0xb0 kern :warn : [ 5917.887777] bus_for_each_dev+0x89/0xd0 kern :warn : [ 5917.887780] bus_add_driver+0x12b/0x1e0 kern :warn : [ 5917.887784] driver_register+0x8b/0xe0 kern :warn : [ 5917.887787] ? 0xffffffffc0bde000 kern :warn : [ 5917.887791] do_one_initcall+0x59/0x234 kern :warn : [ 5917.887796] do_init_module+0x5c/0x260 kern :warn : [ 5917.887799] load_module+0x21a7/0x2450 kern :warn : [ 5917.887807] __do_sys_finit_module+0xbd/0x120 kern :warn : [ 5917.887813] do_syscall_64+0x33/0x40 kern :warn : [ 5917.887817] entry_SYSCALL_64_after_hwframe+0x44/0xa9 kern :warn : [ 5917.887820] RIP: 0033:0x7f4c156ccd5d kern :warn : [ 5917.887824] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48 kern :warn : [ 5917.887826] RSP: 002b:00007ffff1450828 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 kern :warn : [ 5917.887830] RAX: ffffffffffffffda RBX: 00005590d38e0020 RCX: 00007f4c156ccd5d kern :warn : [ 5917.887832] RDX: 0000000000000000 RSI: 00005590d1c70368 RDI: 0000000000000007 kern :warn : [ 5917.887834] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 kern :warn : [ 5917.887836] R10: 0000000000000007 R11: 0000000000000246 R12: 00005590d1c70368 kern :warn : [ 5917.887838] R13: 0000000000000000 R14: 00005590d38dfe00 R15: 00005590d38e0020 kern :warn : [ 5917.887842] Modules linked in: ipu3_cio2(OE+) videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 videobuf2_common iptable_mangle xt_CHECKSUM xt_tcpudp iptable_nat xt_comment xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc iptable_filter usb_storage rfcomm cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc zram msr uinput input_leds mousedev joydev hid_sensor_als hid_sensor_gyro_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf intel_rapl_msr hid_sensor_iio_common industrialio snd_hda_codec_hdmi intel_rapl_common hid_multitouch snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_core snd_hda_codec_realtek kvm_intel snd_hda_codec_generic snd_compress ac97_bus ledtrig_audio snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg mwifiex_pcie(OE) kvm hid_sensor_hub kern :warn : [ 5917.887881] squashfs mwifiex(OE) hid_generic mei_hdcp ipts usbhid snd_hda_codec irqbypass crct10dif_pclmul snd_hda_core nls_iso8859_1 snd_hwdep crc32_pclmul nls_cp437 vfat ghash_clmulni_intel fat aesni_intel snd_pcm cfg80211 crypto_simd fuse snd_timer cryptd glue_helper loop v4l2_fwnode rapl intel_cstate snd intel_uncore pcspkr soundcore mei_me rfkill intel_lpss_pci intel_lpss mei idma64 intel_xhci_usb_role_switch intel_pch_thermal roles i2c_hid videodev hid mc tps68470(OE) tpm_crb surfacepro3_button battery tpm_tis soc_button_array ac tpm_tis_core evdev mac_hid tpm rng_core pkcs8_key_parser sg crypto_user acpi_call(OE) ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_pci_renesas crc32c_intel xhci_hcd i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm intel_agp intel_gtt agpgart [last unloaded: ov8865] kern :warn : [ 5917.887929] CR2: fffffffffffffff5 kern :warn : [ 5917.887933] ---[ end trace eca5d6da121e5f6f ]--- kern :warn : [ 5917.887938] RIP: 0010:fwnode_graph_get_endpoint_by_id+0x72/0x240 kern :warn : [ 5917.887941] Code: 49 89 cc 48 89 fb 41 89 f5 45 31 f6 49 f7 d4 c7 44 24 14 00 00 00 00 48 c7 44 24 08 00 00 00 00 44 89 e0 83 e0 01 88 44 24 12 <48> 8b 43 08 48 85 c0 0f 84 68 01 00 00 48 8b 40 68 48 85 c0 0f 84 kern :warn : [ 5917.887944] RSP: 0018:ffffad90c05ab9d8 EFLAGS: 00010246 kern :warn : [ 5917.887946] RAX: 0000000000000000 RBX: ffffffffffffffed RCX: 0000000000000001 kern :warn : [ 5917.887948] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffed kern :warn : [ 5917.887950] RBP: 0000000000000000 R08: 0000000002000024 R09: ffff95f4a8570e40 kern :warn : [ 5917.887952] R10: 0000000000000000 R11: ffffffffbc51cd60 R12: fffffffffffffffe kern :warn : [ 5917.887954] R13: 0000000000000000 R14: 0000000000000000 R15: ffff95f5fd51c018 kern :warn : [ 5917.887957] FS: 00007f4c155a5740(0000) GS:ffff95f61f500000(0000) knlGS:0000000000000000 kern :warn : [ 5917.887960] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kern :warn : [ 5917.887962] CR2: fffffffffffffff5 CR3: 0000000444ed8005 CR4: 00000000003706e0 kern :warn : [ 5917.887965] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kern :warn : [ 5917.887967] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 ```

djrscally commented 4 years ago

For me, the software node unregistration doesn't work, and that seems to be why I can't unload and reload surface_driver. If you ls /sys/kernel/software_nodes after rmmod surface_camera the directories for the sensors and cio2 are still there

On Sat, 5 Sep 2020, 12:37 Hayataka, notifications@github.com wrote:

Copying my comment from linux-surface repo linux-surface/linux-surface#91 (comment) https://github.com/linux-surface/linux-surface/issues/91#issuecomment-683881473

[...] "unloading sensor driver, then (re-)loading the driver again" to make sensor driver developing easier without rebooting PC. If surface_camera driver isn't used, I can reload sensor drivers like the following:

Sensor drivers are used by ipu3_* drivers. Unload them first

sudo modprobe -r ipu3_imgu sudo modprobe -r ipu3_cio2 sudo rmmod ov5693 sudo rmmod ov7251 sudo rmmod ov8865

sudo modprobe ipu3_cio2 sudo modprobe ipu3_imgu sudo insmod ov5693.ko sudo insmod ov7251.ko sudo insmod ov8865.ko

But when using with surface_camera driver, running the above commands won't re-probe sensor drivers again. No dmesg log shown. When I add sudo rmmod surface_camera at top of the commands, it causes null ptr deref on loading ipu3_cio2: dmesg log

kern :info : [ 5917.885934] ipu3-cio2 0000:00:14.3: device 0x9d32 (rev: 0x1) kern :alert : [ 5917.887666] BUG: unable to handle page fault for address: fffffffffffffff5 kern :alert : [ 5917.887671] #PF: supervisor read access in kernel mode kern :alert : [ 5917.887673] #PF: error_code(0x0000) - not-present page kern :info : [ 5917.887674] PGD 11c013067 P4D 11c013067 PUD 11c015067 PMD 0 kern :warn : [ 5917.887679] Oops: 0000 [#1] PREEMPT SMP PTI kern :warn : [ 5917.887683] CPU: 2 PID: 63674 Comm: modprobe Tainted: G WC OE 5.9.0-rc2-1-surface-mainline #1 kern :warn : [ 5917.887685] Hardware name: Microsoft Corporation Surface Book/Surface Book, BIOS 92.3192.768 03.24.2020 kern :warn : [ 5917.887692] RIP: 0010:fwnode_graph_get_endpoint_by_id+0x72/0x240 kern :warn : [ 5917.887696] Code: 49 89 cc 48 89 fb 41 89 f5 45 31 f6 49 f7 d4 c7 44 24 14 00 00 00 00 48 c7 44 24 08 00 00 00 00 44 89 e0 83 e0 01 88 44 24 12 <48> 8b 43 08 48 85 c0 0f 84 68 01 00 00 48 8b 40 68 48 85 c0 0f 84 kern :warn : [ 5917.887698] RSP: 0018:ffffad90c05ab9d8 EFLAGS: 00010246 kern :warn : [ 5917.887701] RAX: 0000000000000000 RBX: ffffffffffffffed RCX: 0000000000000001 kern :warn : [ 5917.887703] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffed kern :warn : [ 5917.887705] RBP: 0000000000000000 R08: 0000000002000024 R09: ffff95f4a8570e40 kern :warn : [ 5917.887707] R10: 0000000000000000 R11: ffffffffbc51cd60 R12: fffffffffffffffe kern :warn : [ 5917.887709] R13: 0000000000000000 R14: 0000000000000000 R15: ffff95f5fd51c018 kern :warn : [ 5917.887712] FS: 00007f4c155a5740(0000) GS:ffff95f61f500000(0000) knlGS:0000000000000000 kern :warn : [ 5917.887714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kern :warn : [ 5917.887716] CR2: fffffffffffffff5 CR3: 0000000444ed8005 CR4: 00000000003706e0 kern :warn : [ 5917.887718] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kern :warn : [ 5917.887720] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 kern :warn : [ 5917.887721] Call Trace: kern :warn : [ 5917.887732] cio2_pci_probe.cold+0x73f/0xa74 [ipu3_cio2] kern :warn : [ 5917.887740] local_pci_probe+0x42/0x80 kern :warn : [ 5917.887744] ? pci_match_device+0xd7/0x100 kern :warn : [ 5917.887751] pci_device_probe+0xfa/0x1b0 kern :warn : [ 5917.887757] really_probe+0x205/0x460 kern :warn : [ 5917.887761] driver_probe_device+0xe1/0x150 kern :warn : [ 5917.887764] device_driver_attach+0xa1/0xb0 kern :warn : [ 5917.887768] driver_attach+0x8a/0x150 kern :warn : [ 5917.887771] ? device_driver_attach+0xb0/0xb0 kern :warn : [ 5917.887773] ? device_driver_attach+0xb0/0xb0 kern :warn : [ 5917.887777] bus_for_each_dev+0x89/0xd0 kern :warn : [ 5917.887780] bus_add_driver+0x12b/0x1e0 kern :warn : [ 5917.887784] driver_register+0x8b/0xe0 kern :warn : [ 5917.887787] ? 0xffffffffc0bde000 kern :warn : [ 5917.887791] do_one_initcall+0x59/0x234 kern :warn : [ 5917.887796] do_init_module+0x5c/0x260 kern :warn : [ 5917.887799] load_module+0x21a7/0x2450 kern :warn : [ 5917.887807] do_sys_finit_module+0xbd/0x120 kern :warn : [ 5917.887813] do_syscall_64+0x33/0x40 kern :warn : [ 5917.887817] entry_SYSCALL_64_after_hwframe+0x44/0xa9 kern :warn : [ 5917.887820] RIP: 0033:0x7f4c156ccd5d kern :warn : [ 5917.887824] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48 kern :warn : [ 5917.887826] RSP: 002b:00007ffff1450828 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 kern :warn : [ 5917.887830] RAX: ffffffffffffffda RBX: 00005590d38e0020 RCX: 00007f4c156ccd5d kern :warn : [ 5917.887832] RDX: 0000000000000000 RSI: 00005590d1c70368 RDI: 0000000000000007 kern :warn : [ 5917.887834] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000 kern :warn : [ 5917.887836] R10: 0000000000000007 R11: 0000000000000246 R12: 00005590d1c70368 kern :warn : [ 5917.887838] R13: 0000000000000000 R14: 00005590d38dfe00 R15: 00005590d38e0020 kern :warn : [ 5917.887842] Modules linked in: ipu3_cio2(OE+) videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 videobuf2_common iptable_mangle xt_CHECKSUM xt_tcpudp iptable_nat xt_comment xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc iptable_filter usb_storage rfcomm cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel bluetooth ecdh_generic ecc zram msr uinput input_leds mousedev joydev hid_sensor_als hid_sensor_gyro_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf intel_rapl_msr hid_sensor_iio_common industrialio snd_hda_codec_hdmi intel_rapl_common hid_multitouch snd_soc_skl snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_core snd_hda_codec_realtek kvm_intel snd_hda_codec_generic snd_compress ac97_bus ledtrig_audio snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg mwifiex_pcie(OE) kvm hid_sensor_hub kern :warn : [ 5917.887881] squashfs mwifiex(OE) hid_generic mei_hdcp ipts usbhid snd_hda_codec irqbypass crct10dif_pclmul snd_hda_core nls_iso8859_1 snd_hwdep crc32_pclmul nls_cp437 vfat ghash_clmulni_intel fat aesni_intel snd_pcm cfg80211 crypto_simd fuse snd_timer cryptd glue_helper loop v4l2_fwnode rapl intel_cstate snd intel_uncore pcspkr soundcore mei_me rfkill intel_lpss_pci intel_lpss mei idma64 intel_xhci_usb_role_switch intel_pch_thermal roles i2c_hid videodev hid mc tps68470(OE) tpm_crb surfacepro3_button battery tpm_tis soc_button_array ac tpm_tis_core evdev mac_hid tpm rng_core pkcs8_key_parser sg crypto_user acpi_call(OE) ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_pci_renesas crc32c_intel xhci_hcd i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm intel_agp intel_gtt agpgart [last unloaded: ov8865] kern :warn : [ 5917.887929] CR2: fffffffffffffff5 kern :warn : [ 5917.887933] ---[ end trace eca5d6da121e5f6f ]--- kern :warn : [ 5917.887938] RIP: 0010:fwnode_graph_get_endpoint_by_id+0x72/0x240 kern :warn : [ 5917.887941] Code: 49 89 cc 48 89 fb 41 89 f5 45 31 f6 49 f7 d4 c7 44 24 14 00 00 00 00 48 c7 44 24 08 00 00 00 00 44 89 e0 83 e0 01 88 44 24 12 <48> 8b 43 08 48 85 c0 0f 84 68 01 00 00 48 8b 40 68 48 85 c0 0f 84 kern :warn : [ 5917.887944] RSP: 0018:ffffad90c05ab9d8 EFLAGS: 00010246 kern :warn : [ 5917.887946] RAX: 0000000000000000 RBX: ffffffffffffffed RCX: 0000000000000001 kern :warn : [ 5917.887948] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffffffffed kern :warn : [ 5917.887950] RBP: 0000000000000000 R08: 0000000002000024 R09: ffff95f4a8570e40 kern :warn : [ 5917.887952] R10: 0000000000000000 R11: ffffffffbc51cd60 R12: fffffffffffffffe kern :warn : [ 5917.887954] R13: 0000000000000000 R14: 0000000000000000 R15: ffff95f5fd51c018 kern :warn : [ 5917.887957] FS: 00007f4c155a5740(0000) GS:ffff95f61f500000(0000) knlGS:0000000000000000 kern :warn : [ 5917.887960] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kern :warn : [ 5917.887962] CR2: fffffffffffffff5 CR3: 0000000444ed8005 CR4: 00000000003706e0 kern :warn : [ 5917.887965] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 kern :warn : [ 5917.887967] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

β€” You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kitakar5525/surface-ipu3-cameras/issues/6, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDBE27622QOZFQD2CI7Q23SEIPHZANCNFSM4Q2ZKZSQ .

kitakar5525 commented 4 years ago

Thanks for the comment!

After these changes on exit(): https://github.com/kitakar5525/surface-ipu3-cameras/blob/fcea97edd7596fe1dc592dbcef23a5263d1d22c2/surface_camera_from_jhand2/SB1_ov7251/surface_camera.c#L165-L168

I can at least now unregister swnodes:

# before loading surface_camera
$ ls /sys/kernel/software_nodes
intel-xhci-usb-sw  node0  node2  node3  node4

# after loading surface_camera
$ ls /sys/kernel/software_nodes
INT343E  INT347E  intel-xhci-usb-sw  node0  node2  node3  node4

# after unloading surface_camera
$ ls /sys/kernel/software_nodes
intel-xhci-usb-sw  node0  node2  node3  node4

However, re-loading surface_camera fails:

$ sudo insmod surface_camera.ko
insmod: ERROR: could not insert module surface_camera.ko: Unknown error 517

So, there seems to be more leftovers in addition to the swnodes.

djrscally commented 4 years ago

In theory you shouldn't need to do both of these:

 surface_camera_unregister_nodes(); 
 software_node_unregister_nodes(nodes); 

Because surface_camera_unregister_nodes calls surface_camera_remove_node which calls fwnode_remove_software_node

static void surface_camera_remove_node(const struct software_node *n)
{
    struct fwnode_handle *fwnode = software_node_fwnode(n);
    if (fwnode)
        fwnode_remove_software_node(fwnode);
}

static void surface_camera_unregister_nodes(void)
{
    surface_camera_remove_node(&nodes[SWNODE_OV7251_ENDPOINT0]);
    surface_camera_remove_node(&nodes[SWNODE_OV7251_PORT0]);
    surface_camera_remove_node(&nodes[SWNODE_OV7251]);

    surface_camera_remove_node(&nodes[SWNODE_CIO2_ENDPOINT0]);
    surface_camera_remove_node(&nodes[SWNODE_CIO2_PORT0]);
    surface_camera_remove_node(&nodes[SWNODE_CIO2]);
}

And software_node_unregister_nodes just iterates over the nodes and calls software_node_unregister on each of them, which is just calling fwnode_remove_software_node:

/**
 * software_node_unregister - Unregister static software node
 * @node: The software node to be unregistered
 */
void software_node_unregister(const struct software_node *node)
{
    struct swnode *swnode;

    swnode = software_node_to_swnode(node);
    if (swnode)
        fwnode_remove_software_node(&swnode->fwnode);
}

Regarding what might be left over, I wonder if the fwnode_handle references need to be released with fwnode_handle_put. See this example:

static void cht_int33fe_remove_nodes(struct cht_int33fe_data *data)
{
    software_node_unregister_node_group(node_group);

    if (fusb302_mux_refs[0].node) {
        fwnode_handle_put(software_node_fwnode(fusb302_mux_refs[0].node));
        fusb302_mux_refs[0].node = NULL;
    }

    if (data->dp) {
        data->dp->secondary = NULL;
        fwnode_handle_put(data->dp);
        data->dp = NULL;
    }
}
djrscally commented 4 years ago

OK; I tested releasing the fwnode_handle references with fwnode_handle_put. Both that and the call to i2c_unregister_device seem to cause another problem; which is that either method seems to destroy the i2c_client for the sensor device...so the sensor driver no longer works at all, even if you unload and reload the sensor driver (because there's no i2c_client for it to bind to anymore). I guess that's why reprobing the cio2 driver fails for you (in my version of the surface_camera module I switched from i2c_for_each_dev to just directly getting the ACPI device with acpi_dev_get_first_dev, so I never get to the point of reprobing the cio2 device, as of course teh calls to fetch an i2c_client from that acpi_device don't work after unregistration)

So, I guess we need to work out a way to less destructively complete the release of those software_nodes, or a way to release the fwnode_handles without also clearing the i2c_client

djrscally commented 4 years ago

I added this function to drivers/base/swnode.c:

unsigned int software_node_read_refcount(struct software_node *node)
{
    struct swnode *swnode = software_node_to_swnode(node);
    unsigned int ref;

    if (!swnode)
        return -ENODEV;

    ref = kref_read(&swnode->kobj.kref);

    return ref;
}
EXPORT_SYMBOL_GPL(software_node_read_refcount);

And that lets us keep track of the references to the swnodes. My unregistration function:

static int surface_camera_unregister_sensors(void)
{
    int i,j;
    unsigned int ref;
    struct sensor *sensor;
    struct i2c_client *client;
    struct swnode *swnode;

    for (i=0; i < cdevs.n_devices; i++) {

        sensor = &cdevs.sensors[i];
        client = to_i2c_client(sensor->dev);

        i2c_unregister_device(client);

        for (j=4; j>=0; j--) {
            ref = software_node_read_refcount(&sensor->swnodes[j]);
            pr_info("unregistering %s, which currently has %d references\n", sensor->swnodes[j].name, ref);
            software_node_unregister(&sensor->swnodes[j]);
            ref = software_node_read_refcount(&sensor->swnodes[j]);
            pr_info("software node %s has %d references left\n", sensor->swnodes[j].name, ref);
        }
    }
    return 0;
}

reports this:

[  152.129389] unregistering endpoint0, which currently has 1 references
[  152.129424] software node endpoint0 unregistered successfully
[  152.129428] unregistering port1, which currently has 4 references
[  152.129430] software node port1 has 3 references left
[  152.129433] unregistering endpoint0, which currently has 1 references
[  152.129458] software node endpoint0 unregistered successfully
[  152.129461] unregistering port0, which currently has 1 references
[  152.129486] software node port0 unregistered successfully
[  152.129488] unregistering OVTI2680, which currently has 1 references
[  152.129516] software node OVTI2680 unregistered successfully

So port1 (a child of INT343E) is the software_node that seems to prevent the unregistration from working cleanly. I added further debug prints to the registration function, and the tl;dr is that calling device_reprobe on the cio2 device adds 3 references to the count. Unfortunately though, calling sudo modprobe -r ipu3-cio2 doesn't clear those references:

[  387.079913] unregistering port1, which currently has 4 references
[  387.079916] software node port1 has 3 references left

So I wonder if there's actually some changes we need to make to ipu3-cio2.c, such that it releases those references properly.

jhand2 commented 4 years ago

Ya, unfortunately I've been resorting to rebooting my machine whenever I make a kernel change (I wrote a script which rebuilds, reinstalls, and reboots, but it still sucks to wait). I also assume the cio2 driver is not properly releasing references, but I haven't looked into it yet.

djrscally commented 4 years ago

OK, there's two main culprits:

  1. The ipu3-cio2 module doesn't release any references to the ports at all when it's removed.
  2. The patch from Heikki adding the missing software_node_ops has a bug in software_node_graph_get_next_endpoint(). As part of that function it gets references to each port in turn, but doesn't put any of them; so the references tick up by one for each iteration of this loop in cio2_parse_firmware:
static int cio2_parse_firmware(struct cio2_device *cio2)
{
    unsigned int i;
    int ret;

    for (i = 0; i < CIO2_NUM_PORTS; i++) {
        struct v4l2_fwnode_endpoint vep = {
            .bus_type = V4L2_MBUS_CSI2_DPHY
        };
        struct sensor_async_subdev *s_asd = NULL;
        struct fwnode_handle *ep;

        ep = fwnode_graph_get_endpoint_by_id(
            dev_fwnode(&cio2->pci_dev->dev), i, 0,
            FWNODE_GRAPH_ENDPOINT_NEXT);

        < snip >

We can fix that by applying this (on top of Heikki's patch):

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 3667467196f0..62a1e3de8cb3 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -584,7 +584,9 @@ software_node_graph_get_next_endpoint(const struct fwnode_handle *fwnode,
                endpoint = software_node_get_next_child(port, old);
                fwnode_handle_put(old);
                if (endpoint)
-                       break;          
+                       break;
+               else
+                       fwnode_handle_put(port);
        }

        fwnode_handle_put(port);

Still seems to work ok for me (camera still functions and so on - if you guys could test too that'd be great. I don't think Heikki's patch has gone through yet, so I'll just point this out on linux-media and see what they say.

Will try and get the ipu3-cio2 module to clear the last references on remove tomorrow.

kbingham commented 3 years ago

Sounds good progress to me, and having users and testers should help push Heikki's patches forwards for integration too.

djrscally commented 3 years ago

OK, I got this now. Last remaining references to ipu3_cio2 turned out not to be a problem; the last problem needing fixing was that we overwrite thepci_dev's and i2c_client->dev's default fwnode_handle with our software_nodes, and without restoring those on exiting our module it seems like things won't hook back up properly the next time we load.

The solution I came up with was simply to store the original fwnode_handles and restore those pointers on exit, which is...a little hackish. Not sure how much I like it. So for example to store the original fwnode_handle for the cio2 device:

    /* struct bridge declaration */
struct cio2_bridge {
    int n_sensors;
    struct sensor sensors[MAX_CONNECTED_DEVICES];
    struct pci_dev *cio2;
    struct fwnode_handle *cio2_fwnode;
};

    /* Find pci device and add swnode as primary */
    bridge.cio2 = pci_get_device(PCI_VENDOR_ID_INTEL, CIO2_PCI_ID, NULL);
    if (!bridge.cio2) {
        ret = -EPROBE_DEFER;
        goto out;
    }

    fwnode = software_node_fwnode(&cio2_hid_node);
    if (!fwnode) {
        pr_err("Error getting fwnode from cio2 software_node\n");
        ret = -ENODEV;
        goto out;
    }

    /*
     * We store the pci_dev's existing fwnode, beccause in the event we want 
     * to reload (I.E. rmmod and insmod) this module we need to give the device
     * its original fwnode back to prevent problems down the line
     */

    bridge.cio2_fwnode = bridge.cio2->dev.fwnode;

    fwnode->secondary = ERR_PTR(-ENODEV);
    set_primary_fwnode(&bridge.cio2->dev, fwnode);

and then it's restored during the module's exit function:

static void surface_camera_exit(void)
{
    int ret;

    /* Give the pci_dev its original fwnode back */
    if (bridge.cio2) {
        bridge.cio2->dev.fwnode = bridge.cio2_fwnode;
        pci_dev_put(bridge.cio2);
    }

    ret = surface_camera_unregister_sensors();

    if (ret)
        pr_err("An error occurred unregistering the sensors\n");

    software_node_unregister(&cio2_hid_node);

}

I'm doing the same thing for each sensor that's found; I pushed the latest version of the code to my git repo if you want to review it all. What do you guys think to that method? Is there a better way we could do it? Maybe instead of storing, we should just put the reference to the original fwnode and then set it to ERR_PTR(-ENODEV) on release or something?

Anyway, with the change to Heikki's patch I mentioned before and the method outlined above, I can unload and reload ipu3_cio2 and surface_camera now:

$ sudo modprobe -r ipu3_cio2
$ sudo rmmod surface_camera.ko
$ sudo modprobe ipu3_cio2
$ sudo insmod surface_camera.ko

and qcam continues to stream video fine. Hurrah.

I'm going back to the other bug now :)

kitakar5525 commented 3 years ago

Thank you everyone for your effort!

Anyway, with the change to Heikki's patch I mentioned before and the method outlined above, I can unload and reload ipu3_cio2 and surface_camera now:

I'm glad to hear thatπŸŽ‰

Sorry I've been inactive these days because I'm exhausted from my personal life. I hope I can look into this soon and confirm the fix.

djrscally commented 3 years ago

Thank you everyone for your effort!

Anyway, with the change to Heikki's patch I mentioned before and the method outlined above, I can unload and reload ipu3_cio2 and surface_camera now:

I'm glad to hear thattada

Sorry I've been inactive these days because I'm exhausted from my personal life. I hope I can look into this soon and confirm the fix.

Don't worry dude! Hope all is well :)

kitakar5525 commented 3 years ago

@djrscally Hmm... I tried a little bit with your surface_camera driver with the 4th swnode patch, but INT343E still remains under /sys/kernel/software_nodes after unloading surface_camera driver.

After unloading, when I try to re-load the surface_camera driver, the following kernel oops will appear:

kern  :warn  : [  169.949155] sysfs: cannot create duplicate filename '/kernel/software_nodes/INT343E'
kern  :warn  : [  169.949163] CPU: 0 PID: 4286 Comm: insmod Tainted: G         C OE     5.9.0-rc4-1-surface-mainline-06735-gdceb2b238c03 #12
kern  :warn  : [  169.949164] Hardware name: Microsoft Corporation Surface Book/Surface Book, BIOS 92.3192.768 03.24.2020
kern  :warn  : [  169.949165] Call Trace:
kern  :warn  : [  169.949188]  dump_stack+0x6b/0x88
kern  :warn  : [  169.949194]  sysfs_warn_dup.cold+0x17/0x2d
kern  :warn  : [  169.949196]  sysfs_create_dir_ns+0xb6/0xd0
kern  :warn  : [  169.949200]  kobject_add_internal+0xab/0x2f0
kern  :warn  : [  169.949203]  kobject_init_and_add+0x71/0xa0
kern  :warn  : [  169.949215]  swnode_register+0x10d/0x1f0
kern  :warn  : [  169.949219]  software_node_register+0x31/0x50
kern  :warn  : [  169.949222]  ? 0xffffffffc0904000
kern  :warn  : [  169.949226]  surface_camera_init+0x2f/0xa4 [surface_camera]
kern  :warn  : [  169.949229]  ? 0xffffffffc0904000
kern  :warn  : [  169.949235]  do_one_initcall+0x46/0x224
kern  :warn  : [  169.949239]  ? do_init_module+0x23/0x260
kern  :warn  : [  169.949249]  ? kmem_cache_alloc_trace+0xfe/0x240
kern  :warn  : [  169.949252]  do_init_module+0x5c/0x260
kern  :warn  : [  169.949256]  __do_sys_finit_module+0xac/0x110
kern  :warn  : [  169.949273]  do_syscall_64+0x33/0x40
kern  :warn  : [  169.949278]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kern  :warn  : [  169.949281] RIP: 0033:0x7f66b70aad5d
kern  :warn  : [  169.949285] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48
kern  :warn  : [  169.949286] RSP: 002b:00007ffc95e08448 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
kern  :warn  : [  169.949288] RAX: ffffffffffffffda RBX: 000055ad4432e790 RCX: 00007f66b70aad5d
kern  :warn  : [  169.949289] RDX: 0000000000000000 RSI: 000055ad441ea288 RDI: 0000000000000003
kern  :warn  : [  169.949289] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f66b71763e0
kern  :warn  : [  169.949290] R10: 0000000000000003 R11: 0000000000000246 R12: 000055ad441ea288
kern  :warn  : [  169.949290] R13: 0000000000000000 R14: 000055ad443313f0 R15: 0000000000000000
kern  :err   : [  169.949294] kobject_add_internal failed for INT343E with -EEXIST, don't try to register things with the same name in the same directory.
kern  :err   : [  169.949298] Failed to register the CIO2 HID node
djrscally commented 3 years ago

Humm, does /sys/software_nodes/INT343E/portX still exist? Either port0 or whatever your sensor is sat on? Ooh; and are you definitely unloading ipu3_cio2 before surface_camera?

kitakar5525 commented 3 years ago

Yes, like the following:

$ tree /sys/kernel/software_nodes
/sys/kernel/software_nodes
β”œβ”€β”€ INT343E
β”‚Β Β  β”œβ”€β”€ port0
β”‚Β Β  β”œβ”€β”€ port1
β”‚Β Β  └── port2
[...]

This is observed on SB1. I'll try on Switch Alpha 12 now, might make a difference?

and are you definitely unloading ipu3_cio2 before surface_camera?

Yes, the same result regardless of the order.

djrscally commented 3 years ago

Ugh, I think I know what's happened. I was rushing to get it all uploaded and I missed a line out of the commit; my bad. I'll push a fixed version in a moment.

kitakar5525 commented 3 years ago

No worries, take your time! (also seems that qcam isn't working with the current version of surface_camera?)

EDIT: the same result on SA12, too.

djrscally commented 3 years ago

I ended up deciding I didn't like how I'd fixed it and re-wrote it :P - the problem was I introduced a new get with software_node_get_parent() to stop software_node_get_next_endpoint() from passing endpoints that don't belong to the port that is also passed, and that wasn't being released...I fixed last night but then forgot to include that in the patch. Anyway; I pushed a new version of the 0004 patch that does it properly (I think...)

Full function looks like this if you'd rather just copy-paste:

static struct fwnode_handle *
software_node_graph_get_next_endpoint(const struct fwnode_handle *fwnode,
                      struct fwnode_handle *endpoint)
{
    struct swnode *swnode = to_swnode(fwnode);
    struct fwnode_handle *old = endpoint;
    struct fwnode_handle *parent_of_old;
    struct fwnode_handle *parent;
    struct fwnode_handle *port;

    if (!swnode)
        return NULL;

    if (endpoint) {
        port = software_node_get_parent(endpoint);
        parent = software_node_get_parent(port);
    } else {
        parent = software_node_get_named_child_node(fwnode, "ports");
        if (!parent)
            parent = software_node_get(&swnode->fwnode);

        port = swnode_graph_find_next_port(parent, NULL);
    }

    for (; port; port = swnode_graph_find_next_port(parent, port)) {

        if (old) {
            parent_of_old = software_node_get_parent(old);

            if (parent_of_old != port)
                old = NULL;

            fwnode_handle_put(parent_of_old);
        }

        endpoint = software_node_get_next_child(port, old);
        fwnode_handle_put(old);
        if (endpoint)
            break;
        else
            fwnode_handle_put(port);
    }

    fwnode_handle_put(port);
    software_node_put(parent);

    return endpoint;
}

No worries, take your time! (also seems that qcam isn't working with the current version of surface_camera?)

EDIT: the same result on SA12, too.

Oh no! I expected that to work at least :( What does dmesg tell you?

kitakar5525 commented 3 years ago

Oh no! I expected that to work at least :( What does dmesg tell you?

No dmesg log at all. Also, when working, qcam outputs fps. But this time no output from qcam. So, just the stream never starts.

djrscally commented 3 years ago

Hummmmm ok, I guess that's going to be a problem with the parsing of details from SSDB then. Jordan's original module works for all your sensors individually?

EDIT: Does cam -l list all the cameras at least?

kitakar5525 commented 3 years ago

Hummmmm ok, I guess that's going to be a problem with the parsing of details from SSDB then.

qcam is working for your device?

Jordan's original module works for all your sensors individually?

Yes, worked.

EDIT: Does cam -l list all the cameras at least?

Yes, enumerates all the available cameras.

djrscally commented 3 years ago

qcam is working for your device?

Yeah. Do any of them work for you? Note that you can't switch camera in qcam; you have to close it, start it again and select a different camera on start-up (not sure why that is yet)

Yes, enumerates all the available cameras.

OK cool; definitely something to do with the properties entries then, at least the fwnode linking is working!

kitakar5525 commented 3 years ago

Ah sorry! Rebooted and tried again, now qcam shows images. My changes might broke something.

djrscally commented 3 years ago

Ah sorry! Rebooted and tried again, now qcam shows images. My changes might broke something.

Phew!!! So it's working for all three sensors, as long as you close and reopen qcam?

If yes; I'm pretty dang happy with that :)

kitakar5525 commented 3 years ago

So it's working for all three sensors, as long as you close and reopen qcam?

Actually, only ov5693 is functional yet for Surface devices lol I'll try to make other sensors functional next.

and SA12 has only one ipu3 sensor (the other is USB)

djrscally commented 3 years ago

So it's working for all three sensors, as long as you close and reopen qcam?

Actually, only ov5693 is functional yet for Surface devices lol I'll try to make other sensors functional next.

Ah! Well that's ok - I mean as long as the sensor that worked before also works with this module then it's auto-discovering and linking as it is supposed to do, so I'm happy still.

and SA12 has only one ipu3 sensor (the other is USB)

Weird!

Did that change to the 0004 patch let you unload / reload too?

kitakar5525 commented 3 years ago

Ah! Well that's ok - I mean as long as the sensor that worked before also works with this module then it's auto-discovering and linking as it is supposed to do, so I'm happy still.

Yeah, there is no clue that other sensors won't work with current surface_camera.

and SA12 has only one ipu3 sensor (the other is USB)

Weird!

The user-facing one is USB. So, for Linux compatibility?

Did that change to the 0004 patch let you unload / reload too?

OK, I'll try now!

kitakar5525 commented 3 years ago

Yes, now qcam works after reload like the following! (confirmed on SA12)

sudo modprobe -r ipu3_imgu
sudo modprobe -r ipu3_cio2
sudo rmmod ov5670
sudo rmmod surface_camera

sudo modprobe ipu3_cio2 
sudo modprobe ipu3_imgu
sudo insmod ov5670.ko
sudo insmod surface_camera.ko

but there are kernel oops after re-loading surface_camera (harmless as qcam still works):

kern  :info  : [  238.838228] Found supported device INT3479
kern  :warn  : [  238.839826] ------------[ cut here ]------------
kern  :warn  : [  238.839846] WARNING: CPU: 2 PID: 3694 at drivers/base/core.c:4274 set_primary_fwnode+0x64/0x70
kern  :warn  : [  238.839848] Modules linked in: surface_camera(OE+) ov5670(OE) ipu3_imgu(C) ipu3_cio2 videobuf2_dma_sg fuse ccm rfcomm lzo_rle zram cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 nls_cp437 vfat fat btusb btrtl btbcm btintel bluetooth rtsx_usb_sdmmc mmc_core rtsx_usb_ms memstick ecdh_generic ecc uvcvideo rtsx_usb videobuf2_vmalloc hid_sensor_magn_3d hid_sensor_als hid_sensor_rotation hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_accel_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_hub intel_ishtp_loader intel_ishtp_hid cros_ec_ishtp cros_ec snd_soc_skl x86_pkg_temp_thermal intel_powerclamp coretemp snd_soc_sst_ipc snd_soc_sst_dsp kvm_intel snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi ath10k_pci snd_soc_core ath10k_core snd_hda_codec_hdmi kvm ath snd_compress snd_hda_codec_realtek ac97_bus snd_hda_codec_generic iTCO_wdt ledtrig_audio snd_pcm_dmaengine joydev intel_pmc_bxt mousedev iTCO_vendor_support
kern  :warn  : [  238.839946]  snd_hda_intel mei_hdcp irqbypass snd_intel_dspcfg crct10dif_pclmul snd_hda_codec mac80211 crc32_pclmul intel_rapl_msr ghash_clmulni_intel intel_wmi_thunderbolt wmi_bmof snd_hda_core aesni_intel acer_wmi crypto_simd cryptd glue_helper snd_hwdep rapl cfg80211 snd_pcm intel_cstate snd_timer snd pcspkr input_leds intel_uncore i2c_i801 rfkill soundcore i2c_smbus libarc4 mei_me intel_lpss_pci mei intel_xhci_usb_role_switch intel_lpss idma64 videobuf2_memops roles videobuf2_v4l2 processor_thermal_device intel_ish_ipc intel_rapl_common intel_pch_thermal tpm_crb videobuf2_common intel_ishtp intel_soc_dts_iosf v4l2_fwnode i2c_hid tpm_tis tpm_tis_core int3403_thermal videodev int340x_thermal_zone wmi tpm evdev tps68470 mac_hid mc rng_core battery ac intel_vbtn int3400_thermal acpi_thermal_rel sparse_keymap sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_multitouch uas usb_storage hid_generic usbhid hid serio_raw atkbd libps2 xhci_pci crc32c_intel
kern  :warn  : [  238.840051]  xhci_pci_renesas xhci_hcd i8042 serio i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm intel_agp intel_gtt agpgart [last unloaded: ov5670]
kern  :warn  : [  238.840084] CPU: 2 PID: 3694 Comm: insmod Tainted: G         C OE     5.9.0-rc4-1-surface-mainline-06735-gdceb2b238c03 #13
kern  :warn  : [  238.840087] Hardware name: Acer Switch SA5-271/Hawaii, BIOS V1.06 04/16/2018
kern  :warn  : [  238.840098] RIP: 0010:set_primary_fwnode+0x64/0x70
kern  :warn  : [  238.840107] Code: c0 74 1b 48 8b 10 48 81 fa 00 f0 ff ff 77 0f 48 89 97 98 02 00 00 48 c7 00 00 00 00 00 c3 48 c7 87 98 02 00 00 00 00 00 00 c3 <0f> 0b eb c5 48 89 c2 eb ba 0f 1f 00 0f 1f 44 00 00 48 8b 46 20 48
kern  :warn  : [  238.840112] RSP: 0018:ffffa96680f47d70 EFLAGS: 00010286
kern  :warn  : [  238.840119] RAX: ffff8f09e4d2e810 RBX: ffffffffc0b4a778 RCX: 0000000000000000
kern  :warn  : [  238.840122] RDX: ffff8f0992314b88 RSI: ffff8f09b155bdc8 RDI: ffff8f09cd0be420
kern  :warn  : [  238.840126] RBP: ffff8f09cd0be420 R08: 0000000000000000 R09: ffffa96680f47bd8
kern  :warn  : [  238.840130] R10: 0000000000000000 R11: ffffffff8fd1cce0 R12: ffffffffc0b4a189
kern  :warn  : [  238.840136] R13: 0000000500000002 R14: ffff8f09b155bdc8 R15: ffffffffc0b4b490
kern  :warn  : [  238.840142] FS:  00007f05f2fd6740(0000) GS:ffff8f09e6d00000(0000) knlGS:0000000000000000
kern  :warn  : [  238.840147] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern  :warn  : [  238.840152] CR2: 00005604bc62a3f8 CR3: 00000001e60c8006 CR4: 00000000003706e0
kern  :warn  : [  238.840155] Call Trace:
kern  :warn  : [  238.840172]  surface_camera_init.cold+0x3b2/0xd7b [surface_camera]
kern  :warn  : [  238.840192]  ? surface_camera_exit+0xa0/0xa0 [surface_camera]
kern  :warn  : [  238.840206]  do_one_initcall+0x46/0x224
kern  :warn  : [  238.840215]  ? do_init_module+0x23/0x260
kern  :warn  : [  238.840224]  ? kmem_cache_alloc_trace+0xfe/0x240
kern  :warn  : [  238.840234]  do_init_module+0x5c/0x260
kern  :warn  : [  238.840244]  __do_sys_finit_module+0xac/0x110
kern  :warn  : [  238.840264]  do_syscall_64+0x33/0x40
kern  :warn  : [  238.840275]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kern  :warn  : [  238.840282] RIP: 0033:0x7f05f30fdd5d
kern  :warn  : [  238.840291] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48
kern  :warn  : [  238.840296] RSP: 002b:00007ffe1038d428 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
kern  :warn  : [  238.840304] RAX: ffffffffffffffda RBX: 00005604bc626750 RCX: 00007f05f30fdd5d
kern  :warn  : [  238.840308] RDX: 0000000000000000 RSI: 00005604bc2ae368 RDI: 0000000000000003
kern  :warn  : [  238.840311] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f05f31c93e0
kern  :warn  : [  238.840315] R10: 0000000000000003 R11: 0000000000000246 R12: 00005604bc2ae368
kern  :warn  : [  238.840320] R13: 0000000000000000 R14: 00005604bc6293f0 R15: 0000000000000000
kern  :warn  : [  238.840333] ---[ end trace 3b7c8b491c9bbf73 ]---
kern  :info  : [  238.840338] Found 1 supported devices
kern  :info  : [  238.893813] ipu3-cio2 0000:00:14.3: device 0x9d32 (rev: 0x1)
kern  :info  : [  238.894763] ipu3-cio2 0000:00:14.3: cio2_parse_firmware(), CIO2 port 0 available: csi2.port: 0 csi2.lanes: 2
kern  :info  : [  238.894775] ipu3-cio2 0000:00:14.3: cio2_parse_firmware(), endpoint not available for CIO2 port 1
kern  :info  : [  238.894783] ipu3-cio2 0000:00:14.3: cio2_parse_firmware(), endpoint not available for CIO2 port 2
kern  :info  : [  238.894792] ipu3-cio2 0000:00:14.3: cio2_parse_firmware(), endpoint not available for CIO2 port 3
djrscally commented 3 years ago

Ah! Ok, I'll look at that this evening if I get time; at least the reloading is playing ball now

kitakar5525 commented 3 years ago

By the way, the latest surface_camera can be unloaded even before ipu3-cio2πŸŽ‰ like the following (SB1):

# reload camera drivers
sudo rmmod surface_camera
sudo modprobe -r ipu3_imgu
sudo modprobe -r ipu3_cio2
sudo rmmod ov5693
sudo rmmod ov7251
sudo rmmod ov8865

sudo modprobe ipu3_cio2
sudo modprobe ipu3_imgu
sudo insmod ov5693_from_jhand2/ov5693.ko
sudo insmod ov7251/ov7251.ko
sudo insmod ov8865/ov8865.ko
sudo insmod surface_camera.ko
djrscally commented 3 years ago

OK, that oops should be fixed by the latest commit - the set_primary_fwnode() helper function does a bit more than just dev->fwnode = fwnode, and that was causing the problems.

I think we're error free now...

kitakar5525 commented 3 years ago

OK, that oops should be fixed by the latest commit - the set_primary_fwnode() helper function does a bit more than just dev->fwnode = fwnode, and that was causing the problems.

I think we're error free now...

Yes, I can confirm that we're error free now! Thanks, closing this issue.

kitakar5525 commented 3 years ago

For the record, reload command.

You can unload cio2-bridge (formerly surface_camera) driver at either first or last:

# reload camera drivers
sudo modprobe -r cio2-bridge
sudo modprobe -r ipu3_imgu
sudo modprobe -r ipu3_cio2
sudo rmmod ov5693
sudo rmmod ov7251
sudo rmmod ov8865
#sudo modprobe -r cio2-bridge

sudo modprobe ipu3_cio2
sudo modprobe ipu3_imgu
sudo modprobe ov5693
sudo modprobe ov7251
sudo modprobe ov8865
sudo modprobe cio2-bridge
# reload camera drivers
#sudo modprobe -r cio2-bridge
sudo modprobe -r ipu3_imgu
sudo modprobe -r ipu3_cio2
sudo rmmod ov5693
sudo rmmod ov7251
sudo rmmod ov8865
sudo modprobe -r cio2-bridge

sudo modprobe ipu3_cio2
sudo modprobe ipu3_imgu
sudo modprobe ov5693
sudo modprobe ov7251
sudo modprobe ov8865
sudo modprobe cio2-bridge