ljalves / linux_media

TBS linux open source drivers
https://github.com/ljalves/linux_media/wiki

TBS 6981 & IOMMU problems #62

Closed RayCic closed 9 years ago

RayCic commented 9 years ago

After a kernel upgrade from 3.13 to 3.17 and switching to your code base (because I needed support for the TBS 6285), I started seeing several IOMMU-related problems:

Problem 1:

[ 5882.197852] ------------[ cut here ]------------
[ 5882.197870] WARNING: CPU: 0 PID: 13204 at drivers/iommu/amd_iommu.c:2625 dma_ops_domain_unmap.part.9+0x4d/0x56()
[ 5882.197875] Modules linked in: ip6table_filter ip6_tables act_police cls_basic cls_flow cls_fw cls_u32 sch_fq_codel sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT xt_realm xt_addrtype xt_nat ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set nf_nat_ftp xt_time xt_TCPMSS xt_tcpmss xt_policy xt_pkttype xt_physdev xt_NFQUEUE xt_NFLOG xt_mark xt_mac xt_length xt_helper xt_hashlimit xt_DSCP xt_dscp xt_CLASSIFY xt_AUDIT iptable_raw iptable_nat nf_nat_ipv4 nf_nat iptable_mangle hwmon_vid bridge stp llc ipv6 cx25840(O) snd_usb_audio snd_hwdep uvcvideo(O) snd_usbmidi_lib snd_rawmidi videobuf2_vmalloc(O) ir_sony_decoder(O) ir_xmp_decoder(O) ir_lirc_codec(O) lirc_dev(O) ir_mce_kbd_decoder(O) ir_sharp_decoder(O) ir_sanyo_decoder(O) ir_rc6_decoder(O) ir_jvc_decoder(O)
[ 5882.197960]  ir_rc5_decoder(O) ir_nec_decoder(O) rc_rc6_mce(O) mceusb(O) k10temp microcode si2157(O) sp5100_tco i2c_piix4 si2168(O) saa716x_budget(O) tas2101(O) cxd2820r(O) mb86a16(O) cx24117(O) saa716x_core(O) snd_hda_codec_hdmi cx23885(O) altera_ci(O) tda18271(O) altera_stapl(O) videobuf2_dvb(O) videobuf2_dma_sg(O) videobuf2_memops(O) tveeprom(O) cx2341x(O) dvb_core(O) videobuf2_core(O) v4l2_common(O) nouveau videodev(O) media(O) snd_hda_intel ttm rc_core(O) snd_hda_controller drm_kms_helper snd_hda_codec r8169 mii
[ 5882.198014] CPU: 0 PID: 13204 Comm: w_scan Tainted: G           O   3.17.7-hardened-r1-myrc03-nosec #1
[ 5882.198019] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 LE R2.0, BIOS 2501 04/09/2014
[ 5882.198023]  0000000000000000 0000000000000009 ffffffffbc66fdba 0000000000000000
[ 5882.198029]  ffffffffbc0bcb26 0000000000000000 ffffffffbc511306 ffff88009199fb6c
[ 5882.198035]  ffff8800b87c50e0 0000000004e1c000 0000000004e1c000 00000000000001f8
[ 5882.198041] Call Trace:
[ 5882.198052]  [<ffffffffbc66fdba>] ? dump_stack+0x41/0x51
[ 5882.198061]  [<ffffffffbc0bcb26>] ? warn_slowpath_common+0x73/0x8b
[ 5882.198067]  [<ffffffffbc511306>] ? dma_ops_domain_unmap.part.9+0x4d/0x56
[ 5882.198073]  [<ffffffffbc511306>] ? dma_ops_domain_unmap.part.9+0x4d/0x56
[ 5882.198079]  [<ffffffffbc512227>] ? __unmap_single.isra.18+0x78/0xd0
[ 5882.198085]  [<ffffffffbc512f94>] ? free_coherent+0x46/0x7f
[ 5882.198108]  [<ffffffffc041dc97>] ? vb2_ioctl_expbuf+0x4fe/0x2a27 [videobuf2_core]
[ 5882.198125]  [<ffffffffc041af53>] ? __vb2_queue_cancel+0x1b1/0x1d8 [videobuf2_core]
[ 5882.198141]  [<ffffffffc041b091>] ? __reqbufs+0x117/0x344 [videobuf2_core]
[ 5882.198157]  [<ffffffffc041bbbd>] ? vb2_thread_stop+0x104/0x150 [videobuf2_core]
[ 5882.198172]  [<ffffffffc0469555>] ? vb2_dvb_stop_feed+0x40/0x57 [videobuf2_dvb]
[ 5882.198190]  [<ffffffffc042bb77>] ? dmx_section_feed_stop_filtering+0x41/0x7f [dvb_core]
[ 5882.198207]  [<ffffffffc042a54c>] ? dvb_dmxdev_feed_stop+0x52/0x8b [dvb_core]
[ 5882.198223]  [<ffffffffc042a68e>] ? dvb_dmxdev_filter_stop+0x35/0xb6 [dvb_core]
[ 5882.198240]  [<ffffffffc042accd>] ? dvb_demux_do_ioctl+0x121/0x4dd [dvb_core]
[ 5882.198257]  [<ffffffffc04297a4>] ? dvb_usercopy+0x167/0x2a2 [dvb_core]
[ 5882.198265]  [<ffffffffbc0dca9f>] ? put_prev_entity+0x34/0x1db
[ 5882.198283]  [<ffffffffc0433e77>] ? dvb_ringbuffer_read_user+0x204/0x24e [dvb_core]
[ 5882.198290]  [<ffffffffbc0dbabc>] ? __dequeue_entity+0x18/0x2c
[ 5882.198307]  [<ffffffffc042abac>] ? dvb_demux_release+0x141/0x141 [dvb_core]
[ 5882.198313]  [<ffffffffbc0dbae4>] ? set_next_entity+0x14/0x36
[ 5882.198330]  [<ffffffffc042a22a>] ? dvb_dmxdev_buffer_read.isra.2+0xf8/0x14b [dvb_core]
[ 5882.198347]  [<ffffffffc0429cd8>] ? dvb_demux_ioctl+0xe/0x13 [dvb_core]
[ 5882.198354]  [<ffffffffbc16e839>] ? do_vfs_ioctl+0x5f1/0x63f
[ 5882.198363]  [<ffffffffbc0f7c71>] ? ktime_get_ts64+0x4b/0xb6
[ 5882.198369]  [<ffffffffbc16f7c0>] ? poll_select_set_timeout+0x4e/0x6f
[ 5882.198375]  [<ffffffffbc16e8ba>] ? SyS_ioctl+0x33/0x59
[ 5882.198382]  [<ffffffffbc67547e>] ? system_call_fastpath+0x16/0x1b
[ 5882.198387] ---[ end trace 32a59f538eccc69f ]---

Problem 2:

AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x001c address=0x0000000001355000 flags=0x0000]

Thoughts about above problems:

P.S. I tried removing the TBS 6285 and the problem got worse: the second front end stopped responding. When I put the card back, the problem with the second front end disappeared. (F*king magic)

RayCic commented 9 years ago

Upgrade to kernel 3.18.1 did not help...

RayCic commented 9 years ago

After upgrading to https://github.com/bas-t/saa716x-intree the problems still remain, so this means the problem is upstream.

ljalves commented 9 years ago

@RayCic I think that @bas-t's tree is a kind of branch of this one (at least the saa716x part), so it doesn't mean that the saa716x driver isn't the problem.

ljalves commented 9 years ago

Anyway, I'm closing this one since we are discussing this issue on #66

RayCic commented 9 years ago

Sorry for reopening this issue. It looks like the linux-media mailing list is not the friendliest list on Earth, so I will have to investigate this problem myself. I am putting together an investigation plan for the weekend and would like to hear any comments and suggestions.

1) Naive kernel bisection: build the latest release of each suspect kernel version (3.14, 3.15, 3.16) and see in which version the driver broke. I hope this narrows down the next step, the linux-media tree bisection. Problem: these kernels may also include the IOMMU subsystem regression I hit, in which case this step is useless.

Is this step useful, or should I only bisect the linux-media tree? IMHO 1 kernel iteration = 2-3 linux-media tree bisection iterations.

2) linux-media tree bisection. Problems:

  1) I have almost no experience with git and NO experience with tree bisection.
  2) I am worried about the number of commits from kernel version 3.13 to HEAD: ~80000. On the other hand, that is only ~17 iterations, and even if each iteration takes half an hour in the worst case, that is a whole day.
  3) Should I limit the bisection to the path "drivers/media", or is a full-blown search better?
  4) Because there are two different error messages, should I run two bisection sessions?
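
The ~17 iterations figure follows from bisection halving the candidate range on every test. A rough sketch of that arithmetic, assuming a single bad commit and treating the history as one linear range (real git history branches, so the actual step count may differ slightly); the function names are illustrative only:

```python
import math


def bisect_steps(commit_count: int) -> int:
    """Rough upper estimate of test rounds needed to isolate one bad commit."""
    return math.ceil(math.log2(commit_count))


def total_hours(commit_count: int, minutes_per_step: float) -> float:
    """Total wall-clock time if every round takes minutes_per_step."""
    return bisect_steps(commit_count) * minutes_per_step / 60


# Figures taken from the comment above: ~80000 commits, half an hour per round.
steps = bisect_steps(80_000)
hours = total_hours(80_000, 30)
print(f"{steps} rounds, about {hours} hours")
```

Limiting the bisection to a path such as drivers/media reduces the number of candidate commits, so the same estimate applied to that smaller count gives fewer rounds.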

My bisection algorithm:

  1. Install & compile last known good kernel (3.13.something)
  2. Cold reboot
  3. Test driver is working
  4. mkdir ~/tbs6981
  5. cd ~/tbs6981
  6. git clone git://linuxtv.org/media_build.git
  7. cd media_build
  8. ./build --main-git
  9. Cold reboot
  10. Test driver is failing
  11. cd ~/tbs6981/media_build/media
  12. git bisect start {or, limited to one path: git bisect start -- drivers/media}
  13. git bisect bad v3.17
  14. git bisect good v3.13
  15. make -C ../ distclean {is this path right?}
  16. make -C ../v4l
  17. make -C ../ install
  18. Cold reboot
  19. Test driver
  20. cd ~/tbs6981/media_build/media
  21. git bisect good/bad {depending on results}
  22. If the bad commit has not been found yet, go to step 15.
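
The loop in steps 15 to 22 is a binary search over the commit range. The sketch below only simulates that search on a made-up linear history to show how the good/bad answers narrow the range; it does not call git, and the commit names, the culprit position, and the is_bad test are all hypothetical stand-ins for the real "build, cold reboot, test driver" round.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of test rounds).

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    rounds = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        rounds += 1                      # one build + reboot + test cycle
        if is_bad(commits[mid]):
            bad = mid                    # equivalent of "git bisect bad"
        else:
            good = mid                   # equivalent of "git bisect good"
    return commits[bad], rounds


# Hypothetical history of 80000 commits with the regression at index 51234.
history = [f"c{i}" for i in range(80_000)]
culprit, rounds = first_bad(history, lambda c: int(c[1:]) >= 51_234)
print(culprit, rounds)
```

Because each answer discards half of the remaining range, a single wrong good/bad answer sends the search to the wrong half, which is why the cold reboot before each test matters.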

RayCic commented 9 years ago

Regression caused by commit 453afdd9ce33293f640e84dc17e5f366701516e8 "[media] cx23885: convert to vb2"

Because this commit introduces an important change in the driver's infrastructure, we have only two choices:

  1) Wait until somebody smart fixes the bug(s) in the driver (preferred solution).
  2) Just revert this commit and all commits depending on it (and lose support for all new devices).

trsqr commented 9 years ago

Report this to Hans Verkuil (and cc linux-media), who did the vb2 conversion. I already bugged him once because I thought it had introduced another regression, but in the end that was not the case (the problem was in my code, which he pointed out).

ljalves commented 9 years ago

I just realized that this issue is for the TBS 6981, which I already committed to the official media tree some time ago (and it is already part of the official kernel). So you're telling me that the issue exists in the latest official kernel?

RayCic commented 9 years ago

That was the reason why I closed it the first time. Yes. After the driver was converted to videobuf2 by Hans Verkuil, these problems appeared. As suggested by @trsqr, I contacted Hans Verkuil.

ljalves commented 9 years ago

Ok, closing this issue (again). I'll keep an eye on the linux-media mailing list for patches related to this.