umputun / feed-master

Pulls multiple podcast feeds (RSS) and republishes as a common feed, properly sorted and podcast-client friendly.
https://feed-master.umputun.dev
MIT License

Filter include/exclude #103

Closed by Anian-igor 1 year ago

Anian-igor commented 1 year ago

Is it possible to apply an include/exclude filter to an RSS feed, the same way as for a YT feed?

umputun commented 1 year ago

Not sure what you mean exactly; however, the filters apply to feeds, not to the source (like YT). Pls, explain your use case (better with some config example) and give me more details about what exactly you are trying to filter out and what you meant by "include"

Anian-igor commented 1 year ago

I have a really huge feed from a local radio broadcaster, https://nv.ua/rss/podcasts/viyna-v-ukraini.xml. I want to keep only the items with a few specific names in their titles. I tried to filter by title, but without luck:

    filter:
      title: Сич

I have a regexp, but I don't know how to apply it, and I didn't find an example:

    /Фурса|Сич|Михайлов|Шабунін|Горєвой|Портников|Яковина|Тимошенко|Денисенко/i
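A side note on the /.../i notation above: Go's regexp package does not take slash delimiters or trailing flags, so case-insensitivity is written as an inline (?i) prefix. The sketch below only illustrates that syntax. It assumes the filter pattern ends up in Go's standard regexp package; the helper name matchTitle and the sample titles are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchTitle is a hypothetical helper: it compiles pattern with Go's
// standard regexp package and reports whether title matches it.
func matchTitle(pattern, title string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(title), nil
}

func main() {
	// Perl/JS-style /a|b/i is written as (?i)(a|b) in Go's syntax.
	pattern := `(?i)(Фурса|Сич|Портников)`

	// Made-up titles, not taken from the real feed.
	titles := []string{
		"Гість: Віталій Сич",
		"гість: віталій сич",
		"Гість: Олексій Кошель",
	}
	for _, title := range titles {
		ok, err := matchTitle(pattern, title)
		fmt.Println(title, ok, err)
	}
}
```

Without the (?i) prefix, the lowercase title would not match.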

umputun commented 1 year ago

Well, we do have an example, see https://github.com/umputun/feed-master/blob/master/_example/etc/fm.yml#L86

Generally, the filter is to exclude things, not to include them. I.e. the filter doesn't act as "gimme those items only", but rather "gimme all the items except some". In theory, you could try some reversed/inverted regex, like ^(?:(?!first|second).)*$. However, this particular one won't work because Go's regex engine doesn't support lookarounds (e.g. the ?! negative lookahead operator). I'm not sure if there is an alternative way to achieve the inversion in pure regex, not a big expert in regex magic.
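The lookahead limitation is easy to check in isolation. This is a minimal sketch using only Go's standard regexp package; the helper name compiles is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper that reports whether Go's regexp
// package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// The negative lookahead (?!...) is rejected at compile time.
	_, err := regexp.Compile(`^(?:(?!first|second).)*$`)
	fmt.Println("lookahead pattern error:", err)

	// A plain alternation compiles fine.
	fmt.Println("plain alternation compiles:", compiles(`(first|second)`))
}
```

So the inversion has to happen outside the pattern, by negating the match result in code.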

Alternatively, a small change in the code (adding a configuration parameter "inversed" or smth like this, and using it to flip the result of the check in this function) can be a much easier way to achieve the desired result. Feel free to submit a PR for this or ping the author of the original implementation of regex filters.
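A rough sketch of what such a flag could look like. This is not the actual feed-master code: the Filter type, the Invert field name and the skip method are all assumptions made for illustration, and the titles are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// Filter is a hypothetical stand-in for the per-feed filter config.
type Filter struct {
	Title  string // regex applied to item titles
	Invert bool   // assumed flag: when true, drop items that do NOT match
}

// skip reports whether an item with the given title should be dropped.
func (f Filter) skip(title string) (bool, error) {
	if f.Title == "" {
		return false, nil // no pattern configured, nothing is dropped
	}
	matched, err := regexp.MatchString(f.Title, title)
	if err != nil {
		return false, err
	}
	if f.Invert {
		return !matched, nil // flipped: keep only matching titles
	}
	return matched, nil // default: matching titles are excluded
}

func main() {
	f := Filter{Title: `(Фурса|Сич)`, Invert: true}
	// Made-up titles: the first is kept, the second is dropped.
	for _, title := range []string{"Гість: Віталій Сич", "Гість: Олексій Кошель"} {
		drop, err := f.skip(title)
		fmt.Println(title, "drop:", drop, "err:", err)
	}
}
```

With Invert left false, the same pattern behaves as a plain exclude filter.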

Anian-igor commented 1 year ago

Yes, I saw these examples, but I thought I had missed an include filter. Thanks.

umputun commented 1 year ago

I have added the "invert" parameter to the filter. In your case, it will look like this:


    filter:
      title: (Фурса|Сич|Михайлов|Шабунін|Горєвой|Портников|Яковина|Тимошенко|Денисенко)
      invert: true

Pls test it and let me know. You need to use the :master docker image for those tests.

Anian-igor commented 1 year ago

I updated the master image, and invert still doesn't work:

feed-master | 2022/12/23 14:30:19.849 [DEBUG] {proc/processor.go:52} refresh started
feed-master | 2022/12/23 14:30:19.853 [DEBUG] {api/server.go:90} loading templates from webapp/templates/*
feed-master | 2022/12/23 14:30:20.267 [INFO] {proc/processor.go:92} filtered 17703 (Fri, 23 Dec 2022 19:38:53 +0000), radio-nv Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич
feed-master | 2022/12/23 14:30:20.275 [INFO] {proc/store.go:54} save 1671824333-9da26a141ac244b5ddda211a7aa15a1ac3df94e8 - radio-nv - Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич - 17703
feed-master | 2022/12/23 14:30:20.306 [INFO] {proc/store.go:54} save 1671794536-fa2b37434ccdbb6d86e21eb7db38a4f311ad2248 - radio-nv - Війна в Україні: Чи Баканов умисно не чіпав Московскьий патріархат? – Євстратій Зоря, ПЦУ - 17696

umputun commented 1 year ago

pls show the first line it prints to the log (with version info) and the filter part of your config

Anian-igor commented 1 year ago

  radio-nv:
    title: Радіо НВ - вибране
    description: НВ вибране
    link: https://podcasts.nv.ua
    language: "uk-ua"
    image: images/lavel_nv.png
    filter:
      title: (Фурса|Сич|Михайлов|Шабунін|Горєвой|Портников|Яковина|Тимошенко|Денисенко)
      invert: true
    sources:
      - name: Війна в Україні
        url: https://nv.ua/rss/podcasts/viyna-v-ukraini.xml

feed-master  | init container
feed-master  | set timezone America/Chicago (Fri Dec 23 14:41:38 CST 2022)
feed-master  | custom APP_UID not defined, using default uid=1001
feed-master  | chown: /srv/etc/fm.yml: Read-only file system
feed-master  | execute /srv/feed-master
feed-master  | feed-master master-04b9d0a-20221223T14:32:28

umputun commented 1 year ago

confused. your output actually indicates the entry as properly filtered.

2022/12/23 14:30:20.267 [INFO] {proc/processor.go:92} filtered 17703 (Fri, 23 Dec 2022 19:38:53 +0000), radio-nv Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич

Those filtered suckers are stored in the internal db with a special "junk" flag, and this is why you see the next message "save ...". However, they are not populated to the feed, or at least they are not supposed to be. Do you actually see them in the result feed?
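To illustrate the idea (this is a simplified sketch, not the real feed-master store; the Item type, its Junk field and the published function are assumptions made for illustration): every item gets saved, which would explain the "save" lines in the log, but only non-junk items make it into the generated feed.

```go
package main

import "fmt"

// Item is a hypothetical stored feed entry.
type Item struct {
	Title string
	Junk  bool // set when the filter rejected the item
}

// published returns only the stored items that should appear in the
// resulting feed, skipping everything flagged as junk.
func published(stored []Item) []Item {
	var out []Item
	for _, it := range stored {
		if it.Junk {
			continue
		}
		out = append(out, it)
	}
	return out
}

func main() {
	// Both items are "saved", but only the first one is published.
	stored := []Item{
		{Title: "kept episode"},
		{Title: "filtered episode", Junk: true},
	}
	for _, it := range published(stored) {
		fmt.Println(it.Title)
	}
}
```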

Anian-igor commented 1 year ago

I see that I posted the log from the non-inverted filter. Here is the correct one:


feed-master  | 2022/12/23 15:00:47.795 [DEBUG] {api/server.go:90} loading templates from webapp/templates/*
feed-master  | 2022/12/23 15:00:48.190 [INFO]  {proc/store.go:54} save 1671824333-9da26a141ac244b5ddda211a7aa15a1ac3df94e8 - radio-nv - Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич - 17703
feed-master  | 2022/12/23 15:00:48.198 [INFO]  {proc/store.go:54} save 1671794536-fa2b37434ccdbb6d86e21eb7db38a4f311ad2248 - radio-nv - Війна в Україні: Чи Баканов умисно не чіпав Московскьий патріархат? – Євстратій Зоря, ПЦУ - 17696
feed-master  | 2022/12/23 15:00:48.202 [INFO]  {proc/store.go:54} save 1671794340-c60c9a4906d590e27567a831da0054f1e4d5a29e - radio-nv - Війна в Україні: США: Північна Корея передала зброю ПВК Вагнер. Яку саме? — Олексій Їжак - 17695
feed-master  | 2022/12/23 15:00:48.209 [INFO]  {proc/store.go:54} save 1671786292-b809f71eb5be0b98635a60978f63d03cc0f9d90a - radio-nv - Війна в Україні: Російські чмобіки – це армія заробітчан — Олексій Кошель - 17693
feed-master  | 2022/12/23 15:00:48.212 [INFO]  {proc/store.go:54} save 1671786094-3ad034fbcda34a022a3286b44f118c851aebbf8c - radio-nv - Війна в Україні: Спротив в Каховці. 150 рашистів знищено, колаборанта підірвано — Денис Попович - 17692
feed-master  | 2022/12/23 15:00:48.215 [INFO]  {proc/store.go:54} save 1671745727-539cf36247c5e144bf702e1204d5a9d60effbf27 - radio-nv - Війна в Україні: Є ідеї, щоб Україна будувала 50 заводів під землею — Вадим Черниш - 17691

umputun commented 1 year ago

reproduced the issue and it should be fixed by now. pls, pull the fresh master and give it another try

Anian-igor commented 1 year ago

Works as intended. Thanks a lot.