Closed MrCaringi closed 1 month ago
Apparently the webpage blocks non-browser requests. Compare the output of:
$ curl -I -H 'user-agent: random' 'https://lat.motorsport.com/f1/news/hamilton-interes-motogp-equipo-liberty/10653282/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat'
HTTP/2 403
...
and
$ curl -I -H 'user-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0' 'https://lat.motorsport.com/f1/news/hamilton-interes-motogp-equipo-liberty/10653282/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat'
HTTP/2 200
...
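The same comparison can be scripted. Below is a minimal Python sketch, not part of rss-funnel: `head_status` sends a HEAD request with a chosen User-Agent and returns the status code. `FakeBlockingSite` and `start_fake_site` are hypothetical stand-ins that only imitate the blocking behaviour on localhost, so the example runs without touching the real site.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def head_status(url: str, user_agent: str) -> int:
    """Send a HEAD request with the given User-Agent and return the HTTP status."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    # Ignore proxy environment variables so the local demo is deterministic.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we want.
        return err.code


class FakeBlockingSite(BaseHTTPRequestHandler):
    """Hypothetical site that answers 403 unless the User-Agent looks like a browser."""

    def do_HEAD(self):
        user_agent = self.headers.get("User-Agent", "")
        self.send_response(200 if user_agent.startswith("Mozilla/") else 403)
        self.end_headers()

    def log_message(self, *args):
        # Keep the demo output quiet.
        pass


def start_fake_site():
    """Start the fake site on a free localhost port and return (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), FakeBlockingSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}/"


if __name__ == "__main__":
    server, url = start_fake_site()
    firefox = "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0"
    print("random :", head_status(url, "random"))
    print("firefox:", head_status(url, firefox))
    server.shutdown()
```

Pointing `head_status` at a real article URL instead of the local fake should show the same split if the site filters on the User-Agent header, which is what the curl output above suggests.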
So it should be possible to get around the block by setting full_text.client.user_agent
to a browser user-agent. The resulting config should look like this:
endpoints:
  - path: /full-text.xml
    note: Full text of any Source
    filters:
      - full_text:
          client:
            user_agent: "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0"
      - simplify_html: {}
Thanks a lot! Let me try it.
I updated my funnel.yml config file; it now looks like this:
endpoints:
  - path: /full-text.xml
    note: Full text of any Source
    filters:
      - full_text: {}
      - simplify_html: {}
  - path: /agent.xml
    note: Full with user-agent
    filters:
      - full_text:
          client:
            user_agent: "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0"
      - simplify_html: {}
When I applied this new config, I got this error:
rss-funnel-1 exited with code 1
rss-funnel-1 | 2024-09-13T01:47:13.616608Z INFO rss_funnel::server: loading config from "/funnel.yaml"
rss-funnel-1 | Error: Config(Yaml(Error("endpoints[1]: missing field `timeout`", line: 8, column: 5)))
So, I was looking for this parameter in your repository, and found this: https://github.com/shouya/rss-funnel/blob/29141c5c351f21031fb12c0b0704840076f6f3cd/src/client.rs#L53
So I tried this config (I added the timeout parameter):
endpoints:
  - path: /full-text.xml
    note: Full text of any Source
    filters:
      - full_text: {}
      - simplify_html: {}
  - path: /agent.xml
    note: Full with user-agent
    timeout: "2m"
    filters:
      - full_text:
          client:
            user_agent: "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0"
      - simplify_html: {}
But this didn't do the trick; I got the same error:
[+] Running 2/2
✔ Network dc-rss-funnel_default Created 0.1s
✔ Container rss-funnel Started 0.3s
rss-funnel exited with code 1
rss-funnel | 2024-09-13T01:53:14.248547Z INFO rss_funnel::server: loading config from "/funnel.yaml"
rss-funnel | Error: Config(Yaml(Error("endpoints[1]: missing field `timeout`", line: 8, column: 5)))
Any idea what I am doing wrong?
Sorry, it's a bug. The timeout field is supposed to be optional. I will fix it in the next release.
For the time being you can manually specify the timeout as follows:
  - path: /agent.xml
    note: Full with user-agent
    filters:
      - full_text:
          client:
            user_agent: "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0"
            timeout: "20s"
      - simplify_html: {}
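Until the fix lands, a quick pre-flight check can catch this before the container exits. The following is a rough Python sketch, assuming only what the error in this thread implies: a `client` block needs its own `timeout`. The function name `clients_missing_timeout` is hypothetical, and the config is written as a plain dict mirroring the YAML, because parsing YAML would need a third-party library.

```python
def clients_missing_timeout(config: dict) -> list[str]:
    """Return the locations of `client` blocks that do not set `timeout`."""
    missing = []
    for i, endpoint in enumerate(config.get("endpoints", [])):
        for j, flt in enumerate(endpoint.get("filters", [])):
            # Each filter is a one-key mapping such as {"full_text": {...}}.
            for name, options in flt.items():
                client = options.get("client") if isinstance(options, dict) else None
                if isinstance(client, dict) and "timeout" not in client:
                    missing.append(f"endpoints[{i}].filters[{j}].{name}.client")
    return missing


USER_AGENT = "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0"

# The failing attempt: `timeout` sits on the endpoint, not inside `client`.
failing = {
    "endpoints": [
        {"path": "/full-text.xml",
         "filters": [{"full_text": {}}, {"simplify_html": {}}]},
        {"path": "/agent.xml",
         "timeout": "2m",
         "filters": [{"full_text": {"client": {"user_agent": USER_AGENT}}},
                     {"simplify_html": {}}]},
    ]
}

# The workaround: `timeout` is a sibling of `user_agent` inside `client`.
working = {
    "endpoints": [
        {"path": "/agent.xml",
         "filters": [{"full_text": {"client": {"user_agent": USER_AGENT,
                                               "timeout": "20s"}}},
                     {"simplify_html": {}}]},
    ]
}

if __name__ == "__main__":
    print(clients_missing_timeout(failing))
    print(clients_missing_timeout(working))
```

The check makes the difference between the two attempts visible: an endpoint-level `timeout` does not satisfy the `client` block, while the nested one does.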
Thanks! It works!!!!
Thanks a lot!
[+] Running 2/2
✔ Network dc-rss-funnel_default Created 0.1s
✔ Container rss-funnel Started 0.4s
rss-funnel | 2024-09-14T00:00:36.199292Z INFO rss_funnel::server: loading config from "/funnel.yaml"
rss-funnel | 2024-09-14T00:00:36.211102Z INFO rss_funnel::server::feed_service: loaded endpoint: /full-text.xml
rss-funnel | 2024-09-14T00:00:36.211517Z INFO rss_funnel::server::feed_service: loaded endpoint: /agent.xml
rss-funnel | 2024-09-14T00:00:36.211690Z INFO rss_funnel::server: listening on 0.0.0.0:4080
rss-funnel | 2024-09-14T00:00:36.215572Z INFO rss_funnel::server::image_proxy: handling image proxy: /_image
rss-funnel | 2024-09-14T00:00:36.216381Z INFO rss_funnel::server: starting server
Before, without user-agent
(path: /full-text.xml)
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Motorsport.com - MotoGP - Historias</title>
<link>https://lat.motorsport.com/motogp/news/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat</link>
<description></description>
<pubDate>Sat, 14 Sep 2024 00:02:27 +0000</pubDate>
<generator>Zend_Feed</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<ttl>100</ttl>
<atom:link href="http://lat.motorsport.com/rss/motogp/news/" rel="self" type="application/rss+xml">
</atom:link>
<item>
<title>Valentino Rossi, muy duro contra Márquez: "Nadie fue tan sucio como él"</title>
<link>https://lat.motorsport.com/motogp/news/valentino-rossi-marquez-nadie-tan-sucio/10653438/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat</link>
<description><![CDATA[Andrea Mingo, miembro de la VR46 Riders Academy, se quedó sin moto en el Mundial y el equipo de Valentino Rossi lo ha mantenido como 'coach', además de animador y conductor del podcast 'Mig Babol', que el #46 está aprovechando para rememorar viejas luchas, ya que recientemente también explicó su enfrentamiento con Max Biaggi.Esta vez se centró en su desencuentro con Marc Márquez y lo ...<a href="https://lat.motorsport.com/motogp/news/valentino-rossi-marquez-nadie-tan-sucio/10653438/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat">Sigue leyendo</a><br><br><p>
error fetching full text: HTTP status error 403 Forbidden (url: https://lat.motorsport.com/motogp/news/valentino-rossi-marquez-nadie-tan-sucio/10653438/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat)</p>]]></description>
<category>MotoGP</category>
<enclosure url="https://cdn-5.motorsport.com/images/amp/2QzBkPNY/s6/valentino-rossi-yamaha-factory.jpg" length="246945" type="image/jpeg"/>
<guid isPermaLink="false">10653438</guid>
<pubDate>Thu, 12 Sep 2024 12:36:18 +0000</pubDate>
</item>
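Since the failure is embedded in each item's description, it is easy to scan a feed for it. This is a small Python sketch under stated assumptions: the marker string is taken from the output above, while `items_with_fetch_errors` and the `SAMPLE` feed with its example.com links are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Text that appears in an item's description when fetching the article failed.
MARKER = "error fetching full text"


def items_with_fetch_errors(rss_xml: str) -> list[str]:
    """Return the links of all items whose description contains the error marker."""
    root = ET.fromstring(rss_xml)
    failed = []
    for item in root.iter("item"):
        description = item.findtext("description", default="")
        if MARKER in description:
            failed.append(item.findtext("link", default=""))
    return failed


# Hypothetical, heavily trimmed feed: one failed item and one healthy item.
SAMPLE = """<rss version="2.0"><channel>
<item><link>https://example.com/a</link>
<description><![CDATA[Teaser<p>error fetching full text: HTTP status error 403 Forbidden</p>]]></description></item>
<item><link>https://example.com/b</link>
<description><![CDATA[<p>Full article body</p>]]></description></item>
</channel></rss>"""

if __name__ == "__main__":
    for link in items_with_fetch_errors(SAMPLE):
        print("full text missing for:", link)
```

Running it against the output of both endpoints would give a quick count of how many items still fail after the user-agent change.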
After, with user-agent
(path: /agent.xml)
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Motorsport.com - MotoGP - Historias</title>
<link>https://lat.motorsport.com/motogp/news/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat</link>
<description></description>
<pubDate>Sat, 14 Sep 2024 00:02:27 +0000</pubDate>
<generator>Zend_Feed</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<ttl>100</ttl>
<atom:link href="http://lat.motorsport.com/rss/motogp/news/" rel="self" type="application/rss+xml">
</atom:link>
<item>
<title>Valentino Rossi, muy duro contra Márquez: "Nadie fue tan sucio como él"</title>
<link>https://lat.motorsport.com/motogp/news/valentino-rossi-marquez-nadie-tan-sucio/10653438/?utm_source=RSS&utm_medium=referral&utm_campaign=RSS-MOTOGP&utm_term=News&utm_content=lat</link>
<description><![CDATA[<p>Andrea Mingo, miembro de la VR46 Riders Academy, se quedó sin moto en el Mundial y el equipo de <a rel="noopener" href="https://lat.motorsport.com/driver/valentino-rossi/464484/" target="_blank">Valentino Rossi</a> lo ha mantenido como 'coach', además de animador y conductor del podcast 'Mig Babol', que el #46 está aprovechando para rememorar viejas luchas, ya que recientemente también explicó su enfrentamiento con Max Biaggi.</p><p>Esta vez se centró en su desencuentro con <a rel="noopener" href="https://lat.motorsport.com/driver/marc-marquez/463649/" target="_blank">Marc Márquez</a> y lo sucedido en 2015 y a partir de ahí. Paso a paso, Rossi fue reconstruyendo su versión de los hechos, vomitando todo su odio contra el corredor de Cervera, que en 2019 sumó su octava corona mundial y le acecha en el palmarés.</p><p>"Es lo más feo que me ha pasado a nivel deportivo absolutamente", recuerda lo sucedido en Argentina, a principios de aquel año. "La disputa con Márquez había empezado en Argentina. Él había elegido el neumático medio trasero, yo el duro. Se había escapado, pero me recuperé y lo alcancé. Llegué a él e iba mucho más rápido, así que para mí fue fácil adelantarlo. Le tomé el rebufo en la recta después de la curva 3 y frené bien para adelantarlo. Llegué, entré en la curva de derechas y hasta ahí siempre nos habíamos llevado bien, pero se me echó encima a fondo", recuerda Rossi con detalle en el podcast.</p><p>"Lo pasé y él pensó que la única oportunidad que tenía era chocar conmigo. Intentó derribarme enseguida, vino deliberadamente a por mí para intentar tirarme. No quería perder. Volví a mi línea, desafortunadamente nos tocamos. Tú me la das, yo te la devuelvo. Entonces (Marc) se cayó. A partir de ahí nuestra relación se vino abajo. 
A pesar de ese episodio, siguió pretendiendo llevarse bien conmigo y besarme el culo", se mofa el italiano.</p><section data-custom="false" data-author="" data-title="" draggable="true" data-id="40770010" data-show-author="true" data-show-title="true" data-link="" data-widget="image" data-src="//cdn.motorsport.com/images/mgl/Y99z13AY/s8/valentino-rossi-yamaha-factor-1.jpg" data-height="534" contenteditable="false" data-width="800"><picture><source srcset="https://cdn.motorsport.com/images/mgl/Y99z13AY/s200/valentino-rossi-yamaha-factor-1.webp%20200w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s300/valentino-rossi-yamaha-factor-1.webp%20300w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s400/valentino-rossi-yamaha-factor-1.webp%20400w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s500/valentino-rossi-yamaha-factor-1.webp%20500w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s600/valentino-rossi-yamaha-factor-1.webp%20600w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s700/valentino-rossi-yamaha-factor-1.webp%20700w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s800/valentino-rossi-yamaha-factor-1.webp%20800w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s900/valentino-rossi-yamaha-factor-1.webp%20900w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s1000/valentino-rossi-yamaha-factor-1.webp%201000w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s1100/valentino-rossi-yamaha-factor-1.webp%201100w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s1200/valentino-rossi-yamaha-factor-1.webp%201200w" type="image/webp" sizes="(min-width: 650px) 700px"><source sizes="(min-width: 650px) 700px" 
srcset="https://cdn.motorsport.com/images/mgl/Y99z13AY/s200/valentino-rossi-yamaha-factor-1.jpg%20200w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s300/valentino-rossi-yamaha-factor-1.jpg%20300w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s400/valentino-rossi-yamaha-factor-1.jpg%20400w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s500/valentino-rossi-yamaha-factor-1.jpg%20500w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s600/valentino-rossi-yamaha-factor-1.jpg%20600w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s700/valentino-rossi-yamaha-factor-1.jpg%20700w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s800/valentino-rossi-yamaha-factor-1.jpg%20800w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s900/valentino-rossi-yamaha-factor-1.jpg%20900w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s1000/valentino-rossi-yamaha-factor-1.jpg%201000w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s1100/valentino-rossi-yamaha-factor-1.jpg%201100w,%20//cdn.motorsport.com/images/mgl/Y99z13AY/s1200/valentino-rossi-yamaha-factor-1.jpg%201200w" type="image/jpeg"><img height="800" loading="lazy" draggable="false" width="1200" alt="" src="https://cdn.motorsport.com/images/mgl/Y99z13AY/s1000/valentino-rossi-yamaha-factor-1.jpg"></picture> ...
@shouya Feel free to close this issue! Thanks a lot!
Hi, for some feeds I am getting this kind of error in every item in the feed:
This is my current config file, funnel.yml:
Feed: https://lat.motorsport.com/rss/motogp/news/
This instance is installed on Ubuntu 22.04.4 LTS aarch64, with docker compose.
Screenshot:
Thanks in advance for your support!