Closed: Crocmagnon closed this issue 1 year ago
I also see this. It's as if wallabag purposefully deletes `<h1>` headings, which makes most articles really confusing to read.
In fact, from what I can tell, the problem happens specifically when the first heading in a document is an `<h2>` followed by an `<h1>`.
Update: never mind that, I get the `<h1>` problem regardless of the structure of the document.
Yes, I don't believe this should be tagged as a "site config". This happens consistently everywhere for me. It's very confusing and it's really preventing me from using Wallabag.
It may fix the issue on this site, but that site was merely used as an example. I believe we should find a more durable fix, for example by treating every `<h1>` after the first one as an `<h2>`.
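To make the "treat later h1 as h2" idea concrete, here is a minimal sketch in Python. It is not graby's or php-readability's code: the function name `demote_extra_h1` is hypothetical, and the regex-based approach assumes reasonably well-formed HTML without nested `<h1>` elements.

```python
import re

# Matches opening and closing <h1> tags, case-insensitively, keeping attributes.
H1_TAG = re.compile(r"<(/?)h1\b([^>]*)>", re.IGNORECASE)


def demote_extra_h1(html: str) -> str:
    """Keep the first <h1> element and rewrite every later one as <h2>.

    Hypothetical helper for illustration only; a real fix would operate
    on the parsed DOM inside the extraction library.
    """
    first_h1_closed = False

    def _swap(match: re.Match) -> str:
        nonlocal first_h1_closed
        closing, attrs = match.group(1), match.group(2)
        if not first_h1_closed:
            # Still inside (or before) the first <h1> element: leave it alone.
            if closing:
                first_h1_closed = True
            return match.group(0)
        # Every later <h1> or </h1> becomes <h2> or </h2>, attributes kept.
        return f"<{closing}h2{attrs}>"

    return H1_TAG.sub(_swap, html)
```

With this approach the in-article headings survive as `<h2>` instead of being dropped, while the page title stays the only `<h1>`.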
I am with you, @Crocmagnon, but wallabag/graby is designed to get the most out of an article even without any config file. In this case, even if `prune: no` had been the default, the `<h1>` elements were missing. It was only the combination with my body selector that brought them back, and I don't even know why.
But I agree that `<h1>` or `<span>` should not be removed automatically.
And it sometimes drives me crazy that some articles work without any configuration in @fivefilter's FulltextRSS (FTR) while @j0k3r's graby4wallabag/f43.me only gets an error message or fetches too much or too little from the source site - or vice versa.
Or that a site_config works perfectly with one product while the other needs more tweaks or a complete rewrite.
It would be very helpful if graby and FTR used the same engine and presets, or even gave us some more options for the configs:
# valid for both
prune: no
include: different.example.com.txt
replace_regex: / '(search.*) term' / 'new $1 term' /
[FTR]
include: custom/cookie.example.com.txt # for sending auth-cookies for paywall
body: //article
...
[graby]
body: //div[@id='main']
...
[wallabag]
include [graby] # this should be standard without writing.
include ![graby] # to not use the graby section for wallabag
# credentialstuff
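The sectioned layout proposed above could be read roughly like this. This is only a sketch of the idea in Python: neither graby nor FTR supports such sections today, and the names `parse_sectioned_config` and `directives_for` are hypothetical. It handles shared directives and `[tool]` sections, and ignores the proposed `include [graby]` lines and anything else that is not `key: value`.

```python
import re

# A section header such as "[FTR]" or "[graby]" on a line of its own.
SECTION = re.compile(r"^\[(\w+)\]$")


def parse_sectioned_config(text):
    """Split a site config into shared directives and per-tool sections."""
    sections = {"shared": []}
    current = "shared"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        line = line.split(" #", 1)[0].strip()  # drop a trailing comment
        header = SECTION.match(line)
        if header:
            current = header.group(1).lower()
            sections.setdefault(current, [])
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" directive, skipped in this sketch
        sections[current].append((key.strip(), value.strip()))
    return sections


def directives_for(sections, tool):
    """Shared directives first, then the ones specific to the given tool."""
    return sections["shared"] + sections.get(tool.lower(), [])
```

Each tool would then call `directives_for(parse_sectioned_config(text), "graby")` or the equivalent with `"FTR"`, and apply the resulting list in order, so a tool-specific `body` comes after the shared directives.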
Something along these lines was started by @Kdecherf: https://github.com/j0k3r/php-readability/pull/75
Environment
My `app/config/parameters.yml` is:
```yaml
# This file is auto-generated during the composer install
parameters:
    database_driver: pdo_sqlite
    database_host: 127.0.0.1
    database_port: null
    database_name: symfony
    database_user: root
    database_password: null
    database_path: '%kernel.root_dir%/../data/db/wallabag.sqlite'
    database_table_prefix: wallabag_
    database_socket: null
    database_charset: utf8
    domain_name: 'https://wb.example.com'
    mailer_transport: smtp
    mailer_user: wallabag@example.com
    mailer_password: password
    mailer_host: smtp.example.com
    mailer_port: 587
    mailer_encryption: null
    mailer_auth_mode: null
    locale: fr
    secret: secret
    twofactor_auth: true
    twofactor_sender: no-reply@wallabag.org
    fosuser_registration: false
    fosuser_confirmation: true
    fos_oauth_server_access_token_lifetime: 3600
    fos_oauth_server_refresh_token_lifetime: 1209600
    from_email: wallabag@example.com
    rss_limit: 50
    rabbitmq_host: localhost
    rabbitmq_port: 5672
    rabbitmq_user: guest
    rabbitmq_password: guest
    rabbitmq_prefetch_count: 10
    redis_scheme: tcp
    redis_host: redis
    redis_port: 6379
    redis_path: null
    redis_password: null
    sentry_dsn: null
    server_name: 'Your wallabag instance'
```
What steps will reproduce the bug?
`<h1>` tags inside the article are not saved in wallabag's DB, for example "1 server or 2 servers?" or "1-server approach".