j0k3r / graby-site-config

Graby site config files

Added LesJours.fr #25

Closed by nicosomb 6 years ago

nicosomb commented 6 years ago

I can't fetch the author of the article. Please help me :-)

tcitworld commented 6 years ago

The author is inside <span itemprop="name">. The published date can be taken from <meta property="article:published_time" content="2017-11-21T23:05:00.000Z"/>

nicosomb commented 6 years ago

I saw where the author is, but I can't get it via XPath. The publication date is already OK.

aaa2000 commented 6 years ago

You can try (//article//span[@itemprop="author"]/span[@itemprop="name"])[1]. In the Firefox console, you can test it with $x('(//article//span[@itemprop="author"]/span[@itemprop="name"])[1]')
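For anyone without a browser console at hand, here is a minimal Python sketch of the same lookup. The HTML fragment and the helper name first_author are made up for illustration, and the real LesJours.fr markup may differ. The standard library's ElementTree only supports a subset of XPath, so find() stands in for the (...)[1] wrapper by returning the first match in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real LesJours.fr page may differ.
SAMPLE = """
<html><body><article>
  <address>
    <span itemprop="author"><span itemprop="name">Camille Polloni</span></span>
    <span itemprop="author"><span itemprop="name">Henri Collot/Sipa</span></span>
  </address>
</article></body></html>
"""

# ElementTree spelling of //article//span[@itemprop="author"]/span[@itemprop="name"]
AUTHOR_PATH = ".//article//span[@itemprop='author']/span[@itemprop='name']"


def first_author(html):
    """Return the text of the first matching author name, or None."""
    root = ET.fromstring(html)
    node = root.find(AUTHOR_PATH)  # first match only, like (...)[1]
    if node is None or not node.text:
        return None
    return node.text.strip()


print(first_author(SAMPLE))
```

This only checks the expression against well-formed markup. A real page goes through Graby's own HTML parsing first, so a match here does not guarantee a match there.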

nicosomb commented 6 years ago

It doesn't work with your proposal, @aaa2000.

aaa2000 commented 6 years ago

And with //article//span[@itemprop="author" and contains(@class, "link")]/span[@itemprop="name"]?
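To illustrate what the extra contains(@class, "link") predicate changes, here is a rough Python sketch. The markup, the class names, the people's names and the helper linked_authors are all assumptions made for the example. ElementTree has no contains() function, so the class check is done in Python with a plain substring test, which is what XPath contains() does as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two author spans, only one of which has "link" in its class.
SAMPLE = """<html><body><article>
  <span itemprop="author" class="credit"><span itemprop="name">Photo Agency</span></span>
  <span itemprop="author" class="link credit"><span itemprop="name">Jane Writer</span></span>
</article></body></html>"""


def linked_authors(html):
    """Names under author spans whose class attribute contains 'link'."""
    root = ET.fromstring(html)
    names = []
    for span in root.findall(".//article//span[@itemprop='author']"):
        # Substring test, equivalent to contains(@class, "link") in XPath 1.0.
        if "link" in span.get("class", ""):
            name = span.find("span[@itemprop='name']")
            if name is not None and name.text:
                names.append(name.text.strip())
    return names


print(linked_authors(SAMPLE))
```

Because this is a substring test, a class such as "permalink" would also match, both here and in the XPath version.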

aaa2000 commented 6 years ago

With this config, it works locally on master but not on https://f43.me/feed/test:

author: //article//span[@itemprop="author" and contains(@class, "link")]/span[@itemprop="name"]
tidy:no

nicosomb commented 6 years ago

It doesn't work here, on the master branch, with this file:

title://h1[@class="h2"]
author: //article//span[@itemprop="author" and contains(@class, "link")]/span[@itemprop="name"]
body: //div[@class="article-holder"]

tidy:no

# Wallabag-specific login directives (not supported in FTR)
requires_login: yes

login_uri: https://lesjours.fr/session
login_username_field: mail
login_password_field: password

not_logged_in_xpath: //body[@class="not-logged-in"]

test_url: https://lesjours.fr/obsessions/pole-financier/ep12-marcel-campion/

j0k3r commented 6 years ago

Without tidy, I got:

<div class="col sm-w-6c md-w-8c lg-w-8c">
<address class="style-meta">

Texte

Camille Polloni 

Photo

Henri Collot/Sipa 

</address>
</div>

🤔

aaa2000 commented 6 years ago

I must have made a mistake then. I need to check this evening whether I commented out 'clean' => true in php-readability/src/Readability.php

nicosomb commented 6 years ago

@j0k3r maybe we can merge this PR and improve the author part in another one.