Open frankhubrepo opened 3 years ago
Recently I tried to write site-configs for a wallabag server and noticed an XPath problem like the one in this issue. You should check log/html.log. Graby uses php-readability to process HTML, and it strips and flattens many tags for readability. This means the XPaths in a site-config won't match the XPaths you get in a browser, so you can't use browser XPaths in the site-config directly.
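To illustrate the mismatch, here is a minimal sketch using PHP's built-in DOM extension. Both HTML strings and the queryText helper are made up for the example; the "cleaned" markup is only an assumption about what flattened output might look like, not actual php-readability output. It shows how an XPath that relies on wrapper elements and class attributes can match in the browser DOM and return nothing once those are gone.

```php
<?php
// Hypothetical markup as a browser would show it in the inspector.
$browserHtml = '<html><body><div class="article"><header><span class="byline">'
    . '<a href="/a">Jane Doe</a></span></header><p>Text</p></div></body></html>';

// Hypothetical markup after wrappers and attributes were stripped (assumed shape).
$cleanedHtml = '<html><body><div><p>Jane Doe</p><p>Text</p></div></body></html>';

// Hypothetical helper: run an XPath against an HTML string and return the matched text.
function queryText(string $html, string $xpath): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $texts[] = trim($node->textContent);
    }

    return $texts;
}

// An XPath of the kind copied from browser dev tools.
$xpath = "//div[@class='article']//span[@class='byline']/a";

$inBrowser = queryText($browserHtml, $xpath);    // one match
$afterCleanup = queryText($cleanedHtml, $xpath); // no match

var_export(['browser' => $inBrowser, 'cleaned' => $afterCleanup]);
```

So the XPath has to be written against the HTML that ends up in the log, not against what the browser inspector shows.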
In my case, I wanted to extract the "real" author and the "real" title from an article on a website, but I got nothing after processing, even though I used XPaths that work correctly in Chrome and Firefox. I couldn't use https://siteconfig.fivefilters.org/ because it doesn't show the CSS and XPath bar at the bottom when I tested those websites.
Put the debug settings in your some-graby-test.php file and run it.
$graby = new Graby([
    'debug' => true,
    'log_level' => 'debug',
]);
Then you can inspect the log/html.log file.
The problem is that Graby is retrieving this HTML: response.html.txt, which is definitely not the page you are querying from your browser console.
Maybe we need to add a cookie to the request. I've tried a few without success.
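For anyone who wants to experiment with this, a cookie can in principle be set from the site-config itself. This is only a sketch: it assumes the installed Graby version honors the http_header(Cookie) directive, and the cookie name and value are placeholders, not something known to work for this site.

```
# hypothetical cookie; copy the real name=value pair from the browser's request headers
http_header(Cookie): consent=accepted
```

If the HTML in the log still differs from the browser's after that, the cookie is probably not the cause.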
I am trying to fetch the content from this article: https://www.businesstimes.com.sg/government-economy/malaysia-likely-to-impose-fresh-mco-in-kl-other-areas-report
However, since it doesn't work, I tried adding a config file as shown here: https://doc.wallabag.org/en/user/errors_during_fetching.html
This is the code within the config file:
The issue is that even then I don't get the content, and I know the query is right because I can see it match in the browser console:
Also here is the log:
Any insight on what could be happening here or something I'm missing?