pelican-plugins / deadlinks

Pelican plugin to validate availability of referenced external links
MIT License

[Bug] Warnings for good links #4

Open Kristinita opened 7 years ago

Kristinita commented 7 years ago

1. Summary

  1. deadlinks marks working links as dead.
  2. deadlinks marks as dead links that work but are blocked from my IP (see Internet censorship in Russia).

2. Settings

My project: https://github.com/Kristinita/KristinitaPelican

Part of my pelicanconf.py file:


PLUGIN_PATHS = ['pelican-plugins']
PLUGINS = [
    'pagefixer',
    'pelican_javascript',
    'section_number', 'interlinks', 'deadlinks'
]

DEADLINK_VALIDATION = True

DEADLINK_OPTS = {
    'archive': True,
    'classes': ['custom-class1', 'disabled'],
    'labels': True
}

3. Steps to reproduce

I run this command in a terminal:

pelican content --debug > DeadlinkDebug.txt 2>&1

See the full output on Gist: https://gist.github.com/Kristinita/63c81829c196afd7dc68cbe5e3dba12a.

4. Expected behavior

deadlinks should detect and replace only links that really return 403/404, not the links described in items 1.1 and 1.2 of this issue.

5. Actual behavior

Links marked as dead:

https://rsdn.ru/article/patterns/framework.xml#EKB
http://vaden-pro.ru/blog/laravel/laravel-chto-eto
http://web.archive.org/web/20150615162941/http://www.xpomo.com/ruskolan/tolpa/piramida.htm
http://www.spy-soft.net/chto-takoe-rat/
http://loveread.ec/read_book.php?id=45782&p=12
http://archive.is/20160611162905/http://liwihelp.ru/sistema/avtomaticheskoe_vklyuchenie_kompyutera.html
https://learn.javascript.ru/window-methods
http://javascript.ru/window-location
https://colocat.ru/texts/realip.html
http://dizems.ru/v-chem-otlichie-staticheskix-sajtov-ot-dinamicheskix
http://www.Is
http://optimakomp.ru/virustotal-totalnoe-skanirovanie-fajjlov-i-sajjtov-desyatkami-antivirusov/
http://wolandblog.com/3-pochemu-ya-ne-ispolzuyu-dnsbl-v-pomoshh-nachinayushhemu-postmasteru/
https://www.projecthoneypot.org/
https://urlquery.net/
http://www.dnsbl.info/dnsbl-database-check.php
http://wikireality.ru/wiki/MDK
http://archive.is/20160518165040/https://www.youtube.com/watch?v=qet1ypk3qDM&lc=z13owrebxvn2vt3e422du3wowrmzz5xxz04
http://archive.is/20160522103717/https://www.youtube.com/watch?v=8Lsrvn7oa60&lc=z12bz5axmkngxx10i22ucr15rtvnsjpyy04
http://archive.is/
http://web.archive.org/web/20150615162941/http://www.xpomo.com/ruskolan/tolpa/piramida.htm
http://archive.is/20160518125518/https://www.facebook.com/permalink.php?story_fbid=517539018447713&id=100005748574402%23
http://archive.is/20160601035438/http://www.sports.ru/profile/1021517009/comments/?p=30
http://archive.is/20160601041255/http://www.sports.ru/profile/70045047/comments/?p=38
http://web.archive.org/web/20150615162941/http://www.xpomo.com/ruskolan/tolpa/piramida.htm
http://alternativeto.net/software/resource-hacker/
https://www.google.ru/search?q=status+bar&newwindow=1&source=lnms&tbm=isch&sa=X&ved=0ahUKEwi-j9WygojTAhVGiSwKHfRhATYQ_AUIBigB&biw=1173&bih=729

I can successfully visit these links without a proxy or other anonymization tools.

Some links work but are blocked by the government of my country (Russia).

6. Environment

Operating system and version: Windows 10 Enterprise LTSB 64-bit EN
Python: 3.6.1
Pelican: 3.7.1
BeautifulSoup4: 4.5.3

Thanks.

silentlamb commented 7 years ago

I've checked the links from the "Actual behavior" section myself, using the most recent master and both a raw connection and a VPN connection (Russian server). Here is what I found.

Some web.archive.org links open in a web browser, but under the hood a 403 status code is returned and the page says "cannot archive due to robots.txt on http://xxx.xxx.xxx". For these the plugin seems to work almost properly. Almost, because I forgot to exclude links to web.archive.org from being checked (there's no reason to replace a web.archive.org link with another web.archive.org link), so that's a different bug.
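The exclusion mentioned above could look roughly like the sketch below. It is only an illustration, not the plugin's code: `ARCHIVE_HOSTS`, `is_archive_link` and `links_to_check` are hypothetical names, and the host set contains only the one service named in this comment.

```python
from urllib.parse import urlparse

# Hypothetical set of archive hosts that should never be checked or replaced.
# Only web.archive.org is mentioned in this thread; extend as needed.
ARCHIVE_HOSTS = frozenset({"web.archive.org"})


def is_archive_link(url):
    """Return True if the URL already points at a known archive service."""
    host = (urlparse(url).hostname or "").lower()
    # Match the host itself or any subdomain of it.
    return any(host == h or host.endswith("." + h) for h in ARCHIVE_HOSTS)


def links_to_check(urls):
    """Drop links that already point at an archive before validating them."""
    return [u for u in urls if not is_archive_link(u)]
```

With a filter like this, a link such as http://web.archive.org/web/20150615162941/http://www.xpomo.com/ruskolan/tolpa/piramida.htm would be left alone, while https://colocat.ru/texts/realip.html would still be validated.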

In Firefox (checked with the developer tools, Network tab), links to archive.is work properly (code 200 is returned) over the raw connection (Poland), but time out when I switch to the VPN connection (Russia).

For some websites, a connection cannot be made at all because of SSL errors.
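The three outcomes described here (an HTTP error status, a timeout, an SSL failure) can be told apart from the exception that the fetch raises. The sketch below assumes the check is done with the standard library's urllib; the plugin itself may use a different HTTP client, and `classify_failure` is a hypothetical helper, not part of deadlinks.

```python
import socket
import ssl
import urllib.error


def classify_failure(exc):
    """Map an exception raised while fetching a link to a coarse category."""
    # HTTPError subclasses URLError, so it has to be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        return "http-%d" % exc.code
    # urllib wraps lower-level errors in URLError and keeps them in .reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLError):
        return "ssl-error"
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "timeout"
    return "unreachable"
```

Only the `http-...` category says anything about the page itself. A timeout or an SSL error describes the connection between this machine and the server, which is why the same link can look fine from one network and dead from another.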

Kristinita commented 7 years ago

@silentlamb,

1. Summary

In the latest Deadlinks version, dead links are not replaced with archive links, even though 'archive': True is set.

2. Settings

Same Pelican configuration as in the first post.

Full output: https://gist.github.com/86cb35b6d9c445a81eadd1db2cf5b319
Warnings: https://gist.github.com/c2a96ee8da4027ac763b3c0ecb017af4

3. Steps to reproduce

Same as in the first post.

4. Expected behavior

Dead links are replaced with archive links.

5. Actual behavior

Links are skipped with "Skipping: … (not available)" warnings instead. Examples:

DEBUG: Starting new HTTPS connection (1): esquire.ru
WARNING: Skipping: https://esquire.ru/coined-word (not available)

DEBUG: Starting new HTTPS connection (1): colocat.ru
WARNING: Skipping: https://colocat.ru/texts/realip.html (not available)

6. Environment

Same as first post.

Thanks.

Kristinita commented 7 years ago

1. Question

Could you make Deadlinks replace a link only when it returns a 403/404 status code, and not in any other case?

2. Argumentation

In this issue I showed that Deadlinks can replace links that open fine for me. I think this is unexpected behavior.
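As a rough illustration of the rule I am asking for (this is not the plugin's actual code; `DEAD_STATUS_CODES` and `should_replace` are hypothetical names):

```python
# Assumed set of status codes that mean "the page is really gone or forbidden".
# The two values come from the request in this issue.
DEAD_STATUS_CODES = frozenset({403, 404})


def should_replace(status_code, dead_codes=DEAD_STATUS_CODES):
    """Return True only when the server answered with a 'dead' status code.

    status_code is None when no HTTP response was received at all
    (timeout, SSL failure, DNS failure and so on).
    """
    return status_code is not None and status_code in dead_codes
```

With such a rule, a link that times out or fails on SSL would only produce a warning and would be left as it is, because the plugin did not get an answer that proves the page is dead.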

Thanks.

Kristinita commented 6 years ago

@silentlamb, this is still relevant.

Thanks.