mattmakai / fullstackpython.com

Full Stack Python source with Pelican, Bootstrap and Markdown.
https://www.fullstackpython.com/
MIT License

Add support for bad url locations #198

Closed: huangsam closed this 5 years ago

huangsam commented 5 years ago

New output:

Extract urls...
Currently checking: file=19-nginx.markdown
Check urls...
Currently checking: id=3057 host=www.airport-parking-shop.co.uk
Bad url status: {
    "http://www.codecademy.com/tracks/python": 404,
    "http://sircmpwn.github.io/2017/01/13/The-problem-with-Python-3.html": 404,
    "http://pythonforengineers.com/there-isnt-really-a-python-2-vs-python-3-problem/": 404,
    "http://ryanfrantz.com/posts/solving-monitoring/": 404,
    "http://hginit.com/": -1,
    "https://devup.co/a-look-at-devops-tools-landscape-7220099c6b81": -1,
    "https://blogs.msdn.microsoft.com/commandline/2018/06/20/windows-command-line-backgrounder/": 504,
    "https://github.com/rosarior/awesome-django": 404,
    "https://testdriven.io/part-one-flask-blueprints": 404,
    "https://javascript-minifier.com/": 504,
    "https://cssminifier.com/": 504,
    "https://cloud.google.com/appengine/docs/python/sms/twilio": 504,
    "http://www.django-rest-framework.org/topics/3.0-announcement/": 404,
    "http://www.diveintopython3.net/unit-testing.html": -1,
    "http://blog.fogcreek.com/working-effectively-with-unit-tests-interview-with-jay-fields/": 404,
    "https://bokeh.github.io/blog/2017/7/24/styling-bokeh/": 404,
    "https://bokeh.github.io/blog/2017/7/5/idiomatic_bokeh/": 404,
    "https://tomaugspurger.github.io/modern-1.html": 404,
    "http://pritishc.com/blog/2015/09/06/uploading-with-django-and-amazon-s3/": 404,
    "http://www.opsschool.org/en/latest/": 404,
    "https://devup.co/serverless-computing-if-there-is-no-server-where-does-my-application-run-a369c3699730": -1,
    "http://engineering.simondata.com/can-we-use-jenkins-for-that": 404,
    "https://linuxacademy.com/howtoguides/posts/show/topic/13750-managing-docker-containers-with-ansible": 404,
    "http://pritishc.com/blog/2015/09/03/docker-is-awesome/": 404,
    "http://pritishc.com/blog/2015/09/04/docker-is-awesome-part-ii/": 404,
    "http://stridercd.com/": -1
}

Bad url locations: {
    "http://www.codecademy.com/tracks/python": [
        "01-introduction/08-best-python-resources.markdown"
    ],
    "http://sircmpwn.github.io/2017/01/13/The-problem-with-Python-3.html": [
        "01-introduction/04-python-2-or-3.markdown"
    ],
    "http://pythonforengineers.com/there-isnt-really-a-python-2-vs-python-3-problem/": [
        "01-introduction/04-python-2-or-3.markdown"
    ],
    "http://ryanfrantz.com/posts/solving-monitoring/": [
        "06-devops/01-monitoring.markdown"
    ],
    "http://hginit.com/": [
        "02-development-environments/20-mercurial.markdown"
    ],
    "https://devup.co/a-look-at-devops-tools-landscape-7220099c6b81": [
        "06-devops/00-devops.markdown"
    ],
    "https://blogs.msdn.microsoft.com/commandline/2018/06/20/windows-command-line-backgrounder/": [
        "02-development-environments/10-powershell.markdown"
    ],
    "https://github.com/rosarior/awesome-django": [
        "04-web-development/02-django.markdown"
    ],
    "https://testdriven.io/part-one-flask-blueprints": [
        "04-web-development/03-flask.markdown"
    ],
    "https://javascript-minifier.com/": [
        "04-web-development/19-minification.markdown"
    ],
    "https://cssminifier.com/": [
        "04-web-development/19-minification.markdown"
    ],
    "https://cloud.google.com/appengine/docs/python/sms/twilio": [
        "04-web-development/52-twilio.markdown"
    ],
    "http://www.django-rest-framework.org/topics/3.0-announcement/": [
        "04-web-development/48-api-creation.markdown"
    ],
    "http://www.diveintopython3.net/unit-testing.html": [
        "04-web-development/36-unit-testing.markdown"
    ],
    "http://blog.fogcreek.com/working-effectively-with-unit-tests-interview-with-jay-fields/": [
        "04-web-development/36-unit-testing.markdown"
    ],
    "https://bokeh.github.io/blog/2017/7/24/styling-bokeh/": [
        "03-data/19-bokeh.markdown"
    ],
    "https://bokeh.github.io/blog/2017/7/5/idiomatic_bokeh/": [
        "03-data/19-bokeh.markdown"
    ],
    "https://tomaugspurger.github.io/modern-1.html": [
        "03-data/16-pandas.markdown"
    ],
    "http://pritishc.com/blog/2015/09/06/uploading-with-django-and-amazon-s3/": [
        "05-deployment/03-static-content.markdown"
    ],
    "http://www.opsschool.org/en/latest/": [
        "05-deployment/13-operating-systems.markdown"
    ],
    "https://devup.co/serverless-computing-if-there-is-no-server-where-does-my-application-run-a369c3699730": [
        "05-deployment/38-serverless.markdown"
    ],
    "http://engineering.simondata.com/can-we-use-jenkins-for-that": [
        "05-deployment/28-jenkins.markdown"
    ],
    "https://linuxacademy.com/howtoguides/posts/show/topic/13750-managing-docker-containers-with-ansible": [
        "05-deployment/33-ansible.markdown"
    ],
    "http://pritishc.com/blog/2015/09/03/docker-is-awesome/": [
        "05-deployment/36-docker.markdown"
    ],
    "http://pritishc.com/blog/2015/09/04/docker-is-awesome-part-ii/": [
        "05-deployment/36-docker.markdown"
    ],
    "http://stridercd.com/": [
        "05-deployment/27-continuous-integration.markdown"
    ]
}

This change correlates each bad URL with the content files that contain it. Because a bad URL can appear in more than one file, each URL maps to a Python list of file paths rather than a single string.
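For illustration, here is a minimal sketch of how such a URL-to-files mapping could be built. It is not the code from this PR: the function name `map_bad_urls_to_files`, its two input dictionaries, and the sample paths and URLs are all hypothetical.

```python
import json
from collections import defaultdict


def map_bad_urls_to_files(urls_by_file, bad_url_status):
    """Return {bad_url: [files that contain it]}.

    Hypothetical helper: `urls_by_file` maps a file path to the URLs
    extracted from it, and `bad_url_status` maps a bad URL to its status.
    """
    locations = defaultdict(list)
    for path, urls in urls_by_file.items():
        for url in urls:
            # Record each file at most once per bad URL.
            if url in bad_url_status and path not in locations[url]:
                locations[url].append(path)
    return dict(locations)


# Made-up sample data, only to show the shape of the output.
urls_by_file = {
    "a.markdown": ["http://example.com/gone", "http://example.com/ok"],
    "b.markdown": ["http://example.com/gone"],
}
bad_url_status = {"http://example.com/gone": 404}

bad_url_locations = map_bad_urls_to_files(urls_by_file, bad_url_status)
print("Bad url locations:", json.dumps(bad_url_locations, indent=4))
```

Using a list as the value keeps the output shape the same whether a URL appears in one file or several.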

mattmakai commented 5 years ago

awesome thank you @huangsam! lemme get rid of these bad urls as well