spartakos87 / greek_sites_crawler

Program that can crawl many Greek sites
GNU General Public License v3.0
14 stars, 4 forks

Some issues #5

Closed fkolokathi closed 7 years ago

fkolokathi commented 7 years ago

I ran your program for every website and I want to report the following:

1) There is a problem with the published date for parapolitika. You can try: http://www.parapolitika.gr/article/dio-chronia-apo-to-dimopsifisma-otan-kivernisi-echase-ti-dedilomeni

2) The program does not work for the cnn website. You can try: http://www.cnn.gr/tech/story/87982/ta-kinita-tilefona-eythynontai-gia-tis-pseires-kata-ena-megalo-pososto

error: 'NoneType' object has no attribute 'text'

3) Your program does not work for skai. You can try, for example: http://www.skai.gr/news/environment/article/349724/endiaferon-germanikon-etairion-gia-ependuseis-se-ape-stin-ellada/?utm_source=rss_news_environment&utm_campaign=skai200905190000&utm_medium=rss

error: 'NoneType' object has no attribute 'text'

4) Your program works for the left webpage only if I first run the function get_html and then the function left.

5) Your program works for most protagon articles, but it does not work for the following page, for example: http://www.protagon.gr/kouzina/laxtaristi-almyri-tarta-me-praso-kai-tyria-44341443227

6) Finally, your program does not work when I give it a link like the following, which has no article text. I think you could improve it by adding an if-condition so that it does not try to open such pages.

http://www.huffingtonpost.gr/news/life/

error: 'NoneType' object has no attribute 'text'
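The error in items 2, 3 and 6 is what Python raises when a lookup returns None and the code then reads `.text` from it. Below is a minimal sketch of the if-condition suggested in item 6. The helper name `text_or_none` is hypothetical and not part of the crawler, and `SimpleNamespace` only stands in for whatever element object the parser returns.

```python
from types import SimpleNamespace
from typing import Any, Optional


def text_or_none(node: Any) -> Optional[str]:
    """Return the stripped text of a parsed element, or None if it is missing.

    `node` is assumed to be the result of an element lookup that yields
    None when nothing matches (hypothetical helper, not the crawler's API).
    """
    if node is None:
        # Nothing matched, e.g. a category page with no article body.
        return None
    return node.text.strip()


# Stand-ins for a found element and for a failed lookup.
found = SimpleNamespace(text="  Article body  ")
missing = None

print(text_or_none(found))
print(text_or_none(missing))
```

The caller can then skip a page when the helper returns None instead of crashing.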

fkolokathi commented 7 years ago

Moreover, when you give it a link containing Greek words, it does not work. Try: http://www.dikaiologitika.gr/community/45264-ακυρωση-αδειασ-εποχικου-λογω-βεβαιωσησ-ικα
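One possible cause, which is only an assumption since I have not checked how the crawler fetches pages, is that the non-ASCII characters are sent unencoded in the request. A minimal sketch of percent-encoding the path and query with the standard library before fetching (the function name `encode_url` is hypothetical):

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL.

    "%" is kept as a safe character so that parts which are already
    percent-encoded are not encoded a second time. The host name is left
    untouched, so non-ASCII domains are not handled here.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


link = "http://www.dikaiologitika.gr/community/45264-ακυρωση-αδειασ-εποχικου-λογω-βεβαιωσησ-ικα"
print(encode_url(link))
```

The encoded link contains only ASCII characters and can be passed on in place of the original.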

spartakos87 commented 7 years ago

Put the skai link between '' before parsing it.