osome-iu / hoaxy-backend

Backend component for Hoaxy, a tool to visualize the spread of claims and fact checking
http://hoaxy.iuni.iu.edu/
GNU General Public License v3.0

New Error when running hoaxy init #66

Closed: awallemo closed this issue 3 years ago

awallemo commented 3 years ago

2021-01-26 13:16:54,656 - hoaxy(init) - INFO: Site file /home/ubuntu/.hoaxy/sites.yaml found
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/hoaxy/bin/hoaxy", line 11, in <module>
    load_entry_point('hoaxy==0.1.0', 'console_scripts', 'hoaxy')()
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/cmdline.py", line 122, in main
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/init.py", line 138, in run
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/init.py", line 115, in init
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/site.py", line 368, in load_sites

TypeError: list indices must be integers or slices, not str
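For context, Python raises this TypeError whenever a list is indexed with a string key, as though it were a dictionary. A minimal illustration that assumes nothing about hoaxy's code beyond the error message itself; the variable below is hypothetical and only stands in for a list that some code tries to read like a mapping:

```python
# Hypothetical stand-in: a list of site entries, not a dictionary.
parsed = [{"name": "factcheck.org", "domain": "factcheck.org"}]

try:
    # Dictionary-style lookup on a list raises the same TypeError.
    parsed["sites"]
    message = "no error"
except TypeError as exc:
    message = str(exc)

print(message)  # list indices must be integers or slices, not str
```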

Whenever I run the hoaxy init command I get this error. Any recommendations on what to do?

glciampaglia commented 3 years ago

Hi @awallemo, can you please share your sites.yaml?

awallemo commented 3 years ago

Do I just copy the content of the file?

glciampaglia commented 3 years ago

Yes, please. Or you can put it on pastebin or a similar site.

awallemo commented 3 years ago
  /required, name of site
- name: factcheck.org
  / required, primary domain of factcheck.org
  domain: factcheck.org
  / required, type of this site, it is a fact checking site
  site_type: fact_checking
 / base URL, default by infer, the home of factcheck.org
  base_url: http://www.factcheck.org/
  / site tags, default [], more about this site
  site_tags: []
  / alternate domains, default [], secondary domains
  / that redirect to the primary domain
  alternate_domains: []
  / is factcheck.org is alive, default true
  is_alive: true
  / is factcheck.org is enabled, default true
  / when false, this site will not be tract and lucene search may ignore it
  is_enabled: true
  / rules of how to crawl factcheck.org
  article_rules:
    / regular expression of url we like to collect
    / right now this field does not used, please ignore
    url_regex: ^http://www\.factcheck.org/20[0-2]\d/((0[1-9])|(1[0-2]))/[^/\s]+/?$
    / how to fetch the new articles update from factcheck.org
    / by default, page.spider, which crawls the home page of this site
    update:
      / here we use feed.spider as we have RSS feed URL,
      / see hoaxy.crawl.spiders
    - spider_name: feed.spider
      / the necessary parameters for building spider instance
      spider_kwargs:
        / here, we need the RSS feed URLs
        urls:
        - http://www.factcheck.org/feed/
        / and also who providse the RSS feed
        / normally the website itself, sometimes a third party, e.g. feedburner
        provider: self
    / how to fetch archive of factcheck.org
    / by default, site.page, which crawls the whole site.
    archive:
      / here we use page_template.spider, as factcheck.org use a page
      / template to list all of its posted articles
    - spider_name: page_template.spider
      spider_kwargs:
        / a list of xpaths to extract links (to find @href)
        / by default, a python tuple('/html/body',) is used
        / to fetch all links in this page
        / here, we use specified xpath expression
        / Note: please do not include /a/@href part
        href_xpaths:
        - //article//header/h2
        / page templates of factcheck.org
        / increasing by page number
        page_templates:
        - http://www.factcheck.org/page/{p_num}
     / factcheck.org also provides sitemap.xml to help us collect all
    / links in this site
    - spider_name: sitemap.spider
      spider_kwargs:
        / these URLs could be actual sitemap URL
        / OR they could be the entry of a list of sitemap.xml files
        / the spider will follow all XML links and
        / assuming these XML file are sitemaps and extract non-xml
        / links
        urls:
        - http://www.factcheck.org/sitemap.xml
/Another site, thedcgazette.com
- name: thedcgazette.com
  domain: thedcgazette.com
  site_type: claim
  base_url: http://thedcgazette.com/
  site_tags:
  - source: fakenewswatch.com
    name: hoax
  alternate_domains:
  - is_alive: true
    name: dcgazette.com
  is_alive: true
  is_enabled: true
  article_rules:
    url_regex:
    update:
    - spider_name: feed.spider
      spider_kwargs:
        urls:
        - http://dcgazette.com/feed/
        provider: self
    archive:
    - spider_name: page_template.spider
      spider_kwargs:
        href_xpaths:
        - //div[@id="main-content"]/article/header/h3
        page_templates:
        - http://dcgazette.com/page/{p_num}/
glciampaglia commented 3 years ago

The syntax of this file is not correct. In YAML files, comment lines should start with the hash character (#), not with the forward slash (/). For example:

  /required, name of site
- name: factcheck.org
  / required, primary domain of factcheck.org

Should be:

  # required, name of site
- name: factcheck.org
  # required, primary domain of factcheck.org
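If a file really had been written with / comments, the fix could be scripted. A sketch of a hypothetical helper (not part of hoaxy), assuming that every line whose first non-blank character is / is meant as a comment; it would wrongly rewrite a legitimate value line that happens to start with /, so review the result before using it:

```python
import re

# A line whose first non-blank character is "/": group 1 is the indentation,
# group 2 is the text after the slash(es) and one optional space.
SLASH_LINE = re.compile(r"^(\s*)/+ ?(.*)$")


def slash_to_hash_comments(text: str) -> str:
    """Turn "/"-prefixed lines into "#" YAML comments, keeping indentation."""
    fixed = []
    for line in text.splitlines():
        match = SLASH_LINE.match(line)
        if match:
            indent, body = match.groups()
            fixed.append(f"{indent}# {body}".rstrip())
        else:
            fixed.append(line)
    result = "\n".join(fixed)
    # splitlines() drops a final newline, so restore it if the input had one.
    return result + "\n" if text.endswith("\n") else result
```

Applied to the three-line example above, it produces the corrected version with # comments and leaves the "- name: factcheck.org" line untouched.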
glciampaglia commented 3 years ago

You can see an example here: https://github.com/IUNetSci/hoaxy-backend/blob/master/hoaxy/data/samples/sites.sample.yaml

awallemo commented 3 years ago

Oh, yes, I changed the # characters to / by hand when pasting here, so that the comment lines would be displayed at the normal text size.

glciampaglia commented 3 years ago

Could you please put it on pastebin.com and share the link here? https://pastebin.com/

awallemo commented 3 years ago

https://pastebin.com/EETk00Sy

glciampaglia commented 3 years ago

You have two archive blocks, which I don't think is allowed.

Can you please try to run hoaxy init using this file instead? https://github.com/IUNetSci/hoaxy-backend/blob/master/hoaxy/data/samples/sites.sample.yaml

You will need to rename it to sites.yaml.
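The suggestion amounts to placing the sample where hoaxy looks for its site file. The traceback above shows hoaxy reading /home/ubuntu/.hoaxy/sites.yaml, so this sketch assumes ~/.hoaxy is the configuration directory; install_sample_sites is a hypothetical helper, and it keeps a backup of any existing sites.yaml rather than overwriting it silently:

```python
import shutil
from pathlib import Path


def install_sample_sites(sample_path, hoaxy_home=None):
    """Copy a sample site file into the hoaxy config directory as sites.yaml.

    Hypothetical helper: ~/.hoaxy is assumed to be the config directory,
    based on the path printed in the traceback.
    """
    home = Path(hoaxy_home) if hoaxy_home is not None else Path.home() / ".hoaxy"
    home.mkdir(parents=True, exist_ok=True)
    target = home / "sites.yaml"
    if target.exists():
        # Keep the current file as sites.yaml.bak instead of overwriting it.
        shutil.copyfile(target, home / "sites.yaml.bak")
    shutil.copyfile(sample_path, target)
    return target
```

For example, calling install_sample_sites("hoaxy/data/samples/sites.sample.yaml") from the root of a hoaxy-backend checkout and then running hoaxy init again.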

awallemo commented 3 years ago

I must have copied and pasted the wrong content from my sites.yaml file. Here is the correct one: https://pastebin.com/Y5AQVe9N

My sites.yaml file does not have two archive blocks.

glciampaglia commented 3 years ago

Thank you. I don't see anything odd in your file (it seems you are using the same file as the sample one).

How did you get hoaxy? Did you build it yourself, or did you use one of the images (Docker or Amazon AMI)?

awallemo commented 3 years ago

It's deployed on an Amazon AWS EC2 instance.

glciampaglia commented 3 years ago

Got it. I don't have access to EC2 right now so I cannot check, but perhaps @chathuriw can try to reproduce the problem on the AMI. I've marked the issue as a potential bug for her reference.

awallemo commented 3 years ago

Okay, I'll wait for her then. Thank you!

chathuriw commented 3 years ago

I created a new instance from hoaxy-ami, activated the conda environment, and ran hoaxy init --ignore-redirected --ignore-inactive. It ran successfully:

(hoaxy) ubuntu@ip-172-31-39-191:~/hoaxy-backend$ hoaxy init --ignore-redirected --ignore-inactive
2021-01-29 19:40:09,323 - hoaxy(init) - INFO: Creating database tables:
2021-01-29 19:40:09,323 - hoaxy(init) - WARNING: Ignore existed tables
2021-01-29 19:40:09,344 - hoaxy(init) - INFO: Inserting platforms if not exist
2021-01-29 19:40:09,368 - hoaxy(init) - INFO: Trying to load site data:
2021-01-29 19:40:09,368 - hoaxy(init) - INFO: Claim domains /home/ubuntu/.hoaxy/domains_claim.txt found
2021-01-29 19:40:09,369 - hoaxy(init) - INFO: Sending HTTP requests to infer base URLs ...
2021-01-29 19:40:09,370 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): infowars.com:80
2021-01-29 19:40:09,404 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://infowars.com:80 "HEAD / HTTP/1.1" 301 0
2021-01-29 19:40:09,406 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): infowars.com:443
2021-01-29 19:40:09,482 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://infowars.com:443 "HEAD / HTTP/1.1" 301 0
2021-01-29 19:40:09,484 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): www.infowars.com:443
2021-01-29 19:40:09,594 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://www.infowars.com:443 "HEAD / HTTP/1.1" 200 0
2021-01-29 19:40:09,598 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): empirenews.net:80
2021-01-29 19:40:09,667 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://empirenews.net:80 "HEAD / HTTP/1.1" 301 0
2021-01-29 19:40:09,668 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): empirenews.net:443
2021-01-29 19:40:09,858 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://empirenews.net:443 "HEAD / HTTP/1.1" 200 0
2021-01-29 19:40:09,863 - hoaxy(init) - DEBUG: Insert or update site infowars.com
2021-01-29 19:40:09,864 - hoaxy(init) - DEBUG: Insert or update site empirenews.net
2021-01-29 19:40:09,864 - hoaxy(init) - INFO: Fact checking domains /home/ubuntu/.hoaxy/domains_factchecking.txt found
2021-01-29 19:40:09,864 - hoaxy(init) - INFO: Sending HTTP requests to infer base URLs ...
2021-01-29 19:40:09,865 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): snopes.com:80
2021-01-29 19:40:09,928 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://snopes.com:80 "HEAD / HTTP/1.1" 301 0
2021-01-29 19:40:09,929 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): snopes.com:443
2021-01-29 19:40:10,104 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://snopes.com:443 "HEAD / HTTP/1.1" 301 0
2021-01-29 19:40:10,106 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): www.snopes.com:443
2021-01-29 19:40:10,178 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://www.snopes.com:443 "HEAD / HTTP/1.1" 200 0
2021-01-29 19:40:10,182 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): factcheck.org:80
2021-01-29 19:40:10,225 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://factcheck.org:80 "HEAD / HTTP/1.1" 301 0
2021-01-29 19:40:10,226 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): factcheck.org:443
2021-01-29 19:40:10,497 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://factcheck.org:443 "HEAD / HTTP/1.1" 301 0
2021-01-29 19:40:10,499 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): www.factcheck.org:443
2021-01-29 19:40:10,558 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://www.factcheck.org:443 "HEAD / HTTP/1.1" 200 0
2021-01-29 19:40:10,563 - hoaxy(init) - DEBUG: Insert or update site snopes.com
2021-01-29 19:40:10,566 - hoaxy(init) - DEBUG: Insert or update site factcheck.org
2021-01-29 19:40:10,566 - hoaxy(init) - INFO: Site file /home/ubuntu/.hoaxy/sites.yaml found
2021-01-29 19:40:10,936 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): duffelblog.com:80
2021-01-29 19:40:10,944 - hoaxy(init) - ERROR: HTTPConnectionPool(host='duffelblog.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4bbbde0978>: Failed to establish a new connection: [Errno -5] No address associated with hostname'))
2021-01-29 19:40:10,945 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): duffelblog.com:443
2021-01-29 19:40:10,945 - hoaxy(init) - ERROR: HTTPSConnectionPool(host='duffelblog.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f4bbbde05f8>: Failed to establish a new connection: [Errno -5] No address associated with hostname'))
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Domain duffelblog.com is inactive!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site bigamericannews.com is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site christwire.org is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site drudgereport.com.co is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site empirenews.com is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site msnbc.co is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site msnbc.website is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site sprotspickle.com is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site duffleblog.com is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site nahadaily.com is inactive now!
2021-01-29 19:40:10,946 - hoaxy(init) - WARNING: Site libertymovementradio.com is inactive now!
2021-01-29 19:40:10,947 - hoaxy(init) - WARNING: Site wakingupwisconsin.com is inactive now!
2021-01-29 19:40:10,958 - hoaxy(init) - DEBUG: Insert or update site factcheck.org
2021-01-29 19:40:10,974 - hoaxy(init) - DEBUG: Insert or update site politifact.com
2021-01-29 19:40:10,979 - hoaxy(init) - DEBUG: Insert or update site opensecrets.org
2021-01-29 19:40:10,984 - hoaxy(init) - DEBUG: Insert or update site snopes.com
2021-01-29 19:40:10,990 - hoaxy(init) - DEBUG: Insert or update site truthorfiction.com
2021-01-29 19:40:10,996 - hoaxy(init) - DEBUG: Insert or update site hoax-slayer.com
2021-01-29 19:40:11,001 - hoaxy(init) - DEBUG: Insert or update site hoax-slayer.net
2021-01-29 19:40:11,006 - hoaxy(init) - DEBUG: Insert or update site badsatiretoday.com
2021-01-29 19:40:11,012 - hoaxy(init) - DEBUG: Insert or update site climatefeedback.org
2021-01-29 19:40:11,126 - hoaxy(init) - DEBUG: Insert or update site americannews.com
2021-01-29 19:40:11,133 - hoaxy(init) - DEBUG: Insert or update site civictribune.com
2021-01-29 19:40:11,142 - hoaxy(init) - DEBUG: Insert or update site clickhole.com
2021-01-29 19:40:11,263 - hoaxy(init) - DEBUG: Insert or update site thedcgazette.com
2021-01-29 19:40:11,271 - hoaxy(init) - DEBUG: Insert or update site dailycurrant.com
2021-01-29 19:40:11,280 - hoaxy(init) - DEBUG: Insert or update site dcclothesline.com
2021-01-29 19:40:11,286 - hoaxy(init) - DEBUG: Insert or update site derfmagazine.com
2021-01-29 19:40:11,299 - hoaxy(init) - DEBUG: Insert or update site duhprogressive.com
2021-01-29 19:40:11,306 - hoaxy(init) - DEBUG: Insert or update site enduringvision.com
2021-01-29 19:40:11,313 - hoaxy(init) - DEBUG: Insert or update site nationalreport.net
2021-01-29 19:40:11,321 - hoaxy(init) - DEBUG: Insert or update site newsbiscuit.com
2021-01-29 19:40:11,329 - hoaxy(init) - DEBUG: Insert or update site newsmutiny.com
2021-01-29 19:40:11,337 - hoaxy(init) - DEBUG: Insert or update site politicalears.com
2021-01-29 19:40:11,344 - hoaxy(init) - DEBUG: Insert or update site private-eye.co.uk
2021-01-29 19:40:11,351 - hoaxy(init) - DEBUG: Insert or update site realnewsrightnow.com
2021-01-29 19:40:11,357 - hoaxy(init) - DEBUG: Insert or update site rilenews.com
2021-01-29 19:40:11,364 - hoaxy(init) - DEBUG: Insert or update site thenewsnerd.com
2021-01-29 19:40:11,371 - hoaxy(init) - DEBUG: Insert or update site theuspatriot.com
2021-01-29 19:40:11,378 - hoaxy(init) - DEBUG: Insert or update site witscience.org
2021-01-29 19:40:11,388 - hoaxy(init) - DEBUG: Insert or update site amplifyingglass.com
2021-01-29 19:40:11,394 - hoaxy(init) - DEBUG: Insert or update site empiresports.co
2021-01-29 19:40:11,401 - hoaxy(init) - DEBUG: Insert or update site gomerblog.com
2021-01-29 19:40:11,410 - hoaxy(init) - DEBUG: Insert or update site huzlers.com
2021-01-29 19:40:11,416 - hoaxy(init) - DEBUG: Insert or update site itaglive.com
2021-01-29 19:40:11,431 - hoaxy(init) - DEBUG: Insert or update site politicops.com
2021-01-29 19:40:11,438 - hoaxy(init) - DEBUG: Insert or update site rockcitytimes.com
2021-01-29 19:40:11,446 - hoaxy(init) - DEBUG: Insert or update site thelapine.ca
2021-01-29 19:40:11,453 - hoaxy(init) - DEBUG: Insert or update site thespoof.com
2021-01-29 19:40:11,460 - hoaxy(init) - DEBUG: Insert or update site weeklyworldnews.com
2021-01-29 19:40:11,467 - hoaxy(init) - DEBUG: Insert or update site worldnewsdailyreport.com
2021-01-29 19:40:11,477 - hoaxy(init) - DEBUG: Insert or update site 21stcenturywire.com
2021-01-29 19:40:11,484 - hoaxy(init) - DEBUG: Insert or update site activistpost.com
2021-01-29 19:40:11,491 - hoaxy(init) - DEBUG: Insert or update site beforeitsnews.com
2021-01-29 19:40:11,498 - hoaxy(init) - DEBUG: Insert or update site chronicle.su
2021-01-29 19:40:11,507 - hoaxy(init) - DEBUG: Insert or update site coasttocoastam.com
2021-01-29 19:40:11,514 - hoaxy(init) - DEBUG: Insert or update site consciouslifenews.com
2021-01-29 19:40:11,521 - hoaxy(init) - DEBUG: Insert or update site countdowntozerotime.com
2021-01-29 19:40:11,527 - hoaxy(init) - DEBUG: Insert or update site counterpsyops.com
2021-01-29 19:40:11,535 - hoaxy(init) - DEBUG: Insert or update site dailybuzzlive.com
2021-01-29 19:40:11,542 - hoaxy(init) - DEBUG: Insert or update site disclose.tv
2021-01-29 19:40:11,548 - hoaxy(init) - DEBUG: Insert or update site fprnradio.com
2021-01-29 19:40:11,557 - hoaxy(init) - DEBUG: Insert or update site geoengineeringwatch.org
2021-01-29 19:40:11,565 - hoaxy(init) - DEBUG: Insert or update site globalresearch.ca
2021-01-29 19:40:11,576 - hoaxy(init) - DEBUG: Insert or update site govtslaves.info
2021-01-29 19:40:11,583 - hoaxy(init) - DEBUG: Insert or update site gulagbound.com
2021-01-29 19:40:11,589 - hoaxy(init) - DEBUG: Insert or update site hangthebankers.com
2021-01-29 19:40:11,596 - hoaxy(init) - DEBUG: Insert or update site humansarefree.com
2021-01-29 19:40:11,605 - hoaxy(init) - DEBUG: Insert or update site infowars.com
2021-01-29 19:40:11,612 - hoaxy(init) - DEBUG: Insert or update site intellihub.com
2021-01-29 19:40:11,622 - hoaxy(init) - DEBUG: Insert or update site lewrockwell.com
2021-01-29 19:40:11,629 - hoaxy(init) - DEBUG: Insert or update site libertytalk.fm
2021-01-29 19:40:11,637 - hoaxy(init) - DEBUG: Insert or update site naturalnews.com
2021-01-29 19:40:11,644 - hoaxy(init) - DEBUG: Insert or update site newswire-24.com
2021-01-29 19:40:11,651 - hoaxy(init) - DEBUG: Insert or update site nodisinfo.com
2021-01-29 19:40:11,658 - hoaxy(init) - DEBUG: Insert or update site nowtheendbegins.com
2021-01-29 19:40:11,665 - hoaxy(init) - DEBUG: Insert or update site pakalertpress.com
2021-01-29 19:40:11,674 - hoaxy(init) - DEBUG: Insert or update site politicalblindspot.com
2021-01-29 19:40:11,683 - hoaxy(init) - DEBUG: Insert or update site prisonplanet.com
2021-01-29 19:40:11,695 - hoaxy(init) - DEBUG: Insert or update site realfarmacy.com
2021-01-29 19:40:11,702 - hoaxy(init) - DEBUG: Insert or update site redflagnews.com
2021-01-29 19:40:11,712 - hoaxy(init) - DEBUG: Insert or update site thedailysheeple.com
2021-01-29 19:40:11,730 - hoaxy(init) - DEBUG: Insert or update site therundownlive.com
2021-01-29 19:40:11,738 - hoaxy(init) - DEBUG: Insert or update site unconfirmedsources.com
2021-01-29 19:40:11,746 - hoaxy(init) - DEBUG: Insert or update site veteranstoday.com
2021-01-29 19:40:11,756 - hoaxy(init) - DEBUG: Insert or update site worldtruth.tv
2021-01-29 19:40:11,760 - hoaxy(init) - DEBUG: Insert or update site thevalleyreport.com
2021-01-29 19:40:11,766 - hoaxy(init) - DEBUG: Insert or update site departed.co
2021-01-29 19:40:11,771 - hoaxy(init) - DEBUG: Insert or update site myfreshnews.com
2021-01-29 19:40:11,775 - hoaxy(init) - INFO: Added or updated sites are: [('infowars.com', 'claim', 'http://www.infowars.com/'), ('snopes.com', 'fact_checking', 'http://www.snopes.com/'), ('factcheck.org', 'fact_checking', 'http://www.factcheck.org/'), ('politifact.com', 'fact_checking', 'http://www.politifact.com/'), ('opensecrets.org', 'fact_checking', 'http://www.opensecrets.org/'), ('truthorfiction.com', 'fact_checking', 'https://www.truthorfiction.com/'), ('hoax-slayer.com', 'fact_checking', 'http://www.hoax-slayer.com/'), ('hoax-slayer.net', 'fact_checking', 'http://www.hoax-slayer.net/'), ('badsatiretoday.com', 'fact_checking', 'http://badsatiretoday.com/'), ('climatefeedback.org', 'fact_checking', 'http://climatefeedback.org/'), ('americannews.com', 'claim', 'http://americannews.com/'), ('civictribune.com', 'claim', 'http://civictribune.com/'), ('clickhole.com', 'claim', 'http://www.clickhole.com/'), ('thedcgazette.com', 'claim', 'http://thedcgazette.com/'), ('dailycurrant.com', 'claim', 'http://dailycurrant.com/'), ('dcclothesline.com', 'claim', 'http://www.dcclothesline.com/'), ('derfmagazine.com', 'claim', 'http://www.derfmagazine.com/'), ('duhprogressive.com', 'claim', 'http://duhprogressive.com/'), ('enduringvision.com', 'claim', 'http://www.enduringvision.com/'), ('nationalreport.net', 'claim', 'http://nationalreport.net/'), ('newsbiscuit.com', 'claim', 'http://www.newsbiscuit.com/'), ('newsmutiny.com', 'claim', 'http://newsmutiny.com/'), ('politicalears.com', 'claim', 'http://politicalears.com/'), ('private-eye.co.uk', 'claim', 'http://private-eye.co.uk/'), ('realnewsrightnow.com', 'claim', 'http://realnewsrightnow.com/'), ('rilenews.com', 'claim', 'http://www.rilenews.com/'), ('thenewsnerd.com', 'claim', 'http://www.thenewsnerd.com/'), ('theuspatriot.com', 'claim', 'http://theuspatriot.com/'), ('witscience.org', 'claim', 'http://witscience.org/'), ('amplifyingglass.com', 'claim', 'http://www.amplifyingglass.com/'), ('empiresports.co', 'claim', 'http://www.empiresports.co/'), ('gomerblog.com', 'claim', 'http://gomerblog.com/'), ('huzlers.com', 'claim', 'http://huzlers.com/'), ('itaglive.com', 'claim', 'http://itaglive.com/'), ('politicops.com', 'claim', 'http://politicops.com/'), ('rockcitytimes.com', 'claim', 'http://www.rockcitytimes.com/'), ('thelapine.ca', 'claim', 'https://thelapine.ca/'), ('thespoof.com', 'claim', 'http://www.thespoof.com/'), ('weeklyworldnews.com', 'claim', 'http://weeklyworldnews.com/'), ('worldnewsdailyreport.com', 'claim', 'http://worldnewsdailyreport.com/'), ('21stcenturywire.com', 'claim', 'http://21stcenturywire.com/'), ('activistpost.com', 'claim', 'http://www.activistpost.com/'), ('beforeitsnews.com', 'claim', 'http://beforeitsnews.com/'), ('chronicle.su', 'claim', 'http://chronicle.su/'), ('coasttocoastam.com', 'claim', 'http://www.coasttocoastam.com/'), ('consciouslifenews.com', 'claim', 'http://consciouslifenews.com/'), ('countdowntozerotime.com', 'claim', 'http://countdowntozerotime.com/'), ('counterpsyops.com', 'claim', 'https://counterpsyops.com/'), ('dailybuzzlive.com', 'claim', 'http://dailybuzzlive.com/'), ('disclose.tv', 'claim', 'http://www.disclose.tv/'), ('fprnradio.com', 'claim', 'http://fprnradio.com/'), ('geoengineeringwatch.org', 'claim', 'http://www.geoengineeringwatch.org/'), ('globalresearch.ca', 'claim', 'http://www.globalresearch.ca/'), ('govtslaves.info', 'claim', 'http://www.govtslaves.info/'), ('gulagbound.com', 'claim', 'http://gulagbound.com/'), ('hangthebankers.com', 'claim', 'http://www.hangthebankers.com/'), ('humansarefree.com', 'claim', 'http://humansarefree.com/'), ('intellihub.com', 'claim', 'https://www.intellihub.com/'), ('lewrockwell.com', 'claim', 'https://www.lewrockwell.com/'), ('libertytalk.fm', 'claim', 'http://libertytalk.fm/'), ('naturalnews.com', 'claim', 'http://www.naturalnews.com'), ('newswire-24.com', 'claim', 'https://newswire-24.com/'), ('nodisinfo.com', 'claim', 'http://nodisinfo.com/'), ('nowtheendbegins.com', 'claim', 'http://www.nowtheendbegins.com/'), ('pakalertpress.com', 'claim', 'http://www.pakalertpress.com/'), ('politicalblindspot.com', 'claim', 'http://politicalblindspot.com/ '), ('prisonplanet.com', 'claim', 'http://www.prisonplanet.com/'), ('realfarmacy.com', 'claim', 'http://www.realfarmacy.com/'), ('redflagnews.com', 'claim', 'http://www.redflagnews.com/'), ('thedailysheeple.com', 'claim', 'http://www.thedailysheeple.com/'), ('therundownlive.com', 'claim', 'http://therundownlive.com/'), ('unconfirmedsources.com', 'claim', 'http://unconfirmedsources.com/'), ('veteranstoday.com', 'claim', 'http://www.veteranstoday.com/'), ('worldtruth.tv', 'claim', 'http://worldtruth.tv/'), ('thevalleyreport.com', 'claim', 'https://thevalleyreport.com/'), ('departed.co', 'claim', 'http://departed.co/'), ('myfreshnews.com', 'claim', 'http://myfreshnews.com/')]
2021-01-29 19:40:11,775 - hoaxy(init) - INFO: Done.

Double-check your sites.yaml.

awallemo commented 3 years ago

I ran the command you gave, but I still get the same error. My sites.yaml file is the one I posted on pastebin in my earlier comment. I don't see anything wrong there, and when I look at the error I don't really understand what I should change in the sites.yaml file.

awallemo commented 3 years ago

2021-01-29 21:59:09,622 - hoaxy(init) - INFO: Site file /home/ubuntu/.hoaxy/sites.yaml found
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/envs/hoaxy/bin/hoaxy", line 11, in <module>
    load_entry_point('hoaxy==0.1.0', 'console_scripts', 'hoaxy')()
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/cmdline.py", line 122, in main
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/init.py", line 138, in run
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/init.py", line 115, in init
  File "/home/ubuntu/anaconda2/envs/hoaxy/lib/python3.7/site-packages/hoaxy-0.1.0-py3.7.egg/hoaxy/commands/site.py", line 368, in load_sites

TypeError: list indices must be integers or slices, not str

I don't really understand the error when I look at the sites.yaml file.

filmenczer commented 3 years ago

@chathuriw Can you please try with the same YAML file used by @awallemo?

chathuriw commented 3 years ago

@awallemo, here is the correct YAML file. Your YAML file is missing the 'sites' key. (I had to add a .txt extension to upload it; please remove the extension before you use it.) Let me know if you are still having issues.

sites.yaml.txt
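This diagnosis fits the traceback: if the site entries sit at the top level of the file (the pasted file starts with "- name: factcheck.org"), the parser returns a list, and looking up 'sites' on that list would raise exactly the TypeError reported above. The sketch below checks an already-parsed file before running hoaxy init. It assumes that hoaxy expects the entries under a top-level sites key, that name, domain and site_type are the required fields (as the comments in the sample file state), and that parsing is done separately; check_sites_config is a hypothetical helper, not part of hoaxy:

```python
from typing import Any, List

# Fields the comments in the sample file mark as "required" (assumption:
# hoaxy itself may enforce more or fewer than these).
REQUIRED_FIELDS = ("name", "domain", "site_type")


def check_sites_config(data: Any) -> List[str]:
    """Return human-readable problems found in an already-parsed sites.yaml."""
    if isinstance(data, list):
        # The shape the pasted file appears to have: entries at the top level.
        return ["top level is a list; expected a mapping with a 'sites' key"]
    if not isinstance(data, dict) or "sites" not in data:
        return ["missing top-level 'sites' key"]
    sites = data["sites"]
    if not isinstance(sites, list):
        return ["'sites' should contain a list of site entries"]
    problems = []
    for i, site in enumerate(sites):
        if not isinstance(site, dict):
            problems.append(f"entry {i} is not a mapping")
            continue
        label = site.get("name", f"entry {i}")
        for field in REQUIRED_FIELDS:
            if field not in site:
                problems.append(f"{label}: missing required field '{field}'")
    return problems
```

With PyYAML installed, the file could be parsed with yaml.safe_load inside a with open(...) block and the result passed to check_sites_config; an empty list means the file has the expected shape.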

awallemo commented 3 years ago

It worked now, thanks!