osome-iu / hoaxy-backend

Backend component for Hoaxy, a tool to visualize the spread of claims and fact checking
http://hoaxy.iuni.iu.edu/
GNU General Public License v3.0
139 stars, 44 forks

Convert Hoaxy to Python 3 #20

Closed glciampaglia closed 5 years ago

glciampaglia commented 5 years ago

To do:

As part of the new extraction pipeline (#6), we want to run goose3, which is written in Python 3. We discussed the issue, and it looks like there are no external dependencies that cannot be moved to Python 3. The task, therefore, is to convert all source code to Python 3 using the 2to3 utility. As part of this, we also need to update all external packages to versions that support Python 3. In particular, we want to make sure that scrapy is updated to a recent version, so that CertificateError exceptions like the one below are caught:

Error during info_callback
Traceback (most recent call last):
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/twisted/protocols/tls.py", line 415, in dataReceived
    self._checkHandshakeStatus()
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/twisted/protocols/tls.py", line 335, in _checkHandshakeStatus
    self._tlsConnection.do_handshake()
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/OpenSSL/SSL.py", line 1425, in do_handshake
    result = _lib.SSL_do_handshake(self._ssl)
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/OpenSSL/SSL.py", line 917, in wrapper
    callback(Connection._reverse_mapping[ssl], where, return_code)
--- <exception caught here> ---
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/twisted/internet/_sslverify.py", line 1151, in infoCallback
    return wrapped(connection, where, ret)
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/scrapy/core/downloader/tls.py", line 52, in _identityVerifyingInfoCallback
    verifyHostname(connection, self._hostnameASCII)
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/service_identity/pyopenssl.py", line 44, in verify_hostname
    cert_patterns=extract_ids(connection.get_peer_certificate()),
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/service_identity/pyopenssl.py", line 102, in extract_ids
    if c[0] == b"CN"]
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/service_identity/_common.py", line 161, in __init__
    _validate_pattern(self.pattern)
  File "/u/truthy/miniconda3/envs/hoaxy-backend/lib/python2.7/site-packages/service_identity/_common.py", line 406, in _validate_pattern
    .format(cert_pattern)
service_identity.exceptions.CertificateError: Certificate's DNS-ID '*' hast too few host components for wildcard usage.
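The error is raised while validating a certificate whose DNS-ID is just `*`. The sketch below is a simplified, hypothetical re-implementation of that kind of wildcard check, not service_identity's actual code; it assumes the rules are roughly: at most one asterisk, at least three labels, the wildcard only in the leftmost label, and no empty labels. The `except` branch in `is_acceptable()` illustrates the behavior the issue asks for: treat such a host as a failed fetch instead of letting the error escape.

```python
class CertificateError(ValueError):
    """Hypothetical stand-in for service_identity.exceptions.CertificateError."""


def validate_wildcard_pattern(pattern: str) -> None:
    """Raise CertificateError if a wildcard DNS-ID is too broad (simplified sketch)."""
    if "*" not in pattern:
        return  # plain hostnames are not wildcard patterns
    if pattern.count("*") > 1:
        raise CertificateError(f"DNS-ID {pattern!r} has more than one asterisk")
    labels = pattern.split(".")
    if len(labels) < 3:
        # '*' alone (1 label) or '*.com' (2 labels) would match far too many hosts
        raise CertificateError(
            f"DNS-ID {pattern!r} has too few host components for wildcard usage")
    if "*" not in labels[0]:
        raise CertificateError(f"DNS-ID {pattern!r} has a wildcard outside the leftmost label")
    if any(not label for label in labels):
        raise CertificateError(f"DNS-ID {pattern!r} contains empty labels")


def is_acceptable(pattern: str) -> bool:
    """Return False for a rejected pattern instead of propagating the exception."""
    try:
        validate_wildcard_pattern(pattern)
    except CertificateError:
        return False
    return True


for p in ["*", "*.com", "*.example.com", "www.example.com", "www.*.example.com"]:
    print(p, is_acceptable(p))
```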

ZacMilano commented 5 years ago

I have finished using 2to3 on every *.py file contained in hoaxy-backend in a local git repo on my computer. I went through the hoaxy-backend/requirements.txt file and checked each requirement online, and each of the requirements (with the occasional minimum version requirement) has Python 3 support. Should we require a newer version of scrapy than 1.3 (most recent version is 1.5.1)?
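Whichever minimum is chosen, it would be expressed in requirements.txt with pip's specifier syntax (for example `scrapy>=1.5.1`). The sketch below is a hypothetical runtime guard showing why versions should be compared numerically rather than as strings; the helper names are illustrative only, and in practice `packaging.version.Version` handles pre-releases and other cases this sketch ignores.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted release string such as '1.5.1' into (1, 5, 1).

    Simplified: assumes purely numeric components (no 'rc1', '.dev0', ...).
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is the same release as `minimum` or newer."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '1.5' compares equal to '1.5.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    # Tuples compare element by element, so (1, 10, 0) > (1, 5, 1),
    # whereas the plain strings would compare '1.10.0' < '1.5.1'.
    return have >= need


print(meets_minimum("1.3", "1.5.1"))     # False: the old pin is too low
print(meets_minimum("1.5.1", "1.5.1"))   # True
print(meets_minimum("1.10.0", "1.5.1"))  # True
```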

ZacMilano commented 5 years ago

I spoke to @filmenczer, and we agreed that the changes I made require major testing before being pushed to the official release repository. @shaochengcheng, @glciampaglia, what would be the best way to test whether the changes are okay?

glciampaglia commented 5 years ago

I agree with Fil about testing. We need to make these changes in a separate branch, so that we can keep working on the Python 2 code while we run all the necessary tests.

As for the tests, we will basically need to run a test instance of Hoaxy on a different machine to make sure all parts work.
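One way to make such a test instance repeatable is a small smoke-test script that runs the short-lived CLI commands mentioned in this thread and reports any that fail. This is a hypothetical sketch, not part of Hoaxy; the command list and timeout are assumptions, and long-running commands such as `hoaxy sns --twitter-streaming` are left out because they never exit on their own.

```python
import subprocess
import sys


def smoke_test(commands, timeout=600):
    """Run each command; return (command, reason) pairs for those that fail."""
    failures = []
    for cmd in commands:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except FileNotFoundError:
            failures.append((cmd, "executable not found"))
            continue
        except subprocess.TimeoutExpired:
            failures.append((cmd, f"timed out after {timeout}s"))
            continue
        if result.returncode != 0:
            # The last stderr line is usually the exception message.
            lines = result.stderr.strip().splitlines()
            failures.append((cmd, lines[-1] if lines else f"exit code {result.returncode}"))
    return failures


if __name__ == "__main__":
    # Assumed checks, taken from commands used later in this thread.
    checks = [["hoaxy", "init"], ["hoaxy", "lucene", "--index"]]
    for cmd, reason in smoke_test(checks):
        print("FAILED:", " ".join(cmd), "->", reason, file=sys.stderr)
```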

This explains branches: https://help.github.com/articles/about-branches/ https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell

This is also useful information to keep in mind when porting Python 2 to 3: https://docs.python.org/3/howto/pyporting.html

-- Giovanni Luca Ciampaglia ∙ glciampaglia.com Assistant Professor Computer Science and Engineering https://www.usf.edu/engineering/cse/ ∙ University of South Florida https://www.usf.edu/News New email address: glc3@mail.usf.edu Hoaxy Botometer: Check out our new tool: https://hoaxy.iuni.iu.edu/

glciampaglia commented 5 years ago

Oh, I just saw you created a pull request; that's exactly what we need (a pull request is basically a branch, but it can be reviewed from GitHub's web interface).

G

ZacMilano commented 5 years ago

Perfect. I made the pull request on branch origin/issue-new-parser (later renamed hoaxy-python3).

glciampaglia commented 5 years ago

Mmmh, it would be better to keep the development of the new extraction pipeline and the port to Python 3 distinct from each other.

If you haven't made any modification to the extraction code yet, would you please rename that branch something like "hoaxy-python3"?

If you already made some changes, would you please create a separate branch?

It is possible to write code that runs with both Python 3 and Python 2.7+ without much effort. That would be advisable. The HOWTO I linked in the message above explains how.
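Following the HOWTO, a minimal sketch of a module that runs unchanged on Python 2.7 and Python 3 might look like this. `domain_of()` and `mean()` are hypothetical examples, not Hoaxy code; the `__future__` import must be the first statement in the file.

```python
from __future__ import absolute_import, division, print_function

# Guarded import: urlparse moved between Python 2 and Python 3.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def domain_of(url):
    """Return the lowercased host part of a URL."""
    return urlparse(url).netloc.lower()


def mean(values):
    # With `division` imported, / is true division on both versions,
    # so mean([1, 2]) is 1.5 rather than Python 2's integer result 1.
    return sum(values) / len(values)


if __name__ == "__main__":
    # print() is a function on both versions thanks to print_function.
    print(domain_of("https://Example.com/some/path"))  # example.com
    print(mean([1, 2]))  # 1.5
```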

ZacMilano commented 5 years ago

Done. When testing parsing methods, I had only worked on other files, not directly on Hoaxy's files.

filmenczer commented 5 years ago

Adding @chathuriw because she is also dealing with installation and Lucene issues.

filmenczer commented 5 years ago

Once Zac succeeds in upgrading to Python 3, he will document the process and share it with @chathuriw. @clayadavis further suggests that we build a conda package for PyLucene (or even the entire Hoaxy?), in order to make it easier to replicate the build process on other machines in the future.

glciampaglia commented 5 years ago

Developing a conda package or even a docker container for the whole Hoaxy could be a good project for Val's team.

filmenczer commented 5 years ago

Success! Final steps:

chathuriw commented 5 years ago

@ZacMonroe I'm getting the error below when running hoaxy init:

(hoaxy-python3) chathuri@chathuri-Latitude-E7450:~/IUNI/hoaxy-backend$ hoaxy init
2019-01-30 14:27:15,401 - hoaxy(init) - INFO: Creating database tables:
2019-01-30 14:27:15,401 - hoaxy(init) - WARNING: Ignore existed tables
2019-01-30 14:27:15,424 - hoaxy(init) - INFO: Inserting platforms if not exist
2019-01-30 14:27:15,452 - hoaxy(init) - INFO: Trying to load site data:
2019-01-30 14:27:15,452 - hoaxy(init) - INFO: Claim domains /home/chathuri/.hoaxy/domains_claim.txt found
2019-01-30 14:27:15,453 - hoaxy(init) - INFO: Sending HTTP requests to infer base URLs ...
2019-01-30 14:27:15,455 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): infowars.com:80
2019-01-30 14:27:15,581 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://infowars.com:80 "HEAD / HTTP/1.1" 301 0
2019-01-30 14:27:15,583 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): infowars.com:443
2019-01-30 14:27:15,931 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://infowars.com:443 "HEAD / HTTP/1.1" 301 0
2019-01-30 14:27:15,936 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): www.infowars.com:443
2019-01-30 14:27:16,203 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://www.infowars.com:443 "HEAD / HTTP/1.1" 200 0
2019-01-30 14:27:16,207 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): empirenews.net:80
2019-01-30 14:27:16,545 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://empirenews.net:80 "HEAD / HTTP/1.1" 301 0
2019-01-30 14:27:16,549 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): empirenews.net:443
2019-01-30 14:27:16,916 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://empirenews.net:443 "HEAD / HTTP/1.1" 200 0
2019-01-30 14:27:16,925 - hoaxy(init) - DEBUG: Insert or update site infowars.com
2019-01-30 14:27:16,930 - hoaxy(init) - DEBUG: Insert or update site empirenews.net
2019-01-30 14:27:16,930 - hoaxy(init) - INFO: Fact checking domains /home/chathuri/.hoaxy/domains_factchecking.txt not found
2019-01-30 14:27:16,930 - hoaxy(init) - INFO: Site file /home/chathuri/.hoaxy/sites.yaml found
Traceback (most recent call last):
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/bin/hoaxy", line 11, in
    load_entry_point('hoaxy==0.1.0', 'console_scripts', 'hoaxy')()
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/cmdline.py", line 122, in main
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/init.py", line 138, in run
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/init.py", line 115, in init
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/site.py", line 371, in load_sites
    If the readline module can be imported, the hook will set the Tab key
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/site.py", line 371, in
    If the readline module can be imported, the hook will set the Tab key
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/site.py", line 143, in parse_site
    continue
AttributeError: 'str' object has no attribute 'get'

Any idea why I'm getting that?

shaochengcheng commented 5 years ago

Hi, this is probably caused by the change in string behavior between Python 2 and Python 3. @ZacMonroe, we ran into this issue before, and I believe it is not hard to fix.

Thanks Chengcheng

ZacMilano commented 5 years ago

@chathuriw It looks like a string (instead of a dict) is being passed into parse_site() in hoaxy/commands/site.py. Tracing it back to where parse_site() is called in that file (line 371), load_sites() tries to parse each site in sites, which is loaded from a YAML file. Perhaps the YAML file you're using is not formatted the way these methods expect? I personally used domains_factchecking.txt and domains_claim.txt instead of a sites.yaml, so I'm not sure about that.
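One way this error can arise, sketched under assumptions: if the YAML file loads as a mapping rather than a list of mappings, iterating over it yields the keys, which are strings, and any `.get()` call on them fails. The real `load_sites()`/`parse_site()` code and the sites.yaml schema are not shown in this thread, so the functions, the `domain` key, and the sample data below are hypothetical stand-ins for what `yaml.safe_load` might return.

```python
def parse_site(site):
    """Hypothetical stand-in: expects one site as a dict."""
    return site.get("domain")


def load_sites(sites):
    """Check the loaded YAML structure before parsing, failing with a clear message."""
    if isinstance(sites, dict):
        # Iterating a dict yields its keys (str), which is what triggers
        # "AttributeError: 'str' object has no attribute 'get'".
        raise TypeError("sites.yaml should contain a list of sites, got a mapping")
    parsed = []
    for i, site in enumerate(sites):
        if not isinstance(site, dict):
            raise TypeError(f"site #{i} should be a mapping, got {type(site).__name__}: {site!r}")
        parsed.append(parse_site(site))
    return parsed


good = [{"domain": "example.com"}, {"domain": "example.org"}]  # list of mappings
bad = {"example.com": {"name": "Example"}}  # top-level mapping

print(load_sites(good))

# Without the check, the naive loop reproduces the reported error:
try:
    [site.get("domain") for site in bad]
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'get'
```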

@shaochengcheng I think this is a different issue; the one we had before occurred when we were attempting to decode a string.
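The earlier problem is the usual Python 3 str/bytes split: in Python 3, `str` has no `.decode()` method, so Python 2 code that calls `.decode('utf-8')` on text fails after the port. A minimal sketch of a helper (the name `to_text` is hypothetical) that accepts either type:

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as str: decode bytes, pass str through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(to_text(b"caf\xc3\xa9"))  # café (UTF-8 bytes, e.g. from a socket)
print(to_text("café"))          # café (already text)

# The Python 2 habit of calling .decode() on text fails in Python 3:
try:
    "café".decode("utf-8")
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'decode'
```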

chathuriw commented 5 years ago

@ZacMonroe and @shaochengcheng, thank you for the suggestions. I removed sites.yaml from the .hoaxy folder, and now I'm able to run 'hoaxy init' without errors. I started the Twitter stream with 'hoaxy sns --twitter-streaming', and it is running without any issues. Then I ran 'hoaxy lucene --index', which gave me this:

(hoaxy-python3) chathuri@chathuri-Latitude-E7450:~/IUNI/hoaxy-backend$ hoaxy lucene --index
Traceback (most recent call last):
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/bin/hoaxy", line 11, in
    load_entry_point('hoaxy==0.1.0', 'console_scripts', 'hoaxy')()
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/cmdline.py", line 122, in main
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/lucene_cmd.py", line 156, in run
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/lucene_cmd.py", line 92, in index
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/ir/index.py", line 40, in init
lucene.InvalidArgsError: (<class 'FSDirectory'>, 'open', (<File: /home/chathuri/IUNI/hoaxy-backend/apps/lucene-index>,))

I'm only running the Twitter stream. I think I can build the index while the Twitter stream is running, right?

filmenczer commented 5 years ago

We discussed this on 6 March 2019 and decided in favor of a docker container over a conda package. Moved the to-do list to the top of the thread.