Closed glciampaglia closed 5 years ago
I have finished using 2to3 on every *.py file contained in hoaxy-backend in a local git repo on my computer. I went through the hoaxy-backend/requirements.txt file and checked each requirement online, and each of the requirements (with the occasional minimum version requirement) has Python 3 support. Should we require a newer version of scrapy than 1.3 (most recent version is 1.5.1)?
I spoke to @filmenczer, and we agreed that the changes I made require major testing before pushing to the official release repository. @shaochengcheng, @glciampaglia what would be the best way to test whether the changes are okay?
Agree with Fil about testing. We need to make these changes in a separate branch, so that we can keep working on the Python2 code while we make all the necessary tests.
As for the tests, we will basically need to run a test instance of Hoaxy on a different machine, to make sure all parts work.
This explains branches: https://help.github.com/articles/about-branches/ https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
This is also useful information to keep in mind when porting Python 2 to 3: https://docs.python.org/3/howto/pyporting.html
Oh, I just saw you created a pull request; that's exactly what we need (a pull request is basically a branch, but it can be reviewed from the web interface of GitHub).
G
Perfect. I made the pull request on branch origin/issue-new-parser (now renamed hoaxy-python3).
Mmmh, it would be better to keep the development of the new extraction pipeline and the port to Python 3 distinct from each other.
If you haven't made any modifications to the extraction code yet, would you please rename that branch to something like "hoaxy-python3"?
If you already made some changes, would you please create a separate branch?
It is possible to write code that runs with both Python 3 and Python 2.7+ without much effort. That would be advisable. The HOWTO I linked in the message above explains how.
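As a rough illustration of the dual-version style the HOWTO describes, here is a minimal sketch of two common idioms: importing a module from whichever location the running interpreter provides, and writing integer division explicitly with `//`. The helper names `domain_of` and `pages_needed` are hypothetical and are not taken from the Hoaxy code base.

```python
# Import from the Python 3 location first, falling back to Python 2.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7


def domain_of(url):
    """Return the lowercased host part of a URL (hypothetical helper)."""
    return urlparse(url).netloc.lower()


def pages_needed(n_items, page_size):
    """Ceiling division using //, which floors on both 2.7 and 3.

    Plain / would give an int on Python 2 but a float on Python 3.
    """
    return (n_items + page_size - 1) // page_size
```

The same file can then be run unchanged under either interpreter, which makes it easier to keep the Python 2 deployment working while the Python 3 branch is being tested.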
Done. I had not worked directly on hoaxy's files when testing parsing methods, only other files.
Adding @chathuriw because she is also dealing with installation and Lucene issues.
Once Zac succeeds in upgrading to Python 3, he will document and share with @chathuriw. @clayadavis further suggests that we build a conda package for PyLucene (or even the entire Hoaxy?), in order to make it easier to replicate the build process on other machines in the future.
Developing a conda package or even a Docker container for the whole of Hoaxy could be a good project for Val's team.
Success! Final steps:
@ZacMonroe I'm getting the error below when running hoaxy init
(hoaxy-python3) chathuri@chathuri-Latitude-E7450:~/IUNI/hoaxy-backend$ hoaxy init
2019-01-30 14:27:15,401 - hoaxy(init) - INFO: Creating database tables:
2019-01-30 14:27:15,401 - hoaxy(init) - WARNING: Ignore existed tables
2019-01-30 14:27:15,424 - hoaxy(init) - INFO: Inserting platforms if not exist
2019-01-30 14:27:15,452 - hoaxy(init) - INFO: Trying to load site data:
2019-01-30 14:27:15,452 - hoaxy(init) - INFO: Claim domains /home/chathuri/.hoaxy/domains_claim.txt found
2019-01-30 14:27:15,453 - hoaxy(init) - INFO: Sending HTTP requests to infer base URLs ...
2019-01-30 14:27:15,455 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): infowars.com:80
2019-01-30 14:27:15,581 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://infowars.com:80 "HEAD / HTTP/1.1" 301 0
2019-01-30 14:27:15,583 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): infowars.com:443
2019-01-30 14:27:15,931 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://infowars.com:443 "HEAD / HTTP/1.1" 301 0
2019-01-30 14:27:15,936 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): www.infowars.com:443
2019-01-30 14:27:16,203 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://www.infowars.com:443 "HEAD / HTTP/1.1" 200 0
2019-01-30 14:27:16,207 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTP connection (1): empirenews.net:80
2019-01-30 14:27:16,545 - hoaxy(init)[urllib3.connectionpool] - DEBUG: http://empirenews.net:80 "HEAD / HTTP/1.1" 301 0
2019-01-30 14:27:16,549 - hoaxy(init)[urllib3.connectionpool] - DEBUG: Starting new HTTPS connection (1): empirenews.net:443
2019-01-30 14:27:16,916 - hoaxy(init)[urllib3.connectionpool] - DEBUG: https://empirenews.net:443 "HEAD / HTTP/1.1" 200 0
2019-01-30 14:27:16,925 - hoaxy(init) - DEBUG: Insert or update site infowars.com
2019-01-30 14:27:16,930 - hoaxy(init) - DEBUG: Insert or update site empirenews.net
2019-01-30 14:27:16,930 - hoaxy(init) - INFO: Fact checking domains /home/chathuri/.hoaxy/domains_factchecking.txt not found
2019-01-30 14:27:16,930 - hoaxy(init) - INFO: Site file /home/chathuri/.hoaxy/sites.yaml found
Traceback (most recent call last):
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/bin/hoaxy", line 11, in
    load_entry_point('hoaxy==0.1.0', 'console_scripts', 'hoaxy')()
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/cmdline.py", line 122, in main
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/init.py", line 138, in run
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/init.py", line 115, in init
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/site.py", line 371, in load_sites
    If the readline module can be imported, the hook will set the Tab key
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/site.py", line 371, in
    If the readline module can be imported, the hook will set the Tab key
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/lib/python3.6/site-packages/hoaxy-0.1.0-py3.6.egg/hoaxy/commands/site.py", line 143, in parse_site
    continue
AttributeError: 'str' object has no attribute 'get'
Any idea why I'm getting that?
Hi, this looks like an issue caused by the change in string object behavior between Python 2 and Python 3. @ZacMonroe, we ran into this issue before and I believe it is not hard to correct.
Thanks, Chengcheng
@chathuriw It looks like a string is being passed into parse_site() (instead of a dict) in hoaxy/commands/site.py. Tracing it back to where parse_site() is called in that file (line 371), load_sites() is trying to parse each site in sites, which is a loaded yaml file. Perhaps the yaml file that you're using has been formatted incorrectly according to the methods' requirements? I personally used domains_factchecking.txt and domains_claim.txt instead of a sites.yaml, so I'm not sure about that.
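To make the diagnosis above concrete, here is a small sketch of how this kind of AttributeError arises when a loaded YAML document yields a list of bare strings instead of a list of mappings. The `parse_site` below is a hypothetical stand-in, not Hoaxy's actual function, and the `name` and `domain` keys are assumptions chosen for illustration only.

```python
def parse_site(site):
    """Hypothetical stand-in: expects one site entry as a dict."""
    if not isinstance(site, dict):
        # Fail early with a message that names the offending entry.
        raise TypeError(
            "each site entry must be a mapping, got %s: %r"
            % (type(site).__name__, site)
        )
    return {"name": site.get("name"), "domain": site.get("domain")}


# What a YAML loader returns for a list of mappings ...
good_sites = [{"name": "Example", "domain": "example.com"}]
# ... versus a list of plain strings.
bad_sites = ["example.com"]

parsed = [parse_site(s) for s in good_sites]

# Without the isinstance check, calling .get() on a string fails
# with the same kind of error as in the traceback.
try:
    "example.com".get("name")
except AttributeError as exc:
    raw_error = str(exc)

# With the check, the failure is reported in terms of the input file.
try:
    [parse_site(s) for s in bad_sites]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

An explicit type check like this would not fix a badly formatted sites.yaml, but it would point at the entry that needs correcting rather than at an internal attribute lookup.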
@shaochengcheng I think this is a different issue; what we had before was when we were attempting to decode a string.
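For reference, the earlier decode problem mentioned here is typical of Python 2 code run under Python 3, where `str` is already text and no longer has a `decode` method. The sketch below shows one common way to handle both cases; `ensure_text` is a hypothetical helper name, and the actual fix applied in Hoaxy may have looked different.

```python
def ensure_text(value, encoding="utf-8"):
    """Return text whether value arrives as bytes or as str.

    Hypothetical helper: only bytes are decoded, str passes through.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = "caf\u00e9".encode("utf-8")  # bytes, e.g. read from a socket or file
text = ensure_text(raw)            # decoded to str
same = ensure_text(text)           # already str, returned unchanged
```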
@ZacMonroe and @shaochengcheng thank you for the suggestions. I removed sites.yaml from the .hoaxy folder and now I'm able to run 'hoaxy init' without errors. I started the Twitter stream with 'hoaxy sns --twitter-streaming' and it is running without any issues. Then I ran 'hoaxy lucene --index', which gives me this:
(hoaxy-python3) chathuri@chathuri-Latitude-E7450:~/IUNI/hoaxy-backend$ hoaxy lucene --index
Traceback (most recent call last):
  File "/home/chathuri/anaconda2/envs/hoaxy-python3/bin/hoaxy", line 11, in
I'm only running the Twitter stream. I think I can build the index while the Twitter stream is running, right?
We discussed this on 6 March 2019 and decided in favor of a Docker container over a conda package. Moved the to-do list to the top of the thread.
To do:
As part of the new extraction pipeline (#6) we want to run goose3, which is written in Python3. We discussed the issue and it looks like there are no external dependencies that cannot be moved to Python3. So the task is to convert all source code to Python3 using the 2to3 utility. As part of it, we also need to update all external packages to versions that support Python3. In particular, we want to make sure that scrapy is updated to a recent version, so that CertificateError exceptions are caught (as in the error below):