sophieball / toxicity-detector

Feature importance #101

Closed · sophieball closed this issue 3 years ago

sophieball commented 3 years ago

@CaptainEmerson Can you run this version again? I wanted to see the feature importance so we could discuss which features are more important for which task...

Two function calls: train_classifier_g and convo_word_freq_diff

(somehow my new commits counted towards Feb 4's....)

CaptainEmerson commented 3 years ago

Consistent with the prior CL, I'm still getting:

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 8, in <module>
    import download_data
  File "/usr/local/google/home/emersonm/toxicity-detector/src/download_data.py", line 5, in <module>
    import nltk
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/__init__.py", line 133, in <module>
    from nltk.text import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/text.py", line 30, in <module>
    from nltk.tokenize import sent_tokenize
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/tokenize/__init__.py", line 66, in <module>
    from nltk.tokenize.casual import TweetTokenizer, casual_tokenize
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/tokenize/casual.py", line 38, in <module>
    import regex  # https://github.com/nltk/nltk/issues/2409
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__regex/regex/__init__.py", line 1, in <module>
    from .regex import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__regex/regex/regex.py", line 419, in <module>
    import regex._regex_core as _regex_core
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__regex/regex/_regex_core.py", line 21, in <module>
    import regex._regex as _regex
ModuleNotFoundError: No module named 'regex._regex'

Ideas for debugging?

sophieball commented 3 years ago

Did you try adding regex to requirements.txt?

CaptainEmerson commented 3 years ago

regex is already in requirements.txt. I tried replacing it with regex==2019.11.1, as suggested in https://github.com/psf/black/issues/1207, but I keep getting a timeout during bazel build.

sophieball commented 3 years ago

Maybe try some other, newer version? Some release from 2021?

sophieball commented 3 years ago

any luck with convo_word_freq_diff?

CaptainEmerson commented 3 years ago

It appears that I've fixed the regex problem, as you suggested, and attempted to fix a subsequent pytest dependency:

diff --git a/requirements.txt b/requirements.txt
index 60d565a..e5f7086 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -23,9 +23,10 @@ pandas
 pip
 plac
 preshed
+pytest
 python-dateutil
 pytz
-regex
+regex==2021.3.17
 requests
 scipy
 setuptools

But even if I pin pytest to a specific version, I still get this error:

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 8, in <module>
    import download_data
  File "/usr/local/google/home/emersonm/toxicity-detector/src/download_data.py", line 5, in <module>
    import nltk
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/__init__.py", line 142, in <module>
    from nltk.chunk import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/chunk/__init__.py", line 157, in <module>
    from nltk.chunk.api import ChunkParserI
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/chunk/api.py", line 13, in <module>
    from nltk.parse import ParserI
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/parse/__init__.py", line 102, in <module>
    from nltk.parse.corenlp import CoreNLPParser, CoreNLPDependencyParser
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__nltk/nltk/parse/corenlp.py", line 23, in <module>
    import pytest
ModuleNotFoundError: No module named 'pytest'

> system2("bazel-bin/main/train_classifier_g",
+         input = format_csv(df[is.finite(review_time) &
+                               is.finite(shepherd_time) &
+                               is.finite(rounds)]))
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/main/train_classifier_g.py", line 2, in <module>
    from src import download_data
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/download_data.py", line 5, in <module>
    import nltk
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__nltk/nltk/__init__.py", line 142, in <module>
    from nltk.chunk import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__nltk/nltk/chunk/__init__.py", line 157, in <module>
    from nltk.chunk.api import ChunkParserI
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__nltk/nltk/chunk/api.py", line 13, in <module>
    from nltk.parse import ParserI
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__nltk/nltk/parse/__init__.py", line 102, in <module>
    from nltk.parse.corenlp import CoreNLPParser, CoreNLPDependencyParser
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__nltk/nltk/parse/corenlp.py", line 23, in <module>
    import pytest
ModuleNotFoundError: No module named 'pytest'
> 

sophieball commented 3 years ago

I found this: https://medium.com/@dirk.avery/pytest-modulenotfounderror-no-module-named-requests-a770e6926ac5. But I don't know how applicable it is to bazel.
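
For what it's worth, under Bazel a package's presence in requirements.txt is not by itself enough: each py_binary also has to declare the package in its deps for it to be staged into the runfiles, which would explain pytest being installed yet unimportable at runtime. A minimal sketch of that declaration (the load path and target names are assumptions, not the repo's actual BUILD file):

load("@deps//:requirements.bzl", "requirement")

py_binary(
    name = "convo_word_freq_diff",
    srcs = ["convo_word_freq_diff.py"],
    deps = [
        requirement("nltk"),
        # nltk.parse.corenlp does `import pytest` at import time, so pytest
        # must be a declared dep even though nothing calls it directly.
        requirement("pytest"),
    ],
)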

CaptainEmerson commented 3 years ago

I don't know how relevant that link is, given we're using bazel and not a virtual environment. I'll try a few versions in the comments that follow.

CaptainEmerson commented 3 years ago

Changing pytest version back to 6.2.1


> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Collecting en_core_web_sm==2.3.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz (12.0 MB)
     |████████████████████████████████| 12.0 MB 12.6 MB/s 
Building wheels for collected packages: en-core-web-sm
  Building wheel for en-core-web-sm (setup.py) ... done
  Created wheel for en-core-web-sm: filename=en_core_web_sm-2.3.0-py3-none-any.whl size=12048606 sha256=26e629bc87cff745408eea756aaa27a9e79f03547947676f9289594f2054d179
  Stored in directory: /tmp/pip-ephem-wheel-cache-o97l_8_k/wheels/7a/ae/d9/ce6b7070ac24baeea5e6e8c72ed77833bf5d6591b76130a92d
Successfully built en-core-web-sm
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-2.3.0
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
⚠ Download successful but linking failed
Creating a shortcut link for 'en' didn't work (maybe you don't have admin
permissions?), but you can still load the model via its full package name: nlp =
spacy.load('en_core_web_sm')
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Requirement already satisfied: en_core_web_sm==2.3.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz#egg=en_core_web_sm==2.3.0 in /usr/local/google/home/emersonm/.local/lib/python3.9/site-packages (2.3.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
⚠ Download successful but linking failed
Creating a shortcut link for 'en' didn't work (maybe you don't have admin
permissions?), but you can still load the model via its full package name: nlp =
spacy.load('en_core_web_sm')
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 37, in <module>
    nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 175, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Requirement already satisfied: en_core_web_sm==2.3.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz#egg=en_core_web_sm==2.3.0 in /usr/local/google/home/emersonm/.local/lib/python3.9/site-packages (2.3.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm
-->
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en
You can now load the model via spacy.load('en')
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 22, in <module>
    import convo_politeness
  File "/usr/local/google/home/emersonm/toxicity-detector/src/convo_politeness.py", line 5, in <module>
    download_data.download_data()
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/download_data.py", line 31, in download_data
    spacy.load("en")
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 168, in load_model
    return load_model_from_link(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 185, in load_model_from_link
    return cls.load(**overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 235, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 218, in load_model_from_path
    return nlp.from_disk(model_path, exclude=disable)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 971, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 686, in from_disk
    reader(path / key)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 948, in deserialize_vocab
    _fix_pretrained_vectors_name(self)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 1106, in _fix_pretrained_vectors_name
    proc.cfg.setdefault("deprecation_fixes", {})
AttributeError: 'getset_descriptor' object has no attribute 'setdefault'
> 

CaptainEmerson commented 3 years ago

Changing pytest version back to 6.2.2

Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 9, in <module>
    download_data.download_data()
  File "/usr/local/google/home/emersonm/toxicity-detector/src/download_data.py", line 31, in download_data
    spacy.load("en")
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 168, in load_model
    return load_model_from_link(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 185, in load_model_from_link
    return cls.load(**overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 235, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 218, in load_model_from_path
    return nlp.from_disk(model_path, exclude=disable)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 971, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 686, in from_disk
    reader(path / key)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 948, in deserialize_vocab
    _fix_pretrained_vectors_name(self)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 1106, in _fix_pretrained_vectors_name
    proc.cfg.setdefault("deprecation_fixes", {})
AttributeError: 'getset_descriptor' object has no attribute 'setdefault'

CaptainEmerson commented 3 years ago

Changing pytest version back to 6.2.3

(same error)

Seems like this is a new error.

Remove version number from pytest

(same error)

Remove pytest from requirements.txt

(same error)

regex==2021.3.17 --> regex (have now fully reverted requirements.txt)

(same error)

spacy==2.3.0 --> spacy (remove version number)

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 8, in <module>
    import download_data
  File "/usr/local/google/home/emersonm/toxicity-detector/src/download_data.py", line 6, in <module>
    import spacy
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 10, in <module>
    from thinc.api import prefer_gpu, require_gpu, require_cpu  # noqa: F401
ImportError: cannot import name 'prefer_gpu' from 'thinc.api' (/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__thinc/thinc/api.py)

IIRC, the above error is why we switched to spacy 2.3.

Sophie, other ideas on how to fix the AttributeError?

sophieball commented 3 years ago

I removed the IOError check in download_data.py to force it to download the package anyway... not sure if it will resolve the problem...
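
A minimal sketch of what that change amounts to, assuming download_data.py previously wrapped spacy.load in a try/except and only downloaded the model on failure (the function name here is illustrative, not the actual code):

import spacy
from spacy.cli import download

def ensure_spacy_model(name="en_core_web_sm"):
    # Download unconditionally instead of only on IOError, then load the
    # model by its full package name rather than the "en" shortcut link.
    download(name)
    return spacy.load(name)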

CaptainEmerson commented 3 years ago

Hmm:

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Requirement already satisfied: en_core_web_sm==2.3.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz#egg=en_core_web_sm==2.3.0 in /usr/local/google/home/emersonm/.local/lib/python3.9/site-packages (2.3.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm
-->
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en
You can now load the model via spacy.load('en')
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Requirement already satisfied: en_core_web_sm==2.3.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz#egg=en_core_web_sm==2.3.0 in /usr/local/google/home/emersonm/.local/lib/python3.9/site-packages (2.3.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm
-->
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en
You can now load the model via spacy.load('en')
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 37, in <module>
    nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 170, in load_model
    return load_model_from_package(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 191, in load_model_from_package
    return cls.load(**overrides)
  File "/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 235, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 216, in load_model_from_path
    component = nlp.create_pipe(factory, config=config)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 309, in create_pipe
    return factory(self, **config)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 1080, in factory
    return obj.from_nlp(nlp, **cfg)
  File "pipes.pyx", line 62, in spacy.pipeline.pipes.Pipe.from_nlp
TypeError: type() takes 1 or 3 arguments

CaptainEmerson commented 3 years ago

With bazel-bin/main/train_classifier_g, I still get TypeError: type() takes 1 or 3 arguments

With the other one:

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Requirement already satisfied: en_core_web_sm==2.3.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz#egg=en_core_web_sm==2.3.0 in /usr/local/google/home/emersonm/.local/lib/python3.9/site-packages (2.3.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm
-->
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en
You can now load the model via spacy.load('en')
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Requirement already satisfied: en_core_web_sm==2.3.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz#egg=en_core_web_sm==2.3.0 in /usr/local/google/home/emersonm/.local/lib/python3.9/site-packages (2.3.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm
-->
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en
You can now load the model via spacy.load('en')
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__pandas/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'thread_label'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 186, in <module>
    corpus = convo_politeness.prepare_corpus(comments)
  File "/usr/local/google/home/emersonm/toxicity-detector/src/convo_politeness.py", line 61, in prepare_corpus
    "thread_label": row["thread_label"],
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__pandas/pandas/core/series.py", line 853, in __getitem__
    return self._get_value(key)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__pandas/pandas/core/series.py", line 961, in _get_value
    loc = self.index.get_loc(label)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__pandas/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: 'thread_label'

sophieball commented 3 years ago

@CaptainEmerson Can you try the current code? I can't get it past travis although it runs fine on my machine. The errors are:

g++: error: unrecognized command line option '-fmacro-prefix-map=/home/travis/.cache/bazel/_bazel_travis/b780598bdbf255f7df0b39f918df3247/sandbox/linux-sandbox/6/execroot/__main__/='
g++: error: unrecognized command line option '-ffile-prefix-map=/home/travis/.cache/bazel/_bazel_travis/b780598bdbf255f7df0b39f918df3247/sandbox/linux-sandbox/6/execroot/__main__/='
/opt/R/4.0.2/lib/R/etc/Makeconf:176: recipe for target 'api.o' failed
make: *** [api.o] Error 1
ERROR: compilation failed for package 'Rcpp'
* removing '/tmp/bazel/R/lib/external/R_Rcpp/Rcpp'

But I don't know why I'm getting this error... I updated its R version and everything...

CaptainEmerson commented 3 years ago

Both targets build fine. When running either, I get the same error:

Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/main/train_classifier_g.py", line 12, in <module>
    from src import suite
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/suite.py", line 21, in <module>
    from src import create_features
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/create_features.py", line 19, in <module>
    from src import util
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/util.py", line 4, in <module>
    from src import text_modifier
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/text_modifier.py", line 12, in <module>
    nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 170, in load_model
    return load_model_from_package(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 191, in load_model_from_package
    return cls.load(**overrides)
  File "/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 235, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 216, in load_model_from_path
    component = nlp.create_pipe(factory, config=config)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/language.py", line 309, in create_pipe
    return factory(self, **config)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/language.py", line 1080, in factory
    return obj.from_nlp(nlp, **cfg)
  File "pipes.pyx", line 62, in spacy.pipeline.pipes.Pipe.from_nlp
TypeError: type() takes 1 or 3 arguments

sophieball commented 3 years ago

gcc: error: unrecognized command line option '-fmacro-prefix-map=/home/travis/.cache/bazel/_bazel_travis/b780598bdbf255f7df0b39f918df3247/sandbox/linux-sandbox/5/execroot/__main__/='
gcc: error: unrecognized command line option '-ffile-prefix-map=/home/travis/.cache/bazel/_bazel_travis/b780598bdbf255f7df0b39f918df3247/sandbox/linux-sandbox/5/execroot/__main__/='
/opt/R/4.0.2/lib/R/etc/Makeconf:167: recipe for target 'pipe.o' failed
make: *** [pipe.o] Error 1
ERROR: compilation failed for package 'magrittr'
* removing '/tmp/bazel/R/lib/external/R_magrittr/magrittr'

Seems like travis has some problem with installing R packages...

@CaptainEmerson Can you try the new push again? I think the problems you encountered are related to spacy... I tried to remove all of the spacy usage for now...
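
One common shape for that kind of removal, sketched here only as an illustration (not necessarily what the commit did): make the spacy import optional, so spacy-dependent features are skipped instead of crashing at import time.

try:
    import spacy
    nlp = spacy.load("en_core_web_sm", disable=["parser", "ner"])
except Exception:
    # spacy unavailable or broken: politeness-related features are skipped.
    nlp = None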

CaptainEmerson commented 3 years ago

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Requirement already satisfied: en_core_web_sm==2.3.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz#egg=en_core_web_sm==2.3.0 in /usr/local/google/home/emersonm/.local/lib/python3.9/site-packages (2.3.0)
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm
-->
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/data/en
You can now load the model via spacy.load('en')
raw output are stored in the bazel binary's runfiles folder with the name `fighting_words_freq.csv`.

sorted by ngram version is stored in the bazel binary's runfiles folder with the name `fighting_words_sorted.csv`.

Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 190, in <module>
    politeness_hist(corpus)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 116, in politeness_hist
    parser = TextParser(verbosity=0)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/text_processing/textParser.py", line 52, in __init__
    aux_input['spacy_nlp'] = spacy.load('en_core_web_sm', disable=['ner'])
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 170, in load_model
    return load_model_from_package(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 191, in load_model_from_package
    return cls.load(**overrides)
  File "/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 235, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/util.py", line 216, in load_model_from_path
    component = nlp.create_pipe(factory, config=config)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 309, in create_pipe
    return factory(self, **config)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/language.py", line 1080, in factory
    return obj.from_nlp(nlp, **cfg)
  File "pipes.pyx", line 62, in spacy.pipeline.pipes.Pipe.from_nlp
TypeError: type() takes 1 or 3 arguments
> 

sophieball commented 3 years ago

The problem happens inside spacy, but I realize I can't get rid of spacy entirely because we need it for the politeness scores... I think we tried setting it to a specific version... do you remember why it didn't work?

CaptainEmerson commented 3 years ago

It seems like we're working on multiple problems at the same time (e.g. travis, running on my Google machine, and something else?). I'm having trouble keeping them straight. Should we try a different debugging strategy?

I don't recall what the issue was with spacy versions.

> The problem happens inside spacy, but I realize I can't get rid of spacy entirely because we need it for the politeness scores

I'm sure that's true for some/most targets, but does convo_word_freq_diff need spacy? That one seems like the most basic, so I've been trying to get that to run first.

As for running on my machine, it seems like I've got issues that aren't related to your changes:

emersonm@emersonm:~/toxicity-detector$ bazel run src/convo_word_freq_diff
INFO: Analyzed target //src:convo_word_freq_diff (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src:convo_word_freq_diff up-to-date:
  bazel-bin/src/convo_word_freq_diff
INFO: Elapsed time: 0.204s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
/usr/bin/env: ‘python’: No such file or directory

I know this isn't how I'm supposed to be running it, but I get the same error invoking it the right way. I get the same runtime error on HEAD.

sophieball commented 3 years ago

Right... there are 2 big issues: Travis is suddenly upset with my R packages, and we can't get the code running on your Google machine... and I'm adding things to the code in the meantime... I'll make new changes in another branch and try to get the code running on your machine on this branch.

spacy is needed for the politeness scores; I removed it from convo_word_freq_diff. Right now convo_word_freq_diff only prints out the fighting words, their z-scores, and the log-odds ratio plot; I've removed the histogram of politeness strategy counts for now.
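
For reference, the fighting-words z-scores that convo_word_freq_diff reports are typically computed as log-odds ratios with a Dirichlet prior (the Monroe et al. "fightin' words" statistic, which is also what convokit implements). A self-contained sketch of that statistic, not the project's actual code; the count dicts are hypothetical inputs mapping n-grams to their frequencies in the two comment groups:

import math

def fighting_words_zscores(counts_a, counts_b, alpha=0.01):
    # counts_a / counts_b: {ngram: count} for the two groups being compared.
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    a0 = alpha * len(vocab)  # total pseudo-count mass of the symmetric prior
    z = {}
    for w in vocab:
        y_a, y_b = counts_a.get(w, 0), counts_b.get(w, 0)
        # Smoothed log-odds of w in each group, then their difference.
        delta = (math.log((y_a + alpha) / (n_a + a0 - y_a - alpha))
                 - math.log((y_b + alpha) / (n_b + a0 - y_b - alpha)))
        var = 1.0 / (y_a + alpha) + 1.0 / (y_b + alpha)  # approximate variance
        z[w] = delta / math.sqrt(var)
    return z

Positive z-scores mark n-grams characteristic of the first group, negative ones of the second.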

CaptainEmerson commented 3 years ago

Two problems:

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
sys:1: DtypeWarning: Columns (9) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 138, in <module>
    corpus = prepare_corpus(comments)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 39, in prepare_corpus
    if "author" in comments.columns and row["author"] in bots:
NameError: name 'bots' is not defined
> 
emersonm@emersonm:~/toxicity-detector$ bazel build main/train_classifier_g
ERROR: /usr/local/google/home/emersonm/toxicity-detector/src/BUILD:19:11: no such package 'src/senti_core/utils': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /usr/local/google/home/emersonm/toxicity-detector/src/senti_core/utils and referenced by '//src:sentiment_classification'
ERROR: /usr/local/google/home/emersonm/toxicity-detector/src/BUILD:19:11: no such package 'src/senti_core/utils': BUILD file not found in any of the following directories. Add a BUILD file to a directory to mark it as a package.
 - /usr/local/google/home/emersonm/toxicity-detector/src/senti_core/utils and referenced by '//src:sentiment_classification'
ERROR: Analysis of target '//main:train_classifier_g' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.605s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (6 packages loaded, 112 targets configured)

sophieball commented 3 years ago

@CaptainEmerson Hi Emerson, same problem as before, travis is having some problems with some R packages.. can you try to run the new code?

CaptainEmerson commented 3 years ago

Working:

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
sys:1: DtypeWarning: Columns (9) have mixed types.Specify dtype option on import or set low_memory=False.
raw output are stored in the bazel binary's runfiles folder with the name `fighting_words_freq.csv`.

sorted by ngram version is stored in the bazel binary's runfiles folder with the name `fighting_words_sorted.csv`.

> 

Working:

emersonm@emersonm:~/toxicity-detector$ bazel build main/train_classifier_g
INFO: Analyzed target //main:train_classifier_g (6 packages loaded, 562 targets configured).
INFO: Found 1 target...
Target //main:train_classifier_g up-to-date:
  bazel-bin/main/train_classifier_g
INFO: Elapsed time: 6.949s, Critical Path: 0.73s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
emersonm@emersonm:~/toxicity-detector$ 

Not working:

> system2("bazel-bin/main/train_classifier_g",
+         input = format_csv(df[is.finite(review_time) &
+                               is.finite(shepherd_time) &
+                               is.finite(rounds)]))
⚠ Skipping model package dependencies and setting `--no-deps`. You
don't seem to have the spaCy package itself installed (maybe because you've
built from source?), so installing the model dependencies would cause spaCy to
be downloaded, which probably isn't what you want. If the model package has
other dependencies, you'll have to install them manually.
Collecting en_core_web_sm==2.3.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.0/en_core_web_sm-2.3.0.tar.gz (12.0 MB)
     |████████████████████████████████| 12.0 MB 7.4 MB/s 
✔ Download and installation successful
You can now load the model via spacy.load('en_core_web_sm')
✔ Linking successful
/usr/local/google/home/emersonm/.local/lib/python3.9/site-packages/en_core_web_sm
-->
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/data/en
You can now load the model via spacy.load('en')
/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__gensim/gensim/similarities/__init__.py:15: UserWarning: The gensim.similarities.levenshtein submodule is disabled, because the optional Levenshtein package <https://pypi.org/project/python-Levenshtein/> is unavailable. Install Levenhstein (e.g. `pip install python-Levenshtein`) to suppress this warning.
  warnings.warn(msg)
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/main/train_classifier_g.py", line 12, in <module>
    from src import suite
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/suite.py", line 20, in <module>
    from src import convo_politeness
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/convo_politeness.py", line 5, in <module>
    download_data.download_data()
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/download_data.py", line 31, in download_data
    spacy.load("en")
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 168, in load_model
    return load_model_from_link(name, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 185, in load_model_from_link
    return cls.load(**overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/data/en/__init__.py", line 12, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 235, in load_model_from_init_py
    return load_model_from_path(data_path, meta, **overrides)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 218, in load_model_from_path
    return nlp.from_disk(model_path, exclude=disable)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/language.py", line 971, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/util.py", line 686, in from_disk
    reader(path / key)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/language.py", line 948, in deserialize_vocab
    _fix_pretrained_vectors_name(self)
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/language.py", line 1106, in _fix_pretrained_vectors_name
    proc.cfg.setdefault("deprecation_fixes", {})
AttributeError: 'getset_descriptor' object has no attribute 'setdefault'
> 

sophieball commented 3 years ago

I think the problem is noted here: https://stackoverflow.com/questions/67192945/nltk-corpus-getset-descriptor-object-has-no-attribute-setdefault

I changed the spacy version.
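
When juggling spacy versions like this, a quick sanity check of the installed pair can save a build cycle: roughly, spacy 2.3.x pairs with thinc 7.4.x, while the prefer_gpu import seen in the next traceback is what spacy 3.x expects from thinc 8.x. A hypothetical check, not part of the repo:

import spacy
import thinc

# spacy 2.x should sit next to thinc 7.x, spacy 3.x next to thinc 8.x.
print("spacy", spacy.__version__, "thinc", thinc.__version__)
if spacy.__version__.startswith("2.") != thinc.__version__.startswith("7."):
    raise SystemExit("spacy and thinc major versions are mismatched")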

CaptainEmerson commented 3 years ago

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 9, in <module>
    from convokit import Corpus, Speaker, Utterance
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/__init__.py", line 4, in <module>
    from .politenessStrategies import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/politenessStrategies/__init__.py", line 1, in <module>
    from .politenessStrategies import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/politenessStrategies/politenessStrategies.py", line 5, in <module>
    from convokit.text_processing.textParser import process_text
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/text_processing/__init__.py", line 2, in <module>
    from .textParser import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/text_processing/textParser.py", line 2, in <module>
    import spacy
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 10, in <module>
    from thinc.api import prefer_gpu, require_gpu, require_cpu  # noqa: F401
ImportError: cannot import name 'prefer_gpu' from 'thinc.api' (/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__thinc/thinc/api.py)
> system2("bazel-bin/main/train_classifier_g",
+         input = format_csv(df[is.finite(review_time) &
+                               is.finite(shepherd_time) &
+                               is.finite(rounds)]))
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/main/train_classifier_g.py", line 2, in <module>
    from src import download_data
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/__main__/src/download_data.py", line 6, in <module>
    import spacy
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__spacy/spacy/__init__.py", line 10, in <module>
    from thinc.api import prefer_gpu, require_gpu, require_cpu  # noqa: F401
ImportError: cannot import name 'prefer_gpu' from 'thinc.api' (/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/main/train_classifier_g.runfiles/deps/pypi__thinc/thinc/api.py)
> 

I tried removing the thinc version pin, but then I get:

> system2("bazel-bin/src/convo_word_freq_diff", input = format_csv(df))
Traceback (most recent call last):
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/__main__/src/convo_word_freq_diff.py", line 9, in <module>
    from convokit import Corpus, Speaker, Utterance
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/__init__.py", line 4, in <module>
    from .politenessStrategies import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/politenessStrategies/__init__.py", line 1, in <module>
    from .politenessStrategies import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/politenessStrategies/politenessStrategies.py", line 5, in <module>
    from convokit.text_processing.textParser import process_text
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/text_processing/__init__.py", line 2, in <module>
    from .textParser import *
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__convokit/convokit/text_processing/textParser.py", line 2, in <module>
    import spacy
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/__init__.py", line 14, in <module>
    from .cli.info import info  # noqa: F401
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/cli/__init__.py", line 3, in <module>
    from ._util import app, setup_cli  # noqa: F401
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__spacy/spacy/cli/_util.py", line 8, in <module>
    import typer
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__typer/typer/__init__.py", line 29, in <module>
    from .main import Typer as Typer
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__typer/typer/main.py", line 11, in <module>
    from .completion import get_completion_inspect_parameters
  File "/usr/local/google/home/emersonm/toxicity-detector/bazel-bin/src/convo_word_freq_diff.runfiles/deps/pypi__typer/typer/completion.py", line 10, in <module>
    import click._bashcomplete
ModuleNotFoundError: No module named 'click._bashcomplete'

sophieball commented 3 years ago

I saw some posts around May with the same problem. I added click to requirements.txt. But some people say they still have the problem even after installing click.

CaptainEmerson commented 3 years ago

I get the same error. I think spacy needs a specific click version, because 7.x has _bashcomplete.py and 8.x doesn't.
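
A quick way to confirm that diagnosis from a Python prompt (a hypothetical check, not something in the repo): click 7.x still ships the private _bashcomplete module that spacy's typer dependency imports, and click 8.0 removed it.

import click

print("click", click.__version__)
try:
    import click._bashcomplete  # present in click 7.x, removed in 8.0
    print("_bashcomplete found: the typer/spacy import path should work")
except ModuleNotFoundError:
    print("click >= 8.0: pin click<8.0.0 in requirements.txt")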

sophieball commented 3 years ago

OH! Interesting! Yeah I also had the same error. I added click<8.0.0. Builds on my side..

CaptainEmerson commented 3 years ago

Both train_classifier_g and convo_word_freq_diff work now. Do you want to see any of the output?

sophieball commented 3 years ago

Cool! Can you share the train_classifier.log?

sophieball commented 3 years ago

I have a new build error from Travis:


ERROR: No matching distribution found for spacy==3.0.0 (from -r /home/travis/build/sophieball/toxicity-detector/requirements.txt (line 36))

But 2.3.7 doesn't work with `thinc`. @CaptainEmerson Do you have this error?

CaptainEmerson commented 3 years ago

I don't have that error. Both targets build and run. I've shared the output just now.