obi-ml-public / ehr_deidentification

Robust de-identification of medical notes using transformer architectures
MIT License

Installation issue #3

Closed by matthewchung74 2 years ago

matthewchung74 commented 2 years ago

Thank you for this library. I'm having some installation issues. Do you usually run it on a GPU or a CPU? Do you install with conda or pip, and on Ubuntu? I would like to install it the same way you usually do.

prajwal967 commented 2 years ago

We worked on a Linux machine with NVIDIA GPUs, using a conda environment. Could you describe the issue you had? We might have faced the same issue, or we can try to figure out whether we need to make any changes. Thanks!

matthewchung74 commented 2 years ago

I tried to get it running on SageMaker using pip, on both a CPU and a GPU, but the pip dependencies did not resolve. On my Mac (M1) I did get it working with the following conda environment:

name: torch-nightly
channels:
  - conda-forge
  - defaults
dependencies:
  - appnope=0.1.3=pyhd8ed1ab_0
  - asttokens=2.0.5=pyhd8ed1ab_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=py_2
  - backports.functools_lru_cache=1.6.4=pyhd8ed1ab_0
  - ca-certificates=2022.6.15=h033912b_0
  - certifi=2022.6.15=py38h50d1736_0
  - debugpy=1.5.1=py38he9d5cce_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - entrypoints=0.4=pyhd8ed1ab_0
  - executing=0.8.3=pyhd8ed1ab_0
  - ipykernel=6.15.0=pyh736e0ef_0
  - ipython=8.4.0=py38h50d1736_0
  - jedi=0.18.1=py38h50d1736_1
  - jupyter_client=7.0.6=pyhd8ed1ab_0
  - jupyter_core=4.10.0=py38h50d1736_0
  - libcxx=12.0.0=h2f01273_0
  - libffi=3.3=hb1e8313_2
  - libsodium=1.0.18=hbcb3906_1
  - matplotlib-inline=0.1.3=pyhd8ed1ab_0
  - ncurses=6.3=hca72f7f_2
  - nest-asyncio=1.5.5=pyhd8ed1ab_0
  - openssl=1.1.1o=hfe4f2af_0
  - packaging=21.3=pyhd8ed1ab_0
  - parso=0.8.3=pyhd8ed1ab_0
  - pexpect=4.8.0=pyh9f0ad1d_2
  - pickleshare=0.7.5=py_1003
  - pip=21.2.4=py38hecd8cb5_0
  - prompt-toolkit=3.0.29=pyha770c72_0
  - psutil=5.9.1=py38h0dd4459_0
  - ptyprocess=0.7.0=pyhd3deb0d_0
  - pure_eval=0.2.2=pyhd8ed1ab_0
  - pygments=2.12.0=pyhd8ed1ab_0
  - pyparsing=3.0.9=pyhd8ed1ab_0
  - python=3.8.13=hdfd78df_0
  - python-dateutil=2.8.2=pyhd8ed1ab_0
  - python_abi=3.8=2_cp38
  - pyzmq=19.0.2=py38h2c785a9_2
  - readline=8.1.2=hca72f7f_1
  - setuptools=61.2.0=py38hecd8cb5_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.38.3=h707629a_0
  - stack_data=0.3.0=pyhd8ed1ab_0
  - tk=8.6.12=h5d9f67b_0
  - tornado=6.1=py38hed1de0f_3
  - traitlets=5.3.0=pyhd8ed1ab_0
  - wcwidth=0.2.5=pyh9f0ad1d_2
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.5=hca72f7f_1
  - zeromq=4.3.4=he49afe7_1
  - zlib=1.2.12=h4dc903c_2
  - pip:
    - aiohttp==3.8.1
    - aiosignal==1.2.0
    - allennlp==2.9.3
    - argon2-cffi==21.3.0
    - argon2-cffi-bindings==21.2.0
    - async-timeout==4.0.2
    - attrs==21.4.0
    - base58==2.1.1
    - beautifulsoup4==4.11.1
    - bleach==5.0.0
    - blis==0.7.7
    - boto3==1.24.13
    - botocore==1.27.13
    - cached-path==1.1.3
    - cachetools==5.2.0
    - catalogue==2.0.7
    - cffi==1.15.0
    - charset-normalizer==2.0.12
    - click==7.1.2
    - commonmark==0.9.1
    - cymem==2.0.6
    - datasets==2.3.2
    - defusedxml==0.7.1
    - dill==0.3.5.1
    - docker-pycreds==0.4.0
    - en-core-sci-lg==0.4.0
    - en-core-sci-sm==0.4.0
    - fairscale==0.4.6
    - fastjsonschema==2.15.3
    - filelock==3.6.0
    - frozenlist==1.3.0
    - fsspec==2022.5.0
    - gitdb==4.0.9
    - gitpython==3.1.27
    - google-api-core==2.8.2
    - google-auth==2.8.0
    - google-cloud-core==2.3.1
    - google-cloud-storage==2.4.0
    - google-crc32c==1.3.0
    - google-resumable-media==2.3.3
    - googleapis-common-protos==1.56.2
    - h5py==3.7.0
    - huggingface-hub==0.7.0
    - idna==3.3
    - importlib-resources==5.8.0
    - iniconfig==1.1.1
    - ipython-genutils==0.2.0
    - ipywidgets==7.7.0
    - jinja2==3.1.2
    - jmespath==1.0.1
    - joblib==1.1.0
    - jsonnet==0.18.0
    - jsonschema==4.6.0
    - jupyter==1.0.0
    - jupyter-console==6.4.3
    - jupyterlab-pygments==0.2.2
    - jupyterlab-widgets==1.1.0
    - langcodes==3.3.0
    - lmdb==1.3.0
    - markupsafe==2.1.1
    - mistune==0.8.4
    - more-itertools==8.13.0
    - multidict==6.0.2
    - multiprocess==0.70.13
    - murmurhash==1.0.7
    - nbclient==0.6.4
    - nbconvert==6.5.0
    - nbformat==5.4.0
    - nltk==3.7
    - notebook==6.4.12
    - numpy==1.23.0rc3
    - pandas==1.4.2
    - pandocfilters==1.5.0
    - pathtools==0.1.2
    - pathy==0.6.1
    - pillow==9.1.1
    - pluggy==1.0.0
    - preshed==3.0.6
    - prometheus-client==0.14.1
    - promise==2.3
    - protobuf==3.20.1
    - py==1.11.0
    - pyarrow==8.0.0
    - pyasn1==0.4.8
    - pyasn1-modules==0.2.8
    - pycorenlp==0.3.0
    - pycparser==2.21
    - pydantic==1.8.2
    - pyrsistent==0.18.1
    - pytest==7.1.2
    - pytz==2022.1
    - pyyaml==6.0
    - qtconsole==5.3.1
    - qtpy==2.1.0
    - regex==2022.6.2
    - requests==2.28.0
    - responses==0.18.0
    - rich==12.4.4
    - robust-deid==0.1.1
    - rsa==4.8
    - s3transfer==0.6.0
    - sacremoses==0.0.53
    - scikit-learn==1.1.1
    - scipy==1.8.1
    - send2trash==1.8.0
    - sentencepiece==0.1.96
    - sentry-sdk==1.5.12
    - seqeval==1.2.2
    - setproctitle==1.2.3
    - shortuuid==1.0.9
    - smart-open==5.2.1
    - smmap==5.0.0
    - soupsieve==2.3.2.post1
    - spacy==3.0.8
    - spacy-legacy==3.0.9
    - spacy-loggers==1.0.2
    - srsly==2.4.3
    - tensorboardx==2.5.1
    - termcolor==1.1.0
    - terminado==0.15.0
    - thinc==8.0.17
    - threadpoolctl==3.1.0
    - tinycss2==1.1.1
    - tokenizers==0.12.1
    - tomli==2.0.1
    - torch==1.11.0
    - torchaudio==0.13.0.dev20220620
    - torchvision==0.12.0
    - tqdm==4.64.0
    - transformers==4.18.0
    - typer==0.3.2
    - typing-extensions==4.2.0
    - urllib3==1.26.9
    - wandb==0.12.18
    - wasabi==0.9.1
    - webencodings==0.5.1
    - widgetsnbextension==3.6.0
    - xxhash==3.0.0
    - yarl==1.7.2
    - zipp==3.8.0
prefix: /Users/mattc/opt/anaconda3/envs/torch-nightly
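
The export above pins macOS build strings (for example `py38h50d1736_0`) and a `prefix:` path from the exporting machine, so it is unlikely to resolve as-is on Linux. A minimal sketch of one way to make such an export more portable, assuming it is saved as plain text; the function name `make_portable` is hypothetical and not part of this repository. It only strips build strings and the `prefix:` line, so packages that may be macOS-only (such as `appnope` or `libcxx`) would still need to be removed by hand.

```python
import re

# Matches a conda dependency line of the form "  - name=version=build",
# e.g. "  - python=3.8.13=hdfd78df_0". Pip lines ("name==version") do not match.
_CONDA_DEP = re.compile(r"^(\s*-\s+)([^=\s]+)=([^=\s]+)=([^=\s]+)\s*$")


def make_portable(env_text: str) -> str:
    """Drop build strings and the machine-specific prefix from a conda export."""
    out = []
    for line in env_text.splitlines():
        if line.startswith("prefix:"):
            # Absolute path on the exporting machine; not needed to recreate the env.
            continue
        match = _CONDA_DEP.match(line)
        if match:
            indent, name, version, _build = match.groups()
            line = f"{indent}{name}={version}"
        out.append(line)
    return "\n".join(out) + "\n"
```

The cleaned text can be written to a file such as `environment.yml` and passed to `conda env create -f environment.yml`. Running `conda env export --no-builds` on the original machine produces a similar export without build strings.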

matthewchung74 commented 2 years ago

I'll try again on SageMaker and reopen with my findings.