greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Issue when installing from conda #106

Open findalexli opened 4 years ago

findalexli commented 4 years ago

Hello!

I am having the following issue when creating the conda environment from the YML file.

(base) MacBook-Pro-4:snorkeling alexanderli$ conda env create --file environment.yml
Collecting package metadata: done
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - gensim=3.8.1 -> python_abi=[build=*_cp37m] -> pypy[version='<0a0']
  - sqlalchemy=1.1.13
Use "conda search <package> --info" to see the dependencies for each package.

ajlee21 commented 4 years ago

Hi Alex, Excited to see that you're trying to use some of the work in this repository. I'm actually not the owner so I'm going to tag @danich1 who is!

danich1 commented 4 years ago

Greetings Alex, I believe your problem is an OS issue. Things work fine on Linux, but I'm not surprised that macOS is having issues. I think the quick fix for this situation is to move gensim and sqlalchemy into the pip section and let pip handle the versioning.

Corrected file (note the environment name change):

name: snorkeling
channels:
  - conda-forge
  - pytorch
dependencies:
- beautifulsoup4=4.6.0
- ipykernel=5.1.2
- ipywidgets=7.5.1
- jupyter=1.0.0
- jupyter_client=5.3.4
- jupyter_console=6.0.0
- jupyter_core=4.6.0
- llvmlite=0.21.0
- lxml==4.1.1
- matplotlib=3.1.1
- neo4j-python-driver==1.3.1
- networkx=2.1
- nltk=3.2.4
- numpy=1.17.2
- pandas=0.24.0
- pip=19.2.3
- plotnine=0.5.1
- psycopg2=2.7.3.2
- python=3.6.7
- pytorch=1.1.0
- py4j=0.10.6
- requests=2.18.4
- seaborn=0.9.0
- scikit-image=0.13.1
- scikit-learn=0.21.3
- scipy=1.3.1 
- six=1.12.0
- sqlite=3.30.0
- tensorflow==2.0.0
- tensorboard==2.0.0
- tqdm=4.28.1
- tika=1.15
- xlrd=1.1.0
- xlsxwriter=1.0.4
- pip:
    - gensim==3.8.1
    - hetio==0.2.6
    - matplotlib-venn==0.11.5
    - snorkel==0.9.1
    - spacy==1.10.0
    - sqlalchemy==1.1.13

~danich1
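
The change described above (taking the two conflicting packages out of the conda list and pinning them under pip instead) can be sketched in a few lines of Python. This is only an illustration, not part of the repository: `move_to_pip` is a hypothetical helper, and the short `original` list is a made-up excerpt standing in for the parsed `dependencies:` section. Note that conda pins use `=` while pip pins use `==`.

```python
def move_to_pip(dependencies, names):
    """Return a new dependency list with the named conda pins moved to pip.

    `dependencies` mirrors a parsed environment.yml: plain strings are conda
    packages, and a {"pip": [...]} dict holds the pip-installed packages.
    """
    conda_deps = []
    pip_deps = []
    for dep in dependencies:
        if isinstance(dep, dict):
            # Keep whatever was already listed under pip.
            pip_deps.extend(dep.get("pip", []))
            continue
        # Conda pins look like "name=version" or "name==version".
        name, _, version = dep.partition("=")
        version = version.lstrip("=")
        if name in names:
            # pip expects "name==version".
            pip_deps.append(f"{name}=={version}" if version else name)
        else:
            conda_deps.append(dep)
    return conda_deps + [{"pip": sorted(pip_deps)}]


# Hypothetical excerpt of the original dependency list.
original = [
    "gensim=3.8.1",
    "numpy=1.17.2",
    "sqlalchemy=1.1.13",
    {"pip": ["snorkel==0.9.1", "hetio==0.2.6"]},
]
updated = move_to_pip(original, {"gensim", "sqlalchemy"})
print(updated)
```

After this, conda only has to solve for the remaining packages, and pip installs gensim and sqlalchemy afterwards without taking part in conda's conflict check.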

findalexli commented 4 years ago

Thank you, Alexandra and David. I appreciate the prompt reply.

A follow-up question: snorkel 0.9.1 does not contain a snorkel.model module. I am trying to run some notebooks, like compound_disease/compound_treats_disease/dataset_statistics/dataset_statistics.ipynb, but they will not run.

ModuleNotFoundError                       Traceback (most recent call last)
in
      8 os.environ['SNORKELDB'] = database_str
      9
---> 10 from snorkel.model import SnorkelSession
     11 session = SnorkelSession()

ModuleNotFoundError: No module named 'snorkel.model'

danich1 commented 4 years ago

Right. I figured that would happen. The error occurs because some of my notebooks were using snorkel's old version, from before the authors upgraded their code. The old code used a database to access sentences and other information, whereas the authors have since adapted their code to move away from using a database.

If you want to run the notebooks that use snorkel's old code, you will have to install this library in a separate conda environment.
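
Before running a notebook, it can help to check which flavor of snorkel the active environment provides. The sketch below is a rough guess at how to do that: `has_module` and `describe_snorkel` are hypothetical helpers, and the module names are assumptions (`snorkel.models` appears in the old-version traceback later in this thread, while `snorkel.labeling` is assumed to mark the 0.9-style package).

```python
import importlib.util


def has_module(dotted_name):
    """Return True if `dotted_name` can be located in the current environment.

    find_spec imports any parent packages, but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return False


def describe_snorkel():
    """Guess which snorkel API is installed (module names are assumptions)."""
    if has_module("snorkel.models"):
        return "database-backed (old) snorkel"
    if has_module("snorkel.labeling"):
        return "snorkel 0.9-style API"
    return "snorkel not installed"
```

Calling `describe_snorkel()` at the top of a notebook gives a quick hint about whether that notebook needs the separate environment with the old library.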

Jatin6004 commented 3 years ago

Hi, I am having an issue while running the pubtator-to-postgres.ipynb file from the create_database folder.


PicklingError                             Traceback (most recent call last)
in
     35 for edges in [dge, gge, cge, cde]:
     36     print(edges)
---> 37     insert_cand_to_db(edges, [train_sens, dev_sens, test_sens])
     38
     39 offset = offset + chunk_size

~\Desktop\snorkeling\create_database\database_insertion.py in insert_cand_to_db(extractor, sentences)
    131 def insert_cand_to_db(extractor, sentences):
    132     for split, sens in enumerate(sentences):
--> 133         extractor.apply(sens, split=split, parallelism=5, clear=False)
    134
    135 def print_candidates(session, context_class, edge):

~\Anaconda3\envs\snorkel-extraction\lib\site-packages\snorkel\candidates.py in apply(self, xs, split, **kwargs)
    216
    217     def apply(self, xs, split=0, **kwargs):
--> 218         super(PretaggedCandidateExtractor, self).apply(xs, split=split, **kwargs)
    219
    220     def clear(self, session, split, **kwargs):

~\Anaconda3\envs\snorkel-extraction\lib\site-packages\snorkel\udf.py in apply(self, xs, clear, parallelism, progress_bar, count, **kwargs)
     51             self.apply_st(xs, clear=clear, count=count, **kwargs)
     52         else:
---> 53             self.apply_mt(xs, parallelism, clear=clear, **kwargs)
     54
     55         if self.pb is not None:

~\Anaconda3\envs\snorkel-extraction\lib\site-packages\snorkel\udf.py in apply_mt(self, xs, parallelism, **kwargs)
    108         # Start the UDF processes, and then join on their completion
    109         for udf in self.udfs:
--> 110             udf.start()
    111
    112         while any([udf.is_alive() for udf in self.udfs]) and count < total_count:

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224
    225 class DefaultContext(BaseContext):

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323
    324     class SpawnContext(BaseContext):

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61
     62 #

PicklingError: Can't pickle <class 'snorkel.models.candidate.DiseaseGene'>: attribute lookup DiseaseGene on snorkel.models.candidate failed

danich1 commented 3 years ago

PicklingError: Can't pickle <class 'snorkel.models.candidate.DiseaseGene'>: attribute lookup DiseaseGene on snorkel.models.candidate failed

This error arises because the pickle library cannot pickle sqlalchemy's candidate_subclass class. This is actually not an easy problem to fix (see this post). If you want to get this functionality working, you might have to edit snorkel's code directly. It turns out their old version is deprecated, which adds to the complexity. FYI: I'm coming out with my own code to do the above extraction. It should be uploaded in a few weeks.
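
For context, here is a minimal sketch of this failure mode in plain Python, with no snorkel or sqlalchemy involved. The traceback above goes through `popen_spawn_win32`, so the child process is started by pickling the process object. pickle stores a class by reference (module name plus class name), and a class built at runtime fails the lookup if it is not reachable as an attribute of its declared module. The names `fake_candidates` and `make_candidate_class` are hypothetical stand-ins, and the workaround shown (registering the class on its module) is a common general trick, not a confirmed fix for snorkel.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module such as snorkel.models.candidate.
fake_module = types.ModuleType("fake_candidates")
sys.modules["fake_candidates"] = fake_module


def make_candidate_class(name):
    """Build a class at runtime, the way a class factory would."""
    cls = type(name, (object,), {})
    # Claim to live in fake_candidates without being stored there.
    cls.__module__ = "fake_candidates"
    return cls


def can_pickle(obj):
    """Return True if pickle can serialize `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


DiseaseGene = make_candidate_class("DiseaseGene")

# Fails: pickle cannot find fake_candidates.DiseaseGene by attribute lookup.
before = can_pickle(DiseaseGene)

# Workaround: make the class reachable under the name pickle will look up.
setattr(fake_module, "DiseaseGene", DiseaseGene)
after = can_pickle(DiseaseGene)

print(before, after)
```

In a spawned child process the same lookup has to succeed again when the object is unpickled, which is part of why patching this from the outside is awkward and editing the library may be needed.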

Jatin6004 commented 3 years ago

Thanks a lot, David. That would be really helpful.