wengong-jin / hgraph2graph

Hierarchical Generation of Molecular Graphs using Structural Motifs
MIT License

Getting error while generating vocabulary #14

Open aliraza-ece opened 4 years ago

aliraza-ece commented 4 years ago

Hello Wengong!

Thanks for the great work!

I am trying to build the vocabulary from your dataset (../data/polymers/all.txt), but I get the error below and cannot figure it out. I eventually wrapped the call in a try/except, but the error occurs many times over the whole run. I would appreciate any help.

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "[...]\hgraph2graph-master\hgraph2graph-master\generation\get_vocab.py", line 12, in process
    hmol = MolGraph(s)
  File "[...]\hgraph2graph-master\hgraph2graph-master\generation\poly_hgraph\mol_graph.py", line 29, in __init__
    self.clusters, self.atom_cls = self.pool_clusters()
  File "[...]\hgraph2graph-master\hgraph2graph-master\generation\poly_hgraph\mol_graph.py", line 87, in pool_clusters
    if fsmiles not in MolGraph.FRAGMENTS: continue
TypeError: argument of type 'NoneType' is not iterable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "[...]/hgraph2graph-master/generation/get_vocab.py", line 62, in <module>
    vocab_list = pool.map(process, batches) # getting error here TypeError: argument of type 'NoneType' is not iterable
  File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "[...]\Anaconda3\envs\myenv\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
TypeError: argument of type 'NoneType' is not iterable
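
The last line of the traceback is what Python raises for an `in` test against None. A minimal sketch reproducing it, using a hypothetical MolGraphStub class whose FRAGMENTS attribute defaults to None (an assumption mirroring the state the traceback implies), and showing that the same test succeeds once a list has been assigned:

```python
class MolGraphStub:
    """Hypothetical stand-in for poly_hgraph's MolGraph; only FRAGMENTS matters here."""
    FRAGMENTS = None  # assumed default before load_fragment() runs


def is_known_fragment(fsmiles):
    # Same membership test as mol_graph.py line 87 in the traceback
    return fsmiles in MolGraphStub.FRAGMENTS


# Before loading: the membership test against None raises TypeError.
try:
    is_known_fragment("C1CC1")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# After loading (what load_fragment() is expected to do): the test works.
MolGraphStub.FRAGMENTS = ["C1CC1"]
loaded_ok = is_known_fragment("C1CC1")
print(loaded_ok)
```

The exact wording of the TypeError differs slightly between Python versions, but it always names 'NoneType'.
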
wengong-jin commented 4 years ago

Hi,

It seems that MolGraph.FRAGMENTS is not initialized. In https://github.com/wengong-jin/hgraph2graph/blob/3249f93c6e72a3cdfb0c0a71939b0f071dfe7456/generation/get_vocab.py#L50, the load_fragment function sets MolGraph.FRAGMENTS to a list of fragments collected from your training data.

This is strange, because as long as load_fragment is called (get_vocab.py line 50), MolGraph.FRAGMENTS cannot be None (at worst it is an empty list). I think the error happened before the load_fragment function was called. You can try printing the fragments variable on line 49 to see whether that line gets executed.
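
Along the lines of the debugging suggestion above, a guard can report the real problem instead of the TypeError deep inside pool_clusters(). This is a minimal sketch, not code from the repository: MolGraphStub, load_fragment_stub and require_fragments are hypothetical names.

```python
class MolGraphStub:
    """Hypothetical stand-in for MolGraph; FRAGMENTS stays None until loaded."""
    FRAGMENTS = None


def load_fragment_stub(fragments):
    """Hypothetical stand-in for load_fragment(): stores the fragments on the class."""
    print("load_fragment called with", len(fragments), "fragments")  # debug print
    MolGraphStub.FRAGMENTS = list(fragments)


def require_fragments():
    """Fail early with a clear message instead of a TypeError inside pool_clusters()."""
    if MolGraphStub.FRAGMENTS is None:
        raise RuntimeError(
            "MolGraph.FRAGMENTS is None: load_fragment() has not run in this process"
        )
    return MolGraphStub.FRAGMENTS


# Before loading: the guard names the missing step.
try:
    require_fragments()
    early_error = None
except RuntimeError as exc:
    early_error = str(exc)

# After loading: the guard returns the fragment list.
load_fragment_stub(["C1CC1", "c1ccccc1"])
fragments = require_fragments()
```

Calling such a guard at the top of the per-molecule function would show immediately whether the fragments were loaded in the process doing the work.
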

nikhilmittal444 commented 4 years ago

I was getting the same error in the generation/preprocessing.py file too. I debugged the code step by step and found that when the program calls partial(tensorize, mol_batches), MolGraph.tensorize runs with FRAGMENTS initialized to None and never calls load_fragments before reaching pool_clusters(), which leads to this NoneType iterable error. Please correct me if I am wrong. Thank you in advance.

aliraza-ece commented 4 years ago

@nikhilmittal444 This is an issue with Pool on Windows: MolGraph.FRAGMENTS is not accessible in functions called through Pool. I removed the multiprocessing and was able to generate the vocabulary without any issue. However, I only get 2273 lines, compared with the 2288 lines in the provided vocab. @wengong-jin I am still going through the code to see if there is any randomness. Do you think this is normal?
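
Why the attribute is lost: Windows only supports the "spawn" start method, so each Pool worker is a fresh interpreter that re-imports the modules it needs instead of inheriting a copy of the parent's memory (as the "fork" method, historically the Linux default, does). Assuming load_fragment only runs in the parent, its runtime assignment to MolGraph.FRAGMENTS never reaches the workers, which see the class-level default. The sketch below simulates this without starting any processes by executing the same module source twice, once for the "parent" and once for a "worker"; MODULE_SRC is a hypothetical stand-in for poly_hgraph/mol_graph.py.

```python
import types

# Hypothetical module source standing in for poly_hgraph/mol_graph.py (assumption).
MODULE_SRC = '''
class MolGraph:
    FRAGMENTS = None  # class-level default, re-created on every fresh import
'''


def fresh_import(name):
    """Simulate a fresh import of the module, as a spawned worker would do."""
    module = types.ModuleType(name)
    exec(MODULE_SRC, module.__dict__)
    return module


parent = fresh_import("mol_graph")
parent.MolGraph.FRAGMENTS = ["C1CC1", "c1ccccc1"]  # runtime assignment, like load_fragment()

worker = fresh_import("mol_graph")  # what a spawned worker effectively gets

print(parent.MolGraph.FRAGMENTS)  # ['C1CC1', 'c1ccccc1']
print(worker.MolGraph.FRAGMENTS)  # None
```

This is also why the same script can run fine under "fork", where workers start from a copy of the parent's already-populated class.
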

nikhilmittal444 commented 4 years ago

I stored the FRAGMENTS returned by load_fragments in a new variable and passed it as an input argument to the tensorize function and to the MolGraph object in __init__() (as self.new_variable). This gave me the same 2288 lines as the provided vocab, and it also resolved the "MolGraph.FRAGMENTS is NoneType and not iterable" error.
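
The idea of passing the fragments explicitly can be sketched with functools.partial: the worker function receives the list as an argument, and a partial of a module-level function is picklable, so it reaches spawned workers intact. This is a minimal illustration, not the patch described above; tensorize_stub and pool_clusters_stub are hypothetical names, and the filtering only mimics the membership test from the traceback.

```python
from functools import partial


def pool_clusters_stub(candidate_fragments, fragments):
    """Hypothetical stand-in for the filtering step in pool_clusters():
    keep only candidates that appear in the loaded fragment list."""
    known = set(fragments)
    return [f for f in candidate_fragments if f in known]


def tensorize_stub(batch, fragments):
    """Hypothetical worker: gets fragments as an argument instead of
    reading a class attribute, so it behaves the same under any start method."""
    return [pool_clusters_stub(candidates, fragments) for candidates in batch]


fragments = ["C1CC1", "c1ccccc1"]  # e.g. what load_fragment() returned
batches = [[["C1CC1", "CCO"]], [["c1ccccc1"]]]

worker = partial(tensorize_stub, fragments=fragments)
results = list(map(worker, batches))
print(results)  # [[['C1CC1']], [['c1ccccc1']]]
```

With multiprocessing, the same `worker` can be handed to `pool.map(worker, batches)`; on Windows that call must sit inside an `if __name__ == "__main__":` block, because spawned workers re-import the main module.
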

mateuszrezler commented 4 years ago

Hi guys,

This issue could easily be solved by simply replacing None with an empty list (see #15). @wengong-jin, please review whether this change is safe.
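
One point worth checking when reviewing that change: an empty list removes the TypeError, but if a worker still never runs load_fragment, the test `if fsmiles not in MolGraph.FRAGMENTS: continue` is then true for every fragment, so all candidates are skipped silently instead of failing loudly. A minimal sketch of that effect, assuming the loop shape shown at mol_graph.py line 87 (pooled_fragments is a hypothetical name):

```python
def pooled_fragments(candidates, known_fragments):
    """Mimics the loop around mol_graph.py line 87:
    `if fsmiles not in MolGraph.FRAGMENTS: continue`."""
    kept = []
    for fsmiles in candidates:
        if fsmiles not in known_fragments:
            continue  # fragment is skipped and never pooled
        kept.append(fsmiles)
    return kept


candidates = ["C1CC1", "c1ccccc1"]
with_loaded = pooled_fragments(candidates, ["C1CC1", "c1ccccc1"])
with_empty_default = pooled_fragments(candidates, [])  # worker that never loaded fragments
print(with_loaded)         # ['C1CC1', 'c1ccccc1']
print(with_empty_default)  # []
```

Whether that changes the resulting vocabulary depends on where the fragments are actually loaded, so the output should be compared with the provided vocab after applying the change.
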

orubaba commented 2 years ago

Hi gurus, I need your help. I am trying to run get_vocab.py on my small dataset of around 100 molecules, but I keep getting an error. Is there a way around this? The error points to mol_graph.py line 82: "assert n - m <= 1 #must be connected".
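
On the assertion itself: reading n and m as the node and edge counts of the graph being checked (an assumption based on the `#must be connected` comment), a connected graph with n nodes has at least n - 1 edges, so n - m > 1 means the graph splits into several pieces. With custom data, one frequent cause (again an assumption, since only the assertion line was posted) is a SMILES with disconnected components, such as a salt or mixture written with '.'. A minimal pre-filter sketch; split_connected is a hypothetical helper, not part of hgraph2graph:

```python
def split_connected(smiles_list):
    """Heuristic pre-filter (assumption, not part of hgraph2graph):
    set aside SMILES containing '.', i.e. several disconnected components,
    before running get_vocab.py.
    Caveat: ring-closure digits can join components across a '.',
    e.g. 'C1.C1' is ethane, so this check is conservative."""
    kept, dropped = [], []
    for line in smiles_list:
        smiles = line.strip()
        if not smiles:
            continue  # skip blank lines
        if "." in smiles:
            dropped.append(smiles)
        else:
            kept.append(smiles)
    return kept, dropped


data = ["CCO", "CC(=O)[O-].[Na+]", "c1ccccc1", ""]
kept, dropped = split_connected(data)
print(kept)     # ['CCO', 'c1ccccc1']
print(dropped)  # ['CC(=O)[O-].[Na+]']
```

With RDKit installed, `len(Chem.GetMolFrags(mol)) > 1` on `mol = Chem.MolFromSmiles(s)` is a more precise test (MolFromSmiles returns None for unparsable SMILES, which should be filtered out as well).
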