ValueError: Input 0 of layer sequential is incompatible with the layer

domeniconappo commented 4 years ago

Hi, we updated mordecai to 2.1.0 along with its dependencies: tensorflow to 2.3.0, spacy to 2.3.2, and keras to 2.4.3.

Our geocoding is now much slower, and we've started to see lots of errors printed to the console like the following:

ValueError: Input 0 of layer sequential is incompatible with the layer: expected axis -1 of input shape to have value 12 but received input with shape [None, 0]

It's not clear how this affects the geocoding results, but processing is definitely much slower: our queues are constantly building up and accumulating documents to be geoparsed.

Can you help? Is it a problem with deps versions?

Thank you in advance and for your great work!

ahalterman commented 4 years ago

Huh, that's frustrating. I really didn't change that much beyond bumping the versions, so I'm not sure where the slowdown is coming from. Do you have a document that produces the ValueError that you can share?

marcusvrlopes commented 4 years ago

I just started with mordecai last week, but I got the same problem described by @domeniconappo. After a lot of tests (changing versions, trying to use CUDA, etc.), nothing changed. Then I gave it a try in a Jupyter notebook. I don't know why, but the analysis became a lot faster. The only library version that differs from @domeniconappo's setup and my own old script is tensorflow (1.14.0, installed by conda).
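For anyone comparing environments the same way, here is a minimal sketch that prints the installed versions of the packages mentioned in this thread. It only uses the standard library (`importlib.metadata`, Python 3.8+). The package list is an assumption based on the names above; a conda install may register a distribution under a different name, in which case it will show up as not installed.

```python
from importlib import metadata

# Packages named in this thread; adjust the list for your own environment.
PACKAGES = ["mordecai", "tensorflow", "spacy", "keras"]


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is missing or registered under another name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the slow and the fast environment and comparing the output should make any version difference easy to spot.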

vupadhyaya19 commented 3 years ago

Hi @ahalterman, I am getting the same issue while using the package. In my case it occurs because some irrelevant terms are identified as geo terms. Looking through the code, I found that at line 731 of geoparse.py, where we call prediction = self.country_model.predict(i['matrix']).transpose()[0], the matrix generated for the word is empty, with shape (1, 0). Let me know if we can filter out entries with an empty matrix where it is built at line 722 of geoparse.py: feat = self.make_country_matrix(loc).

Examples of the identified geo-terms that cause the issue:

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'organomercury'}

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'orangeiron'}

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'redoxygen'}

{'labels': [], 'matrix': matrix([], shape=(1, 0), dtype=float64), 'word': 'FeC10(HgCl)10'}

[{'text': 'organomercury', 'label': '', 'word': 'organomercury', 'spans': [{'start': 900, 'end': 913}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'Pbca', 'label': '', 'word': 'Pbca', 'spans': [{'start': 4644, 'end': 4648}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': 'POL', 'most_alt': 'CHN', 'most_pop': 'MEX', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'orangeiron', 'label': '', 'word': 'orangeiron', 'spans': [{'start': 6157, 'end': 6167}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'redoxygen', 'label': '', 'word': 'redoxygen', 'spans': [{'start': 6184, 'end': 6193}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'metallocene moiety', 'label': '', 'word': 'metallocene moiety', 'spans': [{'start': 6935, 'end': 6953}], 'features': {'maj_vote': '', 'word_vec': 'GNQ', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 4.130288124084473, 'class_mention': '', 'code_mention': ''}}, {'text': '3.447(1)Å (Figure1C', 'label': '', 'word': '3.447(1)Å (Figure1C', 'spans': [{'start': 7585, 'end': 7604}], 'features': {'maj_vote': '', 'word_vec': 'TUR', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 1.3494553565979004, 'class_mention': 
'', 'code_mention': ''}}, {'text': 'FeC10(HgCl)10', 'label': '', 'word': 'FeC10(HgCl)10', 'spans': [{'start': 12695, 'end': 12708}], 'features': {'maj_vote': '', 'word_vec': '', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': '0', 'class_mention': '', 'code_mention': ''}}, {'text': 'Deutsche Forschungsgemeinschaft', 'label': '', 'word': 'Deutsche Forschungsgemeinschaft', 'spans': [{'start': 13577, 'end': 13608}], 'features': {'maj_vote': '', 'word_vec': 'DEU', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 10.370280265808105, 'class_mention': '', 'code_mention': ''}}, {'text': 'ZEDAT/FU Berlin', 'label': '', 'word': 'ZEDAT/FU Berlin', 'spans': [{'start': 13713, 'end': 13728}], 'features': {'maj_vote': '', 'word_vec': 'DEU', 'first_back': '', 'most_alt': '', 'most_pop': '', 'ct_mention': '', 'ctm_count1': 0, 'ct_mention2': '', 'ctm_count2': 0, 'wv_confid': 11.895607948303223, 'class_mention': '', 'code_mention': ''}}]
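A minimal sketch of the kind of filter proposed above. This is not the actual change made in geoparse.py. It assumes each feature entry is a dict with a 'matrix' key, as in the examples, and that the model expects 12 values on the last axis, as stated in the error message. The helper names `has_usable_matrix` and `split_features` are hypothetical, and the demo uses stand-in objects instead of real numpy matrices because only the `.shape` attribute is inspected.

```python
from types import SimpleNamespace

# Width the model expects on the last axis, taken from the error message.
EXPECTED_FEATURES = 12


def has_usable_matrix(feat, n_features=EXPECTED_FEATURES):
    """Return True if feat['matrix'] is 2-D, non-empty, and n_features wide.

    Only the .shape attribute is read, so this works for numpy arrays
    and numpy matrices alike.
    """
    shape = tuple(feat["matrix"].shape)
    return len(shape) == 2 and shape[0] > 0 and shape[1] == n_features


def split_features(features, n_features=EXPECTED_FEATURES):
    """Separate entries that are safe to pass to predict() from those to skip."""
    usable, skipped = [], []
    for feat in features:
        if has_usable_matrix(feat, n_features):
            usable.append(feat)
        else:
            skipped.append(feat)
    return usable, skipped


if __name__ == "__main__":
    # Stand-ins for numpy matrices; the second entry is a hypothetical valid one.
    features = [
        {"labels": [], "matrix": SimpleNamespace(shape=(1, 0)), "word": "organomercury"},
        {"labels": [], "matrix": SimpleNamespace(shape=(3, 12)), "word": "hypothetical-place"},
    ]
    usable, skipped = split_features(features)
    print("skipped:", [f["word"] for f in skipped])
    print("usable:", [f["word"] for f in usable])
```

Skipping the empty entries before the predict call avoids the ValueError for terms such as 'organomercury', while entries with a full feature matrix pass through unchanged.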

vupadhyaya19 commented 3 years ago

Hi @ahalterman, I made the changes in geoparse.py and the issue no longer occurs. Let me know if the code changes below can be committed and pushed: geoparse.txt

ahalterman commented 3 years ago

@vupadhyaya19: can you open a pull request with your changes?

I'm hoping to make v3 public in July and that should resolve the issue because it switches from TF to pytorch, but I'd like to leave this version in a usable form for people who might stick with it.

luizavladislavna commented 3 years ago

> @vupadhyaya19: can you open a pull request with your changes?
>
> I'm hoping to make v3 public in July and that should resolve the issue because it switches from TF to pytorch, but I'd like to leave this version in a usable form for people who might stick with it.

Hi @ahalterman! First of all, thank you for your work!

It looks like I have the same issue described above, so:

Could you please give us an update on v3? Is there any chance you will share it with the community?

openeventdata / mordecai

ValueError: Input 0 of layer sequential is incompatible with the layer #85