libAtoms / abcd


problem with formula regexp query #52

Closed gabor1 closed 4 years ago

gabor1 commented 4 years ago
docker:~$ abcd summary -q volker -q formula~".*H.*" -p formula

info.formula count: 1024 unique: 95
                                          21 C216H6
                                          20 C216H35
                                          20 C216H29
                                          18 C216H32
                                          18 C216H30
                                          18 C216H2
                                          18 C216H16
                                          17 C216H36
                                          17 C216H34
                                          16 C216H55
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 841 ...
docker:~$ 
docker:~$ abcd summary -q volker -q formula~"C.*H.*" -p formula
Traceback (most recent call last):
  File "/usr/local/bin/abcd", line 11, in <module>
    load_entry_point('abcd==0.4', 'console_scripts', 'abcd')()
  File "/usr/local/lib/python3.5/dist-packages/abcd-0.4-py3.5.egg/abcd/frontends/shell/__init__.py", line 17, in cli
  File "/usr/local/lib/python3.5/dist-packages/abcd-0.4-py3.5.egg/abcd/frontends/shell/__init__.py", line 108, in __call__
  File "/usr/local/lib/python3.5/dist-packages/abcd-0.4-py3.5.egg/abcd/frontends/shell/__init__.py", line 317, in summary
  File "/usr/local/lib/python3.5/dist-packages/abcd-0.4-py3.5.egg/abcd/backends/atoms_mongoengine.py", line 567, in hist
  File "/usr/local/lib/python3.5/dist-packages/abcd-0.4-py3.5.egg/abcd/backends/atoms_mongoengine.py", line 471, in property
  File "/usr/local/lib/python3.5/dist-packages/mongoengine-0.18.2-py3.5.egg/mongoengine/queryset/base.py", line 1212, in aggregate
  File "/usr/local/lib/python3.5/dist-packages/pymongo-3.8.0-py3.5-linux-x86_64.egg/pymongo/collection.py", line 2411, in aggregate
    **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pymongo-3.8.0-py3.5-linux-x86_64.egg/pymongo/collection.py", line 2318, in _aggregate
    user_fields={'cursor': {'firstBatch': 1}})
  File "/usr/local/lib/python3.5/dist-packages/pymongo-3.8.0-py3.5-linux-x86_64.egg/pymongo/pool.py", line 584, in command
    user_fields=user_fields)
  File "/usr/local/lib/python3.5/dist-packages/pymongo-3.8.0-py3.5-linux-x86_64.egg/pymongo/network.py", line 158, in command
    parse_write_concern_error=parse_write_concern_error)
  File "/usr/local/lib/python3.5/dist-packages/pymongo-3.8.0-py3.5-linux-x86_64.egg/pymongo/helpers.py", line 155, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: $or/$and/$nor entries need to be full objects
docker:~$ 
fekad commented 4 years ago

I suspect the query-string parser cannot handle special characters (like '*') properly and ignores them. The first case works because ".*H.*" is actually equivalent to "H".
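The equivalence claimed here can be checked with a minimal sketch using Python's `re` module. It assumes unanchored "search" semantics, which is how MongoDB's `$regex` operator behaves unless the pattern is anchored; the formula list is made up for illustration and is not taken from the database.

```python
import re

# Made-up formulas for illustration only.
formulas = ["C216H6", "C216", "H2O", "Si64", "C60H12"]

def matching(pattern):
    """Return the formulas in which the pattern matches anywhere (unanchored)."""
    return [f for f in formulas if re.search(pattern, f)]

# With unanchored matching, the leading and trailing ".*" add nothing.
assert matching(".*H.*") == matching("H")

print(matching("H"))        # ['C216H6', 'H2O', 'C60H12']
print(matching("C.*H.*"))   # ['C216H6', 'C60H12']
```

The second pattern is not reducible in the same way: if the special characters were dropped, "C.*H.*" would no longer mean "a C followed somewhere by an H".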

gabor1 commented 4 years ago

.* is a very specific regexp thing. You should just be passing the entire thing in the quotes to the regexp.

-- Gábor

On 15 Jul 2019, at 17:54, Adam Fekete notifications@github.com wrote:

I suspect the query-string parser cannot handle special characters (like '*') properly and ignores them. The first case works because ".*H.*" is actually equivalent to "H".


fekad commented 4 years ago

I updated the parser's regexp to parse regex expressions. This is just a quick workaround, not the long term solution.

gabor1 commented 4 years ago

I thought you just pass the entire string to the Python regexp function. Why is this not the final solution?

-- Gábor

On 15 Jul 2019, at 21:09, Adam Fekete notifications@github.com wrote:

I updated the parser's regexp to parse regex expressions. This is just a quick workaround, not the long term solution.


fekad commented 4 years ago

The query should support complex expressions, like 'and' or 'or', so the following commands should be equivalent to each other:

abcd summary -q volker -q formula~".*H.*" 
abcd summary -q "volker formula~.*H.*" 
abcd summary -q "volker and formula~.*H.*" 

To parse all the possible query options I have a 'lexer', a 'parser', and a 'compiler', which produce an abstract syntax tree (AST) at the end. Ideally, all three variations above should have the same AST.

Finally, if there is any regex in the AST, I pass it to the Mongo database, which uses it to filter the data.

It is a temporary solution because solving issues #6 and #25 will provide a more stable one.
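The idea that all three command variants should yield the same AST can be illustrated with a minimal sketch. This is not the abcd implementation: `parse_query`, `to_mongo_filter`, the tuple-based AST, and the `tags` and `info.<field>` document keys are all hypothetical names chosen for the example. It also keeps every `$and` entry a full document, which appears to be what the error in the traceback above complains about.

```python
import shlex

def parse_query(args):
    """Turn one or more -q arguments into a tiny AST: ('and', [term, ...])."""
    terms = []
    for arg in args:
        # The shell has already removed the quotes, so formula~".*H.*"
        # arrives here as formula~.*H.* and splits only on whitespace.
        for token in shlex.split(arg):
            if token.lower() == "and":
                continue  # 'and' is also the implicit joiner between terms
            if "~" in token:
                field, pattern = token.split("~", 1)
                terms.append(("regex", field, pattern))
            else:
                terms.append(("tag", token))
    return ("and", terms)

def to_mongo_filter(ast):
    """Compile the AST into a MongoDB filter document (hypothetical schema)."""
    _, terms = ast
    clauses = []
    for term in terms:
        if term[0] == "regex":
            _, field, pattern = term
            # The pattern is passed through untouched to $regex.
            clauses.append({"info." + field: {"$regex": pattern}})
        else:
            clauses.append({"tags": term[1]})
    return {"$and": clauses}

a = parse_query(["volker", "formula~.*H.*"])
b = parse_query(["volker formula~.*H.*"])
c = parse_query(["volker and formula~.*H.*"])
assert a == b == c
print(to_mongo_filter(a))
```

Because the regex pattern is stored as an opaque string in the AST, special characters such as `.` and `*` never need to be interpreted by the query parser itself.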

gabor1 commented 4 years ago

Decision: we will require all strings, including regexps, to be quoted; that will help resolve all of this.
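One way such a rule could look in a lexer, as a minimal sketch only: the token grammar below (names, a handful of operators, double-quoted strings) and the `tokenize` function are assumptions for illustration, not the grammar abcd adopted. Since the shell strips quotes, the whole query would itself need to be wrapped in single quotes on the command line for the inner quotes to reach the lexer.

```python
import re

# Hypothetical token grammar: quoted strings, a few operators, and names.
TOKEN = re.compile(r'''
    (?P<string>"(?:[^"\\]|\\.)*")        # double-quoted string, backslash escapes allowed
  | (?P<op>[~=()])                       # operators and parentheses
  | (?P<name>[A-Za-z_][A-Za-z0-9_.]*)    # field names and keywords
''', re.VERBOSE)

def tokenize(query):
    """Split a query into (kind, text) tokens; reject unquoted special characters."""
    tokens = []
    pos = 0
    while pos < len(query):
        if query[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(
                "unexpected character %r at position %d (strings must be quoted)"
                % (query[pos], pos)
            )
        kind, text = match.lastgroup, match.group()
        if kind == "string":
            text = text[1:-1]  # drop the surrounding quotes, keep the contents as-is
        tokens.append((kind, text))
        pos = match.end()
    return tokens

print(tokenize('volker and formula~"C.*H.*"'))
# [('name', 'volker'), ('name', 'and'), ('name', 'formula'), ('op', '~'), ('string', 'C.*H.*')]
```

With this grammar an unquoted pattern such as `formula~C.*H.*` fails loudly at the first `*` instead of being silently altered.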