libAtoms / abcd


query with negative limit (`"E_CC_MP2_2B>-0.02"`) #60

Closed eszter137 closed 4 years ago

eszter137 commented 4 years ago
$ abcd summary -q E_CC_MP2_2B -p E_CC_MP2_2B

info.E_CC_MP2_2B  count: 1507  min: -0.04438608 med: -0.0058448954 max: 0.08028567   std: 0.0073655223 var:5.4250919e-05
                                           8 [-0.04, -0.03)
▉                                         44 [-0.03, -0.02)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉                     483 [-0.02, -0.01)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 962 [-0.01, 0.01)
                                           3 [0.01, 0.02)
                                           2 [0.02, 0.03)
                                           2 [0.03, 0.04)
                                           0 [0.04, 0.06)
                                           2 [0.06, 0.07)
                                           1 [0.07, 0.08)
$ abcd summary -q E_CC_MP2_2B -q "E_CC_MP2_2B>-0.02" 
Total number of configurations: 0
$ abcd summary -q E_CC_MP2_2B -q "E_CC_MP2_2B<-0.02" 
Total number of configurations: 0

works with positive numbers or zero:

$ abcd summary -q E_CC_MP2_2B -q "E_CC_MP2_2B>0.01"  -p E_CC_MP2_2B

info.E_CC_MP2_2B  count: 8  min: 0.01083891 med: 0.0391562 max: 0.08028567   std: 0.021545137 var:0.00046419293
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉                     1 [0.01, 0.02)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 2 [0.02, 0.02)
                                         0 [0.02, 0.03)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 2 [0.03, 0.04)
                                         0 [0.04, 0.05)
                                         0 [0.05, 0.05)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 2 [0.05, 0.06)
                                         0 [0.06, 0.07)
                                         0 [0.07, 0.07)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉                     1 [0.07, 0.08)
$ abcd summary -q E_CC_MP2_2B -q "E_CC_MP2_2B<0.01"  -p E_CC_MP2_2B

info.E_CC_MP2_2B  count: 1499  min: -0.04438608 med: -0.0060850613 max: 0.00703563   std: 0.0064185428 var:4.1197692e-05
                                           1 [-0.04, -0.04)
                                           4 [-0.04, -0.03)
                                           5 [-0.03, -0.03)
                                          14 [-0.03, -0.02)
▉▉                                        39 [-0.02, -0.02)
▉▉▉▉▉▉▉▉                                 145 [-0.02, -0.01)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉                           257 [-0.01, -0.01)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉                       329 [-0.01, -0.00)
▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 698 [-0.00, 0.00)
                                           7 [0.00, 0.01)
fekad commented 4 years ago

Yes, I also realised that querying negative numbers is broken. It is a consequence of fixing issue #56. The parser is confused about where to split the string when there is a '-' sign in it, because it could be part of the name or it could be an operator. For simplicity, I would suggest preventing the use of the '-' character in the labels.

eszter137 commented 4 years ago

I guess the </>/= are not allowed in the label names so the program could see that the other part is a number. Or is it possible to compare labels in the query? - It seems not to me:

$ abcd summary -q E_CC_MP2_2B -q "cutoff>E_CC_MP2_2B" 
Total number of configurations: 0
$ abcd summary -q E_CC_MP2_2B -q "cutoff<E_CC_MP2_2B" 
Total number of configurations: 0
gabor1 commented 4 years ago

No, we must allow - in the label names. As Eszter says, you should first look for the comparison operator and split there! There will be no <, >, = in the label names.
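
A minimal sketch of the operator-first split suggested here, not the actual abcd parser. `split_comparison` is a hypothetical helper: it searches for the comparison operator first and only then interprets the right-hand side, so a '-' in either the key or the value is never treated as a separator. It assumes the right-hand side is a literal, not another label.

```python
import re

# Hypothetical helper, not part of abcd.
# Longer operators come first so ">=" is not matched as ">" followed by "=".
_COMPARISON = re.compile(r"(<=|>=|!=|<|>|=)")


def split_comparison(query):
    """Split a query such as 'E_CC_MP2_2B>-0.02' into (key, operator, value)."""
    parts = _COMPARISON.split(query, maxsplit=1)
    if len(parts) != 3:
        raise ValueError(f"no comparison operator found in {query!r}")
    key, op, raw = (p.strip() for p in parts)
    if not key or not raw:
        raise ValueError(f"incomplete comparison in {query!r}")
    try:
        value = float(raw)  # handles a leading '-' sign
    except ValueError:
        value = raw  # keep non-numeric right-hand sides as plain strings
    return key, op, value
```

Because the split happens at the first `<`, `>` or `=`, a key such as `my-key` and a value such as `-0.02` both pass through unchanged.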

fekad commented 4 years ago

Ok. But in this case, we will have issues with the arithmetic functions because '-' is also an operator

n_atom-3>2

I do not know of any programming language that supports '-' in variable names.

gabor1 commented 4 years ago

Hm... you do have a point. On second thoughts, I'm OK with disallowing arithmetic expressions involving keys with - in them. Is there a clean way to have this fail with an error message, without forbidding such keys in the first place? I.e. is there a point in the code when you know you are going to parse an arithmetic expression, and can check if any of the keys include the - character?
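
One way such a check could look, as a sketch only: `check_arithmetic_lhs` and the `known_keys` argument are hypothetical and do not come from the abcd code base. It assumes the set of keys stored in the database is available at query time, treats an exact match as a plain key lookup, and otherwise rejects any arithmetic expression that mentions a key containing '-'.

```python
ARITHMETIC_CHARS = set("+-*/")


def check_arithmetic_lhs(lhs, known_keys):
    """Raise ValueError if `lhs` is an arithmetic expression using a key with '-'.

    Hypothetical helper: `lhs` is the text left of the comparison operator,
    `known_keys` is the collection of key names present in the database.
    """
    if lhs in known_keys:
        return  # plain key lookup; a '-' in the name is harmless here
    if not ARITHMETIC_CHARS & set(lhs):
        return  # no arithmetic operator, nothing to check
    offending = sorted(k for k in known_keys if "-" in k and k in lhs)
    if offending:
        raise ValueError(
            "arithmetic expressions cannot use keys containing '-': "
            + ", ".join(offending)
            + " (consider renaming these keys)"
        )
```

With keys `n_atom` and `my-key`, the query side `n_atom-3` passes, `my-key` alone passes, and `my-key+1` fails with the error message.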

gabor1 commented 4 years ago

The point is that this could be part of user education. I upload stuff with - in the names, then try to use them, fail with the error, and then I can use --rename to change my key naming conventions

gabor1 commented 4 years ago

This is better than throwing an error when one tries to upload such data, because then the nice abcd tools cannot be used to remedy the situation!

Alternatively, we could have a --sanitize option for upload, which enforces whatever constraints we want, e.g. replacing all - characters in names with _ (or perhaps any other disallowed character that ASE still allows)

gabor1 commented 4 years ago

Decision: for now, check the key names in the uploaded file: we only allow [a-zA-Z0-9_] (no '-', obviously). If the file fails the check, we stop with an error and an educational message; --sanitize will change all illegal characters to "_" (underscore).
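
The decision could be sketched roughly as follows. This is an illustration, not the abcd implementation: `check_keys` and `sanitize_key` are hypothetical names, and the sketch assumes the underscore itself counts as a legal character, since existing keys such as E_CC_MP2_2B contain it and it is the replacement character.

```python
import re

# Anything outside letters, digits and underscore is treated as illegal
# (assumption: underscore is allowed, as in E_CC_MP2_2B).
_ILLEGAL = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_key(key):
    """Replace every illegal character in `key` with an underscore."""
    return _ILLEGAL.sub("_", key)


def check_keys(keys, sanitize=False):
    """Return a mapping old_key -> key_to_store.

    Without `sanitize`, any illegal key aborts the upload with an
    explanatory error; with it, illegal characters become '_'.
    """
    bad = [k for k in keys if _ILLEGAL.search(k)]
    if bad and not sanitize:
        raise ValueError(
            "illegal characters in key names: " + ", ".join(bad)
            + "; only [a-zA-Z0-9_] is allowed, "
            "use --sanitize to replace illegal characters with '_'"
        )
    return {k: sanitize_key(k) for k in keys}
```

One case the sketch does not handle: two different keys, such as `a-b` and `a_b`, map to the same sanitized name, so a real implementation would need to detect such collisions.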