scikit-beam / XrayDB

[ARCHIVED] Atomic data compiled by Elam, Ravel and Sieber. Check https://github.com/xraypy/XrayDB
9 stars 6 forks source link

funny behaviour #12

Closed jrmlhermitte closed 6 years ago

jrmlhermitte commented 7 years ago

EDIT: I mentioned earlier I may have made a syntax mistake. However, this is not the case. The issue still stands. Thanks!

There seems to be some funny behaviour with the querying for atomic number.

Here are examples of things that work and don't in a real scenario, for instance:

import xraydb
_XrayDB = xraydb.XrayDB()

a = [79]
_XrayDB.f1_chantler(a[0], 10000)

works, but

a = np.array([79],dtype=np.int64)
_XrayDB.f1_chantler(a[0], 10000)

a = np.array([79],dtype=np.int32)
_XrayDB.f1_chantler(a[0], 10000)

a = np.array([79],dtype=int)
_XrayDB.f1_chantler(a[0], 10000)

do not. I would expect these to work as well.

I can trace it back to two issues. First:

isinstance(a,int)

is false for the numpy elements. So line 472 of xraydb.py does not execute:

        if isinstance(element, int):
            elem = ElementsTable.atomic_number

However, even when I tweak the code to bypass, seems like the query doesn't work. Line 482 returns an empty list:

        row = self.query(ElementsTable).filter(elem==int(element)).all()

Any ideas? I could poke around further if no one has time, but seems like it involves a deeper level.

Also, Here's the backtrace:

In [34]: a = np.array([79],dtype=int)

In [35]: _XrayDB.f1_chantler(a[0], 10000)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-ee4fe020a32e> in <module>()
----> 1 _XrayDB.f1_chantler(a[0], 10000)

PATH/xraydb/xraydb.py in f1_chantler(self, element, energy, **kws)
    428             Chantler
    429         """
--> 430         return self._from_chantler(element, energy, column='f1', **kws)
    431
    432     def f2_chantler(self, element, energy, **kws):

PATH/xraydb/xraydb.py in _from_chantler(self, element, energy, column, smoothing)
    351         tab = ChantlerTable
    352         row = self.query(tab)
--> 353         row = row.filter(tab.element==self.symbol(element)).all()
    354         if len(row) > 0:
    355             row = row[0]

PATH/xraydb/xraydb.py in symbol(self, element)
    507             string: element symbol
    508         """
--> 509         return self._elem_data(element).symbol
    510
    511     def molar_mass(self, element):

PATH/xraydb/xraydb.py in _elem_data(self, element)
    481         if len(row) > 0:
    482             row = row[0]
--> 483         return ElementData(int(row.atomic_number),
    484                            row.element.title(),
    485                            row.molar_mass, row.density)

AttributeError: 'list' object has no attribute 'atomic_number'
newville commented 7 years ago

@ordirules Well, I guess the question is what should be supported for Atomic Number. A plain Python int makes sense for sure. A ndarray is less obvious, though I can see that the dreaded single-element array might be useful to support, even though these are often a source of pain.

We should probably go full on duck typing with a simple:

element = int(element)

and hope for the best. That would allow '79' and 79.0 (or even 79.1), and np.array([79]) (with any dtype, I believe), but would not allow [79] or (79, ).

I don't have a strong preference....

jrmlhermitte commented 7 years ago

sounds good, except the _elem_data function I think also treats the string case separately (since 'Au' is accepted for example), so there may be some checking if not isinstance(element, str) i guess. But I didn't know int(np.array([79])) works, neat! I think it would help. Would it be possible to add? As of now I basically typecast my numbers before inputting to the routines.

newville commented 7 years ago

@ordirules I would propose the following change:

  1. collect the valid atomic symbols at the end of __init__():
self.atomic_symbols = [e.element for e in self.tables['elements'].select().execute().fetchall()]
  1. Alter _elem_data(self, element) to start as
elem = ElementsTable.element
if element not in self.atomic_symbols:
    element = int(element)
    elem = ElementsTable.atomic_number

That will allow 'Ag', u'Ag' (for Py2), 47, 47.0, '47', np.array([47]) to all stand for 'silver'. Note that 'silver' is not actually supported, as we want to not have to support multiple languages.

jrmlhermitte commented 7 years ago

That sounds good. It will throw a ValueError if element is an alphanumeric string that wasn't found in the table. Maybe also catch it with try as well.

On a side note, about the unicode, it seems that there are two lines that check for strings being unicode (line 74 in xraydb.py for instance). This doesn't work in python 3 anymore. unicode doesn't exist because all str are now unicode. Maybe remove them in the meantime? link

Thanks for doing this. Will you merge immediately or make a PR for review?

newville commented 7 years ago

@ordirules I'll make a PR, probably this evening.

Good point about unicode. I change this to six.string_types for the PR.

jrmlhermitte commented 7 years ago

great :-)

jrmlhermitte commented 6 years ago

closed by #13