xraypy / xraylarch

Larch: Applications and Python Library for Data Analysis of X-ray Absorption Spectroscopy (XAS, XANES, XAFS, EXAFS), X-ray Fluorescence (XRF) Spectroscopy and Imaging, and more.
https://xraypy.github.io/xraylarch

Jupyter example how to export groups to an Athena project and a Larix file #496

Open maurov opened 5 months ago

maurov commented 5 months ago

For exchanging data with users still working with Athena/Artemis, it would be nice to show how to export groups to an Athena project file programmatically.

Furthermore, we should also show how to create a Larix session and save data in it without the GUI.

maurov commented 5 months ago

For information, to export groups to Athena from Larix:

[image: Larix screenshot showing how to export groups to an Athena project]

maurov commented 4 months ago

@newville I realize that I am not able by myself to programmatically (pure Python):

Please, could you give here a minimal working example?

(this may be related to #411)

newville commented 4 months ago

@maurov Yes, I'll include this with #411.

maurov commented 3 months ago

@newville I have seen 430603de3f2a72ba62b6f07690557239bf725cb8, thanks for including that example, it is great. What I was looking for, though, is an example of how to initialize a Larix session programmatically without the GUI, add groups to it, and then save it to a .larix file. We need this because we would like to convert our raw beamline data to a Larix project file (currently we use the Athena project file as the exchange format for Larix, but I would prefer the Larix format).

newville commented 3 months ago

@maurov Yep, I understand, I just have not gotten that done yet. But also, if you look at that example and the save_session code at https://github.com/xraypy/xraylarch/blob/master/larch/io/save_restore.py#L69, it is probably not that hard, though maybe we want to break that function apart.

Making a session by hand would "just be" creating a "config" section, a "command history" section (which could be empty), and then a "symbol table": a Group of datasets and Groups, including the important _xasgroups group that Larix uses to map "displayed file name" to "group name". You would then use the "encode4js" function as in save_session. Again, we could think about breaking that up so that it does not assume a Larch session. For example, currently "Session" is just a namedtuple, but it could be turned into a class with load/save methods.
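As a rough illustration of the "class with load/save methods" idea, here is a minimal sketch. Everything in it is an assumption made for illustration: the class name SessionSketch and its add_group method are hypothetical, and plain JSON stands in for encode4js, so the file it writes is NOT a real .larix file. It only shows the three sections and the _xasgroups mapping.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class SessionSketch:
    """Hypothetical container mirroring the three sections named above."""
    config: dict = field(default_factory=dict)
    command_history: list = field(default_factory=list)
    symbols: dict = field(default_factory=dict)

    def add_group(self, groupname, filename, **arrays):
        """Store a dataset and map its displayed file name to its group name."""
        self.symbols[groupname] = dict(arrays, filename=filename)
        self.symbols.setdefault("_xasgroups", {})[filename] = groupname

    def save(self, path):
        # Plain JSON stands in for larch's encode4js; this is NOT the .larix format.
        payload = {"config": self.config,
                   "command_history": self.command_history,
                   "symbols": self.symbols}
        with open(path, "w") as fh:
            json.dump(payload, fh)

    @classmethod
    def load(cls, path):
        """Rebuild a SessionSketch from a file written by save()."""
        with open(path) as fh:
            return cls(**json.load(fh))


session = SessionSketch()
session.add_group("scan001", "Fe foil scan 1",
                  energy=[7100.0, 7110.0], mu=[0.1, 0.9])

path = os.path.join(tempfile.mkdtemp(), "sketch_session.json")
session.save(path)
restored = SessionSketch.load(path)
print(restored.symbols["_xasgroups"])  # {'Fe foil scan 1': 'scan001'}
```

A real implementation would presumably keep larch Groups in the symbol table and serialize them with encode4js, as save_session does.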

For Larix to be able to work with a Group, it is probably important to check that it has arrays called "xdat", "ydat", "energy" and "mu". It might also assume some other data that would normally be generated with the "install group" method...
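A check like that could look roughly like the sketch below. The helper name missing_arrays is hypothetical, the list of required names is only the four mentioned above (Larix may well need more), and SimpleNamespace stands in for a larch Group so that the snippet runs without larch installed.

```python
from types import SimpleNamespace

# Assumption: only the four array names mentioned above; Larix may need more.
REQUIRED_ARRAYS = ("xdat", "ydat", "energy", "mu")


def missing_arrays(group, required=REQUIRED_ARRAYS):
    """Return the required attribute names that are absent or None on group."""
    return [name for name in required if getattr(group, name, None) is None]


# SimpleNamespace stands in for a larch Group in this sketch.
good = SimpleNamespace(xdat=[1, 2], ydat=[0.1, 0.2],
                       energy=[1, 2], mu=[0.1, 0.2])
bad = SimpleNamespace(energy=[1, 2], mu=[0.1, 0.2])

print(missing_arrays(good))  # []
print(missing_arrays(bad))   # ['xdat', 'ydat']
```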

I may have time to work on this today, but I'm not certain.

maurov commented 3 months ago

@newville thanks for explaining it. This is not urgent, so I propose postponing it to a later release.

maurov commented 2 months ago

@newville I am having a hard time correctly reading Athena project files (in Larix or via read_athena) that I have created programmatically with Python. Sometimes (I do not know how to reproduce this!) the names of the groups read from the Athena project file appear as a hash key (5 random lower-case letters).

Here is a minimal/conceptual example to show what I am doing:

from larch import Group
from larch.io.athena_project import AthenaProject, read_athena

# fname_out, my_list_of_data and the my_*_array names are placeholders

# to write
apj = AthenaProject(fname_out)
for something in my_list_of_data:
    # can be a long string, but must be unique for each dataset
    gname = "my_group_name_that_can_be_a_long_string_but_UNIQUE_123"
    g = Group(
        id=gname,
        name=gname,
        groupname=gname,
        filename=gname,
        xdat=my_energy_array,
        ydat=my_mu_array,
        energy=my_energy_array,
        mu=my_mu_array,
        datatype='xas',
    )
    apj.add_group(g)
apj.save()

# to read back
prj = read_athena(fname_out)
print(prj.groups.keys())
# sometimes like: dict_keys(['fhigt', 'quxkx', 'gamrg', 'rbqec', 'upiyt'])

Do you have a recommendation for how to write groups to an Athena project file programmatically so that Larix reads them back correctly, without the hash key names?

I think it would be much easier if a group had only a name (a unique identifier, e.g. the hash key) and a human-readable label that is shown in the group list.

newville commented 2 months ago

@maurov Hmm, I'm not 100% certain.

A five-letter random string is assigned to each dataset in an Athena Project. That hash key is how Athena keys the data, so that's needed to have Athena reliably read these files.
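To make the idea of that key concrete, here is a small sketch of how such a key could be generated. The function make_key is hypothetical and written only for illustration; it is not the routine that Athena or Larch actually uses.

```python
import random
import string


def make_key(existing, length=5):
    """Return a random lower-case key of `length` letters not in `existing`."""
    while True:
        key = "".join(random.choice(string.ascii_lowercase)
                      for _ in range(length))
        if key not in existing:
            return key


keys = set()
for _ in range(3):
    keys.add(make_key(keys))

# Three distinct five-letter keys; the values differ on every run.
print(sorted(keys))
```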

Athena also keeps a long dictionary of attributes. One of these is called "label".

But here is the basic route:

On saving a Larch group to an Athena file, if the group has a filename attribute, that is used for the Athena attribute 'label'. Whereas the 5-letter key and the "groupname" are required to be valid Python variable names, this filename / label is not. For better or worse, "filename" is used throughout Larix as the label to use for a group.

On reading the Athena project, if there is a label in the attributes, that will be used as the "filename" and group name.

Well maybe "should" if not "will" ;)

The intention is to set group.filename.
If the group does not have a filename (or maybe it is blank), the 5-letter hash will be used...

But something like:

for i, dat in enumerate(data_list):
    g = Group(filename=f'fdataset_{i}', energy=dat.energy, mu=dat.mu, datatype='xas')
    apj.add_group(g)

should work.
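The fallback rule described above (use the label when there is one, otherwise the 5-letter key) can be sketched in plain Python. The function display_name and the dictionary layout are hypothetical and only mirror the description; the actual reading code in Larch may differ.

```python
def display_name(hashkey, attrs):
    """Pick the name shown for a group: its label if non-blank, else the key."""
    label = (attrs.get("label") or "").strip()
    return label if label else hashkey


print(display_name("fhigt", {"label": "dataset_0"}))  # dataset_0
print(display_name("quxkx", {"label": "   "}))        # quxkx
print(display_name("gamrg", {}))                      # gamrg
```

Under this rule, seeing hash keys after reading back would suggest that the label was missing or blank when the project was written.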

maurov commented 2 months ago

@newville thanks for the detailed explanation! Now it is much clearer to me. I will use filename only.