Closed ElkeDeZitter closed 9 months ago
Hi @ElkeDeZitter,
We do have a test for stream file support, but I am afraid it is not very well documented. I haven't really had many users trying it so far.
Internally, careless just uses the reciprocalspaceship stream file parser. This will provide the following metadata keys:
[ins] In [3]: rs.read_crystfel("crystfel.stream").keys()
Out[3]:
Index(['I', 'SigI', 'BATCH', 's1x', 's1y', 's1z', 'ewald_offset',
'angular_ewald_offset', 'XDET', 'YDET'], dtype='object')
Of these, I would recommend using the scattered beam wavevectors, s1x and s1y, in lieu of XDET and YDET, which are the coordinates within each detector panel. It should be harmless to supply both, but the s1 vectors are more generally useful. When processing serial stills, we typically provide both ewald_offset and angular_ewald_offset, which are the Cartesian distance and angular rotation between the predicted spot centroid and the Ewald sphere.
Additionally, careless will provide dHKL (case sensitive) and image_id (which should be used in lieu of BATCH if you have multiple stream files).
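Since the keys are case sensitive and must match the parser's column names, it can help to check a metadata string before launching a run. The following is a minimal sketch, not part of careless: `check_metadata_keys` and `AVAILABLE_KEYS` are hypothetical names, and the key list is simply copied from the output above plus the two keys careless is described as adding. It is run here on the keys from the original question.

```python
# Columns reported by rs.read_crystfel(...) in the session above, plus the two
# keys that careless is described as adding itself (dHKL and image_id).
AVAILABLE_KEYS = (
    "I", "SigI", "BATCH", "s1x", "s1y", "s1z",
    "ewald_offset", "angular_ewald_offset", "XDET", "YDET",
    "dHKL", "image_id",
)


def check_metadata_keys(requested, available=AVAILABLE_KEYS):
    """Hypothetical helper: sort a comma-separated key string into known and unknown keys.

    Returns (known, unknown). `unknown` maps each unrecognized key to a
    case-insensitive match from `available`, or None if there is none.
    """
    lowercase_lookup = {name.lower(): name for name in available}
    known, unknown = [], {}
    for key in (part.strip() for part in requested.split(",")):
        if key in available:
            known.append(key)
        else:
            unknown[key] = lowercase_lookup.get(key.lower())
    return known, unknown


known, unknown = check_metadata_keys("dhkl,fs/px,ss/px")
print(known)    # []
print(unknown)  # {'dhkl': 'dHKL', 'fs/px': None, 'ss/px': None}
```

Here dhkl is flagged with the suggestion dHKL, while fs/px and ss/px have no counterpart among the parsed columns.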
I suggest the following command:
careless mono \
--spacegroups="P 21 21 21" \
--intensity-key="I" \
--uncertainty-key="SigI" \
--image-layers=2 \
"dHKL,s1x,s1y,ewald_offset,angular_ewald_offset" \
my_protein.stream \
careless_merge/my_protein
Not sure if you have stills or rotation images, but if you have stills the ewald offset metadata are really essential for good scaling.
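If you want to script this over several stream files, the same command can be assembled in Python. This is only a sketch: `build_careless_command` is a hypothetical helper, and it uses nothing beyond the flags shown in the command above.

```python
import shlex


def build_careless_command(stream_file, output_base, metadata_keys,
                           spacegroup="P 21 21 21", image_layers=2):
    """Hypothetical helper: return the argument list for the command shown above."""
    return [
        "careless", "mono",
        f"--spacegroups={spacegroup}",
        "--intensity-key=I",
        "--uncertainty-key=SigI",
        f"--image-layers={image_layers}",
        ",".join(metadata_keys),  # positional metadata key string
        stream_file,
        output_base,
    ]


command = build_careless_command(
    "my_protein.stream",
    "careless_merge/my_protein",
    ["dHKL", "s1x", "s1y", "ewald_offset", "angular_ewald_offset"],
)
# Print a shell-quoted version for inspection before running anything.
print(shlex.join(command))
```

The list can then be passed to subprocess.run(command, check=True) once the printed command looks right.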
Let me know if this solves your problem. I'll leave this issue open and try to add some more info into the CLI help next week.
Hi @kmdalton ,
Thank you for the response and further explanation, which is very helpful (how to get all metadata keys, and the difference between s1x and s1y vs. XDET and YDET). Now careless runs fine with my stream file, which contains still images (thus I provided ewald_offset,angular_ewald_offset).
I haven't tested the case of multiple stream files.
Great! Don't hesitate to reach out for tips. The best/worst thing about careless is that it has a lot of knobs you can tweak. Happy to offer some suggestions. For instance, we have found that using positional encoding on some of the metadata can help with merging serial synchrotron data. You can do this like:
--positional-encoding-keys="s1x,s1y" \
--positional-encoding-frequencies=5
This will add a lot of additional columns to the metadata behind the scenes. Because careless uses the width of the metadata matrix as the default for the width of the neural net layers, this can use a lot of memory if you don't pair it with the --mlp-width flag to override the default. For instance:
--positional-encoding-keys="s1x,s1y" \
--positional-encoding-frequencies=5 \
--mlp-width=10
10 is a pessimistic value. You can often get away with narrower nets if you're memory limited.
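To see why the metadata matrix gets wider, here is a rough sketch of a generic Fourier-feature positional encoding. This is an assumption for illustration only: the exact encoding careless applies, including its frequency scaling and whether the original columns are kept, may differ, and `positional_encoding` and `encode_row` are hypothetical names.

```python
import math


def positional_encoding(value, num_frequencies):
    """Generic Fourier features for one scalar (assumed to be scaled to [-1, 1]).

    Each frequency contributes a sine and a cosine term, so a single input
    value becomes 2 * num_frequencies output values.
    """
    features = []
    for k in range(num_frequencies):
        angle = (2.0 ** k) * math.pi * value
        features.extend([math.sin(angle), math.cos(angle)])
    return features


def encode_row(row, keys, num_frequencies):
    """Concatenate the encodings of the selected metadata keys for one reflection."""
    encoded = []
    for key in keys:
        encoded.extend(positional_encoding(row[key], num_frequencies))
    return encoded


row = {"s1x": 0.25, "s1y": -0.5}  # made-up values for illustration
encoded = encode_row(row, ["s1x", "s1y"], 5)
print(len(encoded))  # 20 = 2 keys x 5 frequencies x 2 (sin, cos)
```

Under this scheme, two keys with five frequencies turn into 20 columns, which is why the default layer width grows unless --mlp-width is set.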
Hi,
I wish to try careless on a stream file from CrystFEL. However, I cannot find the correct way to attribute metadata. Based on the thermolysis_xfel example and the columns for indexed crystals in the stream file I tried the following:
careless mono --spacegroups="P 21 21 21" "dhkl,fs/px,ss/px" my_protein.stream careless_merge/my_protein
with and without the addition of --intensity-key="I", --uncertainty-key="sigma(I)" and --image-layers=2
Using those arguments, I get errors like:
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['dhkl', 'fs/px', 'ss/px'], dtype='object')] are in the [columns]"
and
KeyError: 'sigma(I)'
Using the stream2mtz.py script, I could convert the stream file to mtz, with which careless appears to run fine (metadata: "dHKL,XDET,YDET,ewald_offset,angular_ewald_offset"). However, if I understood correctly, careless could interpret CrystFEL streams directly without the need to convert first.
I am using careless version 0.3.9 installed without gpu-support on a Mac.