Closed 0ctagon closed 1 year ago
Thanks for reporting!
Ah okay, I the problem is that os.path.splitext
expects a string with a dot .
to split on, which is how we originally extracted extensions. The problem is that it only gives the last extension, therefore in the line above we did a normal string split on "."
first
https://github.com/nils-braun/b2luigi/blob/6844963ff7bd23a0e781e87bed559f62c5e862f0/b2luigi/batch/processes/gbasf2.py#L589-L591
This gives us a string with all filename extensions, e.g. I tested the following in IPython:
In [31]: "output.mdst.root".split(".", maxsplit=1)
Out[31]: ['output', 'mdst.root']
And then to check if it's a rootfile I used path.splitext
on the last element. But with output.root
the last element is just root
and path.splitext
will not find an extension in that.
In [32]: "output.root".split(".", maxsplit=1)
Out[32]: ['output', 'root']
In [33]: os.path.splitext("output.root".split(".", maxsplit=1)[-1])
Out[33]: ('root', '')
In [36]: os.path.splitext("output.root".split(".", maxsplit=1)[-1])[-1]
Out[36]: ''
Your solution is find, but it will not guard against filenames like output.groot
for example, where the last extension ends with root
but is not root
. Such files are unlikely to appear in grid jobs, but I would prefer to program defensively, assuming the worst.
A simple solution would be to use another string split, which seems to work for one or multiple dots in the extension:
In [28]: "output.root".split(".", maxsplit=1)[-1].split(".")[-1]
Out[28]: 'root'
In [29]: "output.mdst.root".split(".", maxsplit=1)[-1].split(".")[-1]
Out[29]: 'root'
Actually, when we do a PR, I'd like to outsource this gbasf2-specific splitting+check functionality into its own function and add some unittests for that, just simply check that it does what's expected for output.root
and output.mdst.root
for example.
When downloading
gbasf2
output files, the script doesn't work if the name doesn't contain multiple.
.I propose replacing this line: https://github.com/nils-braun/b2luigi/blob/6844963ff7bd23a0e781e87bed559f62c5e862f0/b2luigi/batch/processes/gbasf2.py#L593
by:
Tested locally with simple output file like
output.root
.