scikit-hep / scikit-hep-testdata

A common package to provide example files (e.g., ROOT) for testing and developing packages against.
BSD 3-Clause "New" or "Revised" License

Views on adding test files that are not explicitly needed for Scikit-HEP testing #86

Closed matthewfeickert closed 1 year ago

matthewfeickert commented 1 year ago

Given https://github.com/ssl-hep/ServiceX/pull/486#pullrequestreview-1161522843, it would be useful for testing IRIS-HEP's ServiceX to have https://xrootd-local.unl.edu:1094//store/user/AGC/nanoAOD/nanoaod15.root available in scikit-hep-testdata.

$ curl -sLO https://xrootd-local.unl.edu:1094//store/user/AGC/nanoAOD/nanoaod15.root
$ file nanoaod15.root 
nanoaod15.root: ROOT file Version 61409 (Compression: 209)
$ ls -lhtra nanoaod15.root 
-rw-rw-r-- 1 feickert feickert 2.7M Nov  2 02:17 nanoaod15.root
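
For readers unfamiliar with the `file` output above, here is a minimal sketch of how the two numbers can be read. It assumes the reported version is ROOT's usual version code (10000 * major + 100 * minor + patch, with 1000000 added for files that use 64-bit offsets) and that the compression value follows ROOT's 100 * algorithm + level convention. The function names are hypothetical and are not part of ROOT, uproot, or scikit-hep-testdata.

```python
from typing import Tuple

# Algorithm codes as assumed from ROOT's compression settings.
COMPRESSION_ALGORITHMS = {
    0: "global default",
    1: "ZLIB",
    2: "LZMA",
    3: "old",
    4: "LZ4",
    5: "ZSTD",
}


def decode_root_version(code: int) -> Tuple[int, int, int, bool]:
    """Split a ROOT version code into (major, minor, patch, large_file)."""
    # Assumption: codes of 1000000 or more mark files with 64-bit offsets.
    large_file = code >= 1_000_000
    code %= 1_000_000
    return code // 10_000, (code // 100) % 100, code % 100, large_file


def decode_compression(setting: int) -> Tuple[str, int]:
    """Split a compression setting into (algorithm name, level)."""
    algorithm, level = divmod(setting, 100)
    return COMPRESSION_ALGORITHMS.get(algorithm, "unknown"), level


if __name__ == "__main__":
    # Values taken from the `file nanoaod15.root` output in this comment.
    print(decode_root_version(61409))
    print(decode_compression(209))
```

Under these assumptions, 61409 reads as ROOT 6.14/09 and 209 as LZMA at level 9.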

This would be useful, so I'm inclined to go for it (I have a branch, feat/add-servicex-data, with this file ready to PR). But we should probably discuss whether we are okay hosting any test files, or whether we only want to accept long-term responsibility for hosting Scikit-HEP project test files. The file is also > 1 MB, which is something to keep in mind. In general, though, I could see a benefit to having file formats like nanoAOD and xAOD in scikit-hep-testdata.

Thoughts?

(cc @eduardo-rodrigues @jpivarski @henryiii @alexander-held)

edit: @gordonwatts has pointed out that there's nothing particularly special about that file, so a different file could be used.

alexander-held commented 1 year ago

We could also create a smaller version (with fewer events) of https://xrootd-local.unl.edu:1094//store/user/AGC/nanoAOD/nanoaod15.root for testing purposes.

eduardo-rodrigues commented 1 year ago

Hey, definitely a very good idea 👍. I would also try to skim the file a bit, since the "mechanics" are what matter, not the number of events processed.

eduardo-rodrigues commented 1 year ago

Hey @alexander-held, any update on this front?

alexander-held commented 1 year ago

I have not gotten around to creating another file for this purpose; however, @ekauffma is already producing more nanoAOD versions of the 2015 CMS Open Data. Elliot, could you please produce a very small file with a few hundred events (to have a < 1 MB file) and create a pull request to this repository with it? I would suggest using the Powheg + Pythia 8 ttbar simulation for that.

ekauffma commented 1 year ago

Hi all, I have produced a 369 KB nanoAOD file with 200 events. I can submit a pull request. Where should the file be located?

alexander-held commented 1 year ago

I think it should go into src/skhep_testdata/data/, and perhaps we can name it something like nanoAOD_2015_CMS_Open_Data_ttbar.root. Most of the other files follow a naming scheme indicating which package uses the file. We could add "ServiceX", but since that's not a Scikit-HEP library I am not sure it makes sense (and I think a generic nanoAOD file can be more broadly useful too).
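
As a rough illustration of the two constraints discussed in this thread (the target directory and the < 1 MB size goal), a pre-submission check could look like the sketch below. The helper names and the exact byte threshold are assumptions made for illustration; the repository does not necessarily enforce either.

```python
from pathlib import Path

# Directory suggested in this thread for new test files.
DATA_DIR = Path("src/skhep_testdata/data")

# Assumed reading of the "< 1 MB" goal; the thread gives no exact byte limit.
MAX_BYTES = 1_000_000


def destination_for(filename: str) -> Path:
    """Return where a new test file would live inside the repository."""
    return DATA_DIR / filename


def fits_size_budget(size_bytes: int, max_bytes: int = MAX_BYTES) -> bool:
    """Return True if a file of the given size stays under the budget."""
    return size_bytes < max_bytes


def file_fits_size_budget(path: Path, max_bytes: int = MAX_BYTES) -> bool:
    """Same check, reading the size of an existing file from disk."""
    return fits_size_budget(path.stat().st_size, max_bytes)


if __name__ == "__main__":
    print(destination_for("nanoAOD_2015_CMS_Open_Data_ttbar.root"))
    # Sizes quoted in the thread: a 369 KB skim and the original 2.7M file.
    print(fits_size_budget(369 * 1024))
    print(fits_size_budget(int(2.7 * 1024 * 1024)))
```

With these numbers, the 200-event file passes the check while the original 2.7M file does not.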

ekauffma commented 1 year ago

Okay, I created pull request #107 with the file!

eduardo-rodrigues commented 1 year ago

@jpivarski, @matthewfeickert, guess we can now close this task via https://github.com/scikit-hep/scikit-hep-testdata/pull/107?

jpivarski commented 1 year ago

Yes; I actually just forgot to link it to that PR.