icecube / skywriter

Upstream Tools for SkyDriver & the Skymap Scanner
MIT License

Test data for `i3_to_json.py` #8

Closed mlincett closed 11 months ago

mlincett commented 11 months ago

I am thinking of adding test data for the `i3_to_json` script. However, one example file, `/data/ana/realtime/alert_catalog_v2/input_files/Level2pass2_IC86.2011_data_Run00118435_Subrun00000000_00000144_event58198553.i3.zst`, is 14M in size.

Should we add this to the repository or can we think of adding test data to the prod-exe fileserver? cc @ric-evans @dsschult @briedel

I would rather rely on an external source, since this may not be the only test file we need.

ric-evans commented 11 months ago

https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage

looks like you're good. if we have the space, let's use it

ric-evans commented 11 months ago

unless you plan on adding a lot of files

dsschult commented 11 months ago

At 14M, you could commit it directly to the repo. You start getting a warning at 50M, and github blocks anything larger than 100M.
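For illustration, the thresholds above could be checked locally before committing a file. This is only a sketch: it assumes "50M" and "100M" mean MiB, and the function names are made up for this example.

```python
from pathlib import Path

# Thresholds quoted above, interpreted here as MiB (an assumption).
WARN_BYTES = 50 * 1024 * 1024
BLOCK_BYTES = 100 * 1024 * 1024


def classify_size(num_bytes: int) -> str:
    """Return 'ok', 'warn', or 'blocked' for a file of the given size."""
    if num_bytes > BLOCK_BYTES:
        return "blocked"
    if num_bytes > WARN_BYTES:
        return "warn"
    return "ok"


def classify_file(path: str) -> str:
    """Classify a file on disk by its size in bytes."""
    return classify_size(Path(path).stat().st_size)
```

With these assumptions, a 14M file falls in the "ok" range.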

We definitely could add this to prod-exe too. Any specific path you'd like it to appear as?

mlincett commented 11 months ago

> At 14M, you could commit it directly to the repo. You start getting a warning at 50M, and github blocks anything larger than 100M.
>
> We definitely could add this to prod-exe too. Any specific path you'd like it to appear as?

That works as long as it is a single file. But if we later need several, I would not be happy with a repo approaching O(100M) of test data.

What's your take on git lfs, @dsschult?

dsschult commented 11 months ago

I generally dislike git lfs, unless there's a good reason to use it. I treat it like git submodules, which can have poor interactions.

kjmeagher commented 11 months ago

For simweights I put the test data on our fileserver and used curl to download it: https://github.com/icecube/simweights/blob/e892e8c0fc926e2f7b904aeba709a7ceb8dfdf99/.github/workflows/tests.yml#L37
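The same kind of download could also be done from Python instead of curl. The sketch below assumes the fileserver accepts HTTP basic authentication, which this thread does not confirm. The helper names, the username, and the URL are placeholders. The password would come from the environment, for example a variable filled from a repository or org secret.

```python
import base64
import shutil
import urllib.request


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP basic-auth header (assumed auth scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download(url: str, dest: str, user: str, password: str) -> None:
    """Stream the response body to `dest` without loading it all into memory."""
    request = build_request(url, user, password)
    with urllib.request.urlopen(request) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
```

A call might look like `download("https://example.org/testdata/file.i3.zst", "file.i3.zst", "someuser", os.environ["ICECUBE_PASSWORD"])`, where the URL and username are hypothetical and `os` must be imported.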

mlincett commented 11 months ago

Since the files for this are already available in `/data/ana` and are not created ad hoc, I guess I can use @kjmeagher's approach. I just need a repo admin to set `secrets.ICECUBE_PASSWORD` for me.

kjmeagher commented 11 months ago

That's already an org secret, so you should be able to use it already.

ric-evans commented 11 months ago

@mlincett since you'll be pulling from an external source, can you add a step to check the checksum before the actual tests?
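A checksum step like the one requested could look roughly like this. It is a minimal sketch that assumes SHA-256 (the thread does not name an algorithm). The function names are invented for illustration, and the expected digest would have to be recorded from a trusted copy of the file.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large test files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_hex: str) -> None:
    """Raise ValueError if the file's SHA-256 digest differs from the expected one."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

Running `verify_checksum` right after the download and before the tests makes a corrupted or replaced file fail early with a clear error.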

mlincett commented 11 months ago

> @mlincett since you'll be pulling from an external source, can you add a step to check the checksum before the actual tests?

A working example is now present in #7

Unfortunately, I don't know if there is a way to avoid downloading the same file for each python version.

ric-evans commented 11 months ago

> Unfortunately, I don't know if there is a way to avoid downloading the same file for each python version.

GitHub Actions has caching tools that you can use. You'd create a preliminary step that downloads the file and puts it in your cache.
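To make the idea concrete, here is the key-and-lookup logic behind such a cache, sketched in Python against a local directory. This is not the GitHub Actions cache itself, which is configured in the workflow file. The names `cache_key` and `fetch_cached` and the injected `fetch` callable are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(url: str, expected_sha256: str) -> str:
    """Derive a short key that changes whenever the URL or the expected content changes."""
    return hashlib.sha256(f"{url}\n{expected_sha256}".encode("utf-8")).hexdigest()[:16]


def fetch_cached(
    url: str,
    expected_sha256: str,
    cache_dir: str,
    fetch: Callable[[str], bytes],
) -> Path:
    """Return a verified local copy, calling `fetch` only on a cache miss."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / cache_key(url, expected_sha256)
    if target.exists():
        if hashlib.sha256(target.read_bytes()).hexdigest() == expected_sha256:
            return target  # cache hit: skip the download
    data = fetch(url)  # cache miss (or corrupted entry): download again
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"checksum mismatch for {url}")
    target.write_bytes(data)
    return target
```

A directory like this only helps jobs that share a filesystem. Separate runners would still need a shared cache service to avoid repeated downloads.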

mlincett commented 11 months ago

It doesn't seem hard, but it will require some time to test.

dsschult commented 11 months ago

Just doing the download again might be as fast as the cache, because the cache itself uploads and downloads the file internally. We do have plenty of network bandwidth for that machine.

mlincett commented 11 months ago

Fixed in #7, no caching for now.