trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Question: Nightly testing with large or internal Sandia data files #1416

Closed ndellingwood closed 3 years ago

ndellingwood commented 7 years ago

What is the best way (if possible) to set up a nightly test that may use large data files or internal data files that cannot be stored within Trilinos?

In particular for Amesos2, we'd like to set up a nightly test for the following cases:

  1. Using matrix market files provided by a customer that cannot be released in Trilinos for a unit test that would catch issues like those in #1289

  2. Potentially using a larger matrix market file for performance testing (e.g. 60MB+). On this topic, what is the threshold for size of data files allowed to be stored in Trilinos?

@bmpersc @maherou @srajama1 @bartlettroscoe

srajama1 commented 7 years ago

@trilinos/framework

bartlettroscoe commented 7 years ago

Sorry, I just saw this (I did not get an email notification).

You can create a new git repo with just these large files and then update Trilinos/cmake/ExtraRepositoriesList.cmake so that the repo is cloned into the right place. Then your CMakeLists.txt file can check whether the repo has been cloned and enable the tests if it has. This is what CASL VERA does. I can show you with an example if you like, or I can just set up a skeleton with a dummy file and you can fill it in. I would create this extra repo just for Amesos2 testing files.
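
As a rough illustration of the "check whether it is cloned, then enable the tests" step, here is a minimal shell sketch of the same guard as a nightly driver script might apply it. In the build itself this check would sit in the CMakeLists.txt file. The function name `has_matrix_files` and the `.mtx` extension for Matrix Market files are assumptions made for this example, not part of Trilinos.

```shell
#!/bin/sh
# Sketch only: decide whether large-data tests can run by checking that
# the data directory exists and holds at least one Matrix Market file.
# has_matrix_files is a hypothetical helper, not a Trilinos or TriBITS command.

# has_matrix_files DIR
# Succeeds (exit status 0) if DIR exists and contains at least one *.mtx file.
has_matrix_files() {
  dir=$1
  [ -d "$dir" ] || return 1
  for f in "$dir"/*.mtx; do
    # If the glob matched nothing, $f is the literal pattern and -f fails.
    [ -f "$f" ] && return 0
  done
  return 1
}
```

A driver script could call `has_matrix_files <data-dir>` and run the large-data tests only when it succeeds, so a checkout without the extra repo still passes with those tests skipped.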

srajama1 commented 6 years ago

@bartlettroscoe : Thanks for the offer. If you can show us with a dummy file, we will follow that template. I assume we will be able to use this model and report to some existing builds in the dashboard.

ndellingwood commented 6 years ago

Adding @trilinos/amesos2 so that setting up the unit test and performance test mentioned above does not fall off the radar.

bartlettroscoe commented 6 years ago

@srajama1,

> Thanks for the offer. If you can show us with a dummy file, we will follow that template. I assume we will be able to use this model and report to some existing builds in the dashboard.

Just one question. Would these large files be needed for the CI build of Trilinos, or could the CI build be run without them? It would be better if every Trilinos developer did not have to clone this extra repo in order to run the CI tests, since that would impact every Trilinos developer. Otherwise, if these tests just need to be run nightly or for your local development, then you can just clone this extra repo yourself locally and have access to it to update and modify. And in order to push this repo at the same time as you push to Trilinos, you could use the --extra-repos option with the checkin-test-sems.sh script (or just use gitdist push if you don't use the checkin-test-sems.sh script for some reason). I will provide all the info on what your workflow would need to do to be safe and avoid breaking Trilinos. It is really not much extra stuff.

@trilinos/framework team,

What this would mean is that all of the nightly builds that use the TribitsCTestDriverCore.cmake code to implement ctest -S scripts to submit to CDash would automatically clone this extra repo, and therefore no change at all would be needed in the infrastructure.

But even with that being the case, are you okay with the solution I mention above? Or would you like to suggest another option for @srajama1? Really I think this is a task that the @trilinos/framework team should be supporting (but no-one from the framework team ever responded to the above mention). Still, I don't want to step on your toes just the same.

Implementing what @srajama1 wants is no big deal, is fully supported by TriBITS, and is already used for bringing in and testing extra packages such as MOOCHO, as shown at:

Let me know.

srajama1 commented 6 years ago

Just the nightly testing is good for now. This doesn't have to be part of the CI testing.

bartlettroscoe commented 6 years ago

@srajama1,

> Just the nightly testing is good for now. This doesn't have to be part of the CI testing.

Okay, last two questions:

How about calling the repo 'amesos2_large_test_data' and cloning it under:

   Trilinos/packages/amesos2/test/

?

So the repo on github would be:

and in the directory tree it would be:

  Trilinos/packages/amesos2/test/amesos2_large_test_data

That would automatically be set up by calling:

$ cd Trilinos/
$ ./clone_extra_repos.py --extra-repos=amesos2_large_test_data

Or, you could just clone it manually with:

$ cd Trilinos/
$ cd packages/amesos2/test/
$ git clone git@github.com:trilinos/amesos2_large_test_data.git

Let me know if this is okay or if you have some other preference?
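
For a nightly driver that may run repeatedly in the same source tree, the manual clone above can be made safe to re-run. The following is a minimal sketch under that assumption; `is_git_checkout` and `ensure_extra_repo` are hypothetical helper names, not TriBITS commands, and the URL and path are the ones proposed in this thread.

```shell
#!/bin/sh
# Sketch only: clone the extra data repo when it is missing and leave an
# existing checkout untouched. Helper names are hypothetical.

# is_git_checkout DIR
# Succeeds if DIR contains a .git entry (a directory, or a file as used
# by worktrees and submodules).
is_git_checkout() {
  [ -e "$1/.git" ]
}

# ensure_extra_repo URL DEST
# Prints "present" if DEST is already a checkout, otherwise clones URL
# into DEST and prints "cloned". Returns non-zero if the clone fails.
ensure_extra_repo() {
  url=$1
  dest=$2
  if is_git_checkout "$dest"; then
    echo "present"
  else
    git clone "$url" "$dest" >/dev/null 2>&1 || return 1
    echo "cloned"
  fi
}
```

From the Trilinos/ directory this could be called as `ensure_extra_repo git@github.com:trilinos/amesos2_large_test_data.git packages/amesos2/test/amesos2_large_test_data`.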

bartlettroscoe commented 6 years ago

Note that the approach that I describe above will also work for holding protected files in certain repos. You can just exclude those repos when you don't have access. For example, the CASL VERA project has different access controls on different git repos so the system had to support checkouts and usage of subsets of repos.
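
One way to picture "exclude those repos when you don't have access" is a filter that probes each remote before cloning. This is only a sketch of the idea, not how TriBITS or CASL VERA implement it; `can_access` and `select_repos` are hypothetical names, and `git ls-remote` is used here simply as a cheap way to ask whether a remote answers.

```shell
#!/bin/sh
# Sketch only: keep the repos we can reach, skip the rest.
# can_access and select_repos are hypothetical helpers.

# can_access URL
# Succeeds if the remote answers. GIT_TERMINAL_PROMPT=0 stops git from
# waiting on an interactive username/password prompt.
can_access() {
  GIT_TERMINAL_PROMPT=0 git ls-remote "$1" >/dev/null 2>&1
}

# select_repos CHECKER URL...
# Prints each URL for which CHECKER succeeds; reports the others on stderr.
select_repos() {
  checker=$1
  shift
  for url in "$@"; do
    if "$checker" "$url"; then
      echo "$url"
    else
      echo "skipping $url (no access)" >&2
    fi
  done
}
```

Passing the checker as an argument keeps the filter testable without network access; a real run would use `select_repos can_access <url>...` and clone only the URLs it prints.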

@srajama1, let me know if you are okay with the above repo name and cloned location.

srajama1 commented 6 years ago

I think we will use this immediately with a protected repo in internal GitLab. Calling it amesos2_large_data_sets is good. We could place it in packages/amesos2/test/.

bartlettroscoe commented 6 years ago

@srajama1,

To manage protected git repos inside of SNL, we have three options: gitlab-ex.sandia.gov (SON Community Edition), gitlab.sandia.gov (SRN Community Edition) and cee-gitlab.sandia.gov (SRN Enterprise edition). The differences are described under the section:

I just created the GitLab groups:

and the GitLab group:

already exists (looks like someone from the @trilinos/framework team created this) but I don't have the ability to create repos in that GitLab group.

The fact that this repo should be protected complicates things a bit. Since it needs to be protected, it will impact people driving builds to submit to CDash from outside of Sandia and may even impact the automated builds that the Trilinos Framework team will be driving. So before we go any further, we need to answer the following questions:

  1. Does this really need to be a protected repo? Could you just sanitize the data and then put it out on GitHub instead, or is that not an option? (If it can be made an open repo, everything is much simpler and we can move forward right away.)

  2. Assuming this needs to be protected, does this need to be protected on the SON or the SRN? (If it needs to be on the SRN, then none of the automated Trilinos builds that run on the SON will be able to access this repo.)

  3. Can anyone on the SON/SRN be allowed to view and clone this repo? Or, does an SNL meta-group need to protect it?

If this repo really needs to be a protected git repo, then we have to bring the @trilinos/framework team into this discussion and coordinate with them. Otherwise, if we are not careful, it will either impact the automated builds they are driving (i.e. break them), or your repo will never get cloned in any of their automated builds. (If this was just an open repo with big files, we could just put this repo under https://github.com/trilinos/ and then everyone's automated builds of Trilinos would work just fine.)

srajama1 commented 6 years ago

Answers to your questions

  1. Yes, the matrices are internal only matrices.
  2. SON is ok.
  3. I don't know. Let me check with the apps. If it is not hard a meta-group would be nice.

bartlettroscoe commented 6 years ago

> Answers to your questions
>
>   1. Yes, the matrices are internal only matrices.
>   2. SON is ok.
>   3. I don't know. Let me check with the apps. If it is not hard a meta-group would be nice.

I don't think the GitLab Community Edition used for gitlab-ex.sandia.gov supports SNL meta-groups. So if you want this repo on the SON, you can't use meta-group controls. Given that the @trilinos/framework team owns the GitLab group https://gitlab-ex.sandia.gov/trilinos/, you will either need to ask them to create an amesos2_large_data_sets git repo or give me the permissions to create it (I can't right now). But given that this repo needs to be protected, this must be coordinated with the @trilinos/framework team.

srajama1 commented 6 years ago

@jwillenbring @bmpersc : Can you create this repository for @bartlettroscoe ?

bartlettroscoe commented 6 years ago

Since this is a core framework activity, we need to leave it to the framework team to address this. Sorry to create traffic on this.

srajama1 commented 6 years ago

Checking to see the status of this issue.

github-actions[bot] commented 3 years ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.

github-actions[bot] commented 3 years ago

This issue was closed due to inactivity for 395 days.