mala-project / mala

Materials Learning Algorithms. A framework for machine learning materials properties from first-principles data.
https://mala-project.github.io/mala/
BSD 3-Clause "New" or "Revised" License

UPDATE: Extend/Improve CI #13

Open RandomDefaultUser opened 3 years ago

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Dec 17, 2020, 09:44

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 2, 2021, 10:34

FESL now has a CI pipeline for testing. It tests

There is still quite some way to go:

But I think we now have a foundation that we can extend at our leisure.

By Fiedler, Lenz (FWU) - 146409 on 2021-02-02T10:34:23 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 2, 2021, 17:41

This is great! I think now is a good time to think about a reduced, self-contained test set. We might take a small DFT-MD snapshot from the new Fe dataset that Mani is generating. This has the advantage that the comparison with the DFT reference data would be more meaningful.

By Cangi, Dr. Attila (FWU) - 139621 on 2021-02-02T17:41:30 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 2, 2021, 18:51

In general, it is a bad idea to commit binary files (such as NumPy binary files) to a repository. If you need access to a large amount of data for testing purposes, that data should be stored somewhere else. What comes to mind is /bigdata on the HPC cluster. However, one would have to figure out how to access it from within the CI (and whether that is possible at all).

By Kotik, Daniel (FWU) - 140179 on 2021-02-02T18:51:38 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 2, 2021, 17:49

I agree; I guess the main problem is how to access the data. Even small examples would be too much for a repository. I will look into that.

By Fiedler, Lenz (FWU) - 146409 on 2021-02-02T18:51:38 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 09:25

Accessing a remote server from within CI seems to be a solvable problem: https://docs.gitlab.com/ee/ci/ssh_keys/.

By Kotik, Daniel (FWU) - 140179 on 2021-02-03T09:25:35 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 08:47

We could simply create a folder in the HZDR Nextcloud and add instructions to the install file on where to find the data. We could then add a script that performs the setup and links the examples to this data folder, so that the user could run everything out of the box.
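
A minimal sketch of what such a setup script might look like, assuming the data has already been downloaded to a local folder. The function name `link_example_data`, the link name `data`, and the folder layout are hypothetical and not part of the repository.

```python
from pathlib import Path


def link_example_data(data_dir, examples_dir, link_name="data"):
    """Create examples_dir/link_name as a symlink pointing to data_dir.

    Hypothetical helper: names and layout are assumptions for illustration.
    An existing symlink is replaced; any other existing file or directory
    at the link location raises FileExistsError.
    """
    data_dir = Path(data_dir).resolve()
    examples_dir = Path(examples_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Data folder not found: {data_dir}")

    examples_dir.mkdir(parents=True, exist_ok=True)
    link = examples_dir / link_name

    if link.is_symlink():
        # Re-running the setup should be harmless, so refresh a stale link.
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")

    link.symlink_to(data_dir, target_is_directory=True)
    return link
```

A user would call it once after downloading, for example `link_example_data("~/Downloads/test_data", "examples")` with the home directory expanded beforehand; the examples could then read everything through the `data` link.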

By Fiedler, Lenz (FWU) - 146409 on 2021-02-03T09:35:40 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 09:35

Is it possible to access Nextcloud content from the command line? I think there is also a size limit. What binary file sizes are we talking about anyway?

By Kotik, Daniel (FWU) - 140179 on 2021-02-03T09:39:42 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 08:36

Yes, I think that is precisely the problem. If we can access the hemera folders, we can use real data without problems.

By Fiedler, Lenz (FWU) - 146409 on 2021-02-03T08:36:57 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 08:45

That would be a solution for our own development of the code. But what about external users of the code? It would be nice to provide them with a small test suite like the examples Lenz has set up.

By Cangi, Dr. Attila (FWU) - 139621 on 2021-02-03T08:45:31 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 08:50

Yes, that's a good option.

By Cangi, Dr. Attila (FWU) - 139621 on 2021-02-03T08:50:04 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 10:02

@kotik79 I also like the options you provided. Having the option of tracking the files would be nice, and at the same time, keeping the repository at a reasonable size is an important point.

By Cangi, Dr. Attila (FWU) - 139621 on 2021-02-03T10:02:31 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 10:04

Ah cool, I had never heard of LFS. That is a good idea that we should look into.

By Fiedler, Lenz (FWU) - 146409 on 2021-02-03T10:04:44 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 3, 2021, 09:33

Another option is to use Git Large File Storage (LFS); all major platforms support it, including HZDR's GitLab instance.

This comes in handy if a binary file should be tracked with a VCS and is expected to change over time. In this case the repository does not grow in size; instead, the individual binary blobs reside next to the repo. For details, see the references above.
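
One practical consequence for tests: if a clone is made without fetching the LFS objects, each tracked file appears as a small text pointer instead of the real data. A minimal sketch of a guard that a test could use to detect this, assuming the standard pointer format whose first line is `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` is hypothetical.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the pointer file format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like a Git LFS pointer.

    Hypothetical helper for illustration: it only inspects the leading
    bytes of the file and does not validate the oid or size fields.
    """
    with open(Path(path), "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

A test could skip or fail early with a clear message when `is_lfs_pointer` returns True for its input file, rather than failing later while trying to parse the pointer text as binary data.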

By Kotik, Daniel (FWU) - 140179 on 2021-02-03T10:39:19 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 8, 2021, 14:07

Regarding the reduction of CI pipeline runtime, two things may come in handy here:

I'll have a look at how to do that.

By Kotik, Daniel (FWU) - 140179 on 2021-02-08T14:07:18 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 2, 2021, 10:34

changed title from {-Add-} CI to {+UPDATE: Extend/Improve+} CI

By Fiedler, Lenz (FWU) - 146409 on 2021-02-02T10:34:53 (imported from GitLab)

RandomDefaultUser commented 3 years ago

In GitLab by @RandomDefaultUser on Feb 2, 2021, 10:34

changed the description

By Fiedler, Lenz (FWU) - 146409 on 2021-02-02T10:34:53 (imported from GitLab)