Discussing the scope of the `atomistic.software` list

ltalirz commented 3 years ago

Please find below a conversation with @ceriottm, who kindly agreed (suggested, actually) to share it here as a record of the reasoning behind the current scope of the atomistic.software list and how it may evolve going forward.

Hello Leopold,

I hope this email finds you well. I stumbled upon the website http://atomistic.software, which you apparently manage, and I was wondering if you could also include i-PI (http://ipi-code.org/). The main publications you can track for it are https://www.sciencedirect.com/science/article/pii/S001046551300372X and https://www.sciencedirect.com/science/article/pii/S0010465518303436

Thanks a lot and all the best,
Michele

Hi Michele,

thanks for reaching out!

I have already considered adding i-pi (see [1]) and my original impression - not having used the code myself - was that it seemed to be more like a wrapper of simulation engines rather than a simulation engine itself, going by the rough definition I'm currently using [2]

a piece of software that, given two sets of atomic elements and positions, can compute their (relative) internal energies. In most cases, engines will also be able to compute the derivative of the energy with respect to the positions, i.e. the forces on the atoms, and perform tasks like geometry optimizations or molecular dynamics.

The reason I'm making this distinction is that I currently don't have a category for wrapper/orchestration-type software and it is not clear to me how one could limit the scope of such a category in an elegant way. E.g. where should one draw the line in this list: i-pi => deepmd-kit => ASE => AiiDA => fireworks => [insert generic workflow manager here]?

That said,

1. my understanding of how i-pi works might be wrong, and
2. I'm very open to improving or modifying the definition to include codes like i-pi, if it is possible to do so in a way that does not lead to the list exploding.

Please let me know your thoughts!

Cheers, Leo

[1] https://github.com/ltalirz/atomistic-software/issues/21
[2] https://github.com/ltalirz/atomistic-software#scope

Hello. I asked myself that question, but then I saw you had ASE in it, which is as much of a wrapper as it gets. Sure, it has a module to compute some simple potentials, but so has i-PI, and I would not argue about it being an "engine" based on that: it's just not how most people use it. Personally I find the current definition of an engine arbitrary and unnecessarily narrow: you already make an arbitrary exception for "spectroscopy" codes and there's more to life than energy and (perhaps) forces ^_^'

Based on what the domain says, the line seems to be naturally drawn by the focus on "atomistic" simulations: I would not be surprised to see phonopy or AiiDA on that list, but I'd be surprised to see, say, signac or abaqus. I agree there's a risk of the list exploding. From that point of view I think it would make sense to apply a "relevance" threshold, and to apply it retroactively, as otherwise you'll get endless complaints. From that PoV I think that i-PI does not (currently) meet the 100 citations criterion, and that seems to me a perfectly good reason to "wait and see": as I mentioned, the only thing that weakens that argument is that half of the codes on that list don't meet it either. That's also a criterion that is easy to automate BTW, so a big plus!

All the best Michele

Thanks for sharing your thoughts, Michele, they are very welcome!

Some of the inconsistencies you mention stem from the fact that the original version of the list by Luca Ghiringelli [1] didn't have a relevance threshold. It included codes like yambo and BerkeleyGW, listing them under "DFT" (yambo) and "WFM" (BerkeleyGW), although you typically can't compute total energies with them. It also included ASE.

Your comments make me think that I will need to remove the historical codes that don't meet the relevance criterion I imposed for new additions. I had documented the inconsistency here [2] but I fear people won't see it and get confused. I guess even if I had documented it more clearly in the "about" http://atomistic.software/#/about that would not solve the problem... As for the threshold itself, the number is up for debate. As one can see in [3] there aren't all that many codes on the 20-100 citations/year watchlist (for the <20 citations, the watchlist is of course very incomplete), so one could imagine lowering it to something like 50, but 100 seemed like a reasonable round number.

As for the scope, I agree with your point about the definition being narrow and I'll think about how best to extend it. I think it's very positive if developers want to see their code on the list, and in the end the purpose of this list is to be a useful resource for practitioners in the field, so in that context having codes like i-pi and ASE on the list certainly makes sense. If you were to pick a name for the category of codes like ASE or i-pi, what would it be?

Cheers, Leo

[1] https://www.nomad-coe.eu/old-pages/externals/codes
[2] https://github.com/ltalirz/atomistic-software#adding-a-simulation-engine
[3] https://github.com/ltalirz/atomistic-software/issues/21

Hi Leo,

I understand the "historical" side and TBH I think your n.1 goal should be not to get too much harassment for getting involved in the maintenance of this list. To me, it would make sense really to make the process as automated as possible, and to set up things so that developers share as much of the burden as possible. I think 100 cites is indeed a nice round number, and Google Scholar as a source is rounding up so I do think it's fair, and it is a fairly high bar so you can be sure you won't get thousands of entries to worry about.

As for "categories" I think it would make your life much easier (goal n. 2!) to think in terms of "tags" - there I could think of having total energy; functional properties; md and sampling; structure optimization and search; machine learning models; workflows and automation; analysis and visualization; .... - once again, the onus of choosing tags might be on the developers rather than on you.

All the best Michele

Hi Michele,

> I understand the "historical" side and TBH I think your n.1 goal should be not to get too much harassment for getting involved in the maintenance of this list. To me, it would make sense really to make the process as automated as possible, and to set up things so that developers share as much of the burden as possible. I think 100 cites is indeed a nice round number, and Google Scholar as a source is rounding up so I do think it's fair, and it is a fairly high bar so you can be sure you won't get thousands of entries to worry about.

Ok!

> As for "categories" I think it would make your life much easier (goal n. 2!) to think in terms of "tags" - there I could think of having total energy; functional properties; md and sampling; structure optimization and search; machine learning models; workflows and automation; analysis and visualization; .... - once again, the onus of choosing tags might be on the developers rather than on you.

Thanks for the suggestions! The current "categorization" is already done in terms of tags - currently there is one set of tags for the method (dft/ff/tb/...) and one set of tags for more technical aspects. Lumping all tags together would make life easy here... I wonder whether it still makes sense to let tags have a "type". I'll think about it over the weekend.

Cheers, Leo
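To make the "typed vs. flat tags" question a bit more concrete, here is a minimal sketch of how tags could keep a type while still being usable as one flat set. This is only an illustration, not how the list is actually implemented: the names `TagType`, `Tag`, `TaggedCode`, and the `PURPOSE` type are hypothetical, and the example entry is made up.

```python
from dataclasses import dataclass, field
from enum import Enum


class TagType(Enum):
    """Tag types; METHOD and TECHNICAL mirror the two sets mentioned above."""
    METHOD = "method"        # e.g. dft / ff / tb
    TECHNICAL = "technical"  # "more technical aspects"
    PURPOSE = "purpose"      # hypothetical type for the suggested tags


@dataclass(frozen=True)
class Tag:
    name: str
    type: TagType


@dataclass
class TaggedCode:
    """Hypothetical list entry carrying a set of typed tags."""
    name: str
    tags: set = field(default_factory=set)

    def tags_of_type(self, tag_type):
        """Return the sorted names of all tags of one type."""
        return sorted(t.name for t in self.tags if t.type is tag_type)

    def flat_tags(self):
        """Return all tag names lumped together, ignoring the type."""
        return sorted(t.name for t in self.tags)


# Made-up entry, for illustration only.
entry = TaggedCode(
    "example-code",
    {Tag("dft", TagType.METHOD), Tag("md and sampling", TagType.PURPOSE)},
)
print(entry.tags_of_type(TagType.METHOD))
print(entry.flat_tags())
```

With a layout like this, the choice between typed and lumped-together tags becomes a question of presentation rather than of data model, since the flat view can always be derived from the typed one.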

ltalirz commented 3 years ago

As a first step, the 100 citations/year cutoff has now also been enforced on historical entries in 30a0b6922ae575471672c18bed58d2ff2a2a4dd4
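For readers wondering what such a cutoff amounts to in practice, it can be sketched as a simple filter over annual citation counts. This is a minimal illustration and not the code from the commit: `CodeEntry`, `apply_cutoff`, and the example entries are hypothetical, and treating "exactly 100" as passing is an assumption.

```python
from dataclasses import dataclass

# Cutoff discussed in this thread (citations per year).
CITATIONS_PER_YEAR_CUTOFF = 100


@dataclass(frozen=True)
class CodeEntry:
    """Hypothetical record for one code and its citations in a given year."""
    name: str
    citations_per_year: int


def apply_cutoff(entries, cutoff=CITATIONS_PER_YEAR_CUTOFF):
    """Split entries into (listed, watchlist) by annual citation count.

    Assumption: an entry with exactly `cutoff` citations is listed.
    """
    listed = [e for e in entries if e.citations_per_year >= cutoff]
    watchlist = [e for e in entries if e.citations_per_year < cutoff]
    return listed, watchlist


# Made-up numbers, for illustration only.
entries = [CodeEntry("code-a", 250), CodeEntry("code-b", 40)]
listed, watchlist = apply_cutoff(entries)
print([e.name for e in listed])
print([e.name for e in watchlist])
```

Because the same function is applied to every entry, old and new codes are treated identically, which is the point of enforcing the cutoff retroactively.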

ceriottm commented 3 years ago

Just had a quick browse through the commit diff - seems strange that dftb+ doesn't make the cut - the main publication is from 2007 and is listed on GScholar at 1500 cites and counting

ltalirz commented 3 years ago

Thanks for checking! My gut feeling was also that DFTB+ is relatively widely used, but I didn't know for sure. The citations of the paper in the year 2020 are shown as 187.

I went through a few of them and they do generally seem to contain the term "DFTB+", which was the only query string used. This almost looks like an indexing issue to me; I'll see whether I can report it to Google Scholar.

In the meantime, we can switch to citations of the paper as the source. Fixed in 39eacacd156d32d49c1027a5a6984ec56216305a

P.S. I also just checked that no other query string of codes in the list currently contains the "+" symbol.
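A check like this is easy to script. The sketch below assumes the query strings are available as a plain name-to-query mapping; that layout and the function name `queries_containing` are hypothetical, and apart from "DFTB+" the example entries are made up.

```python
def queries_containing(query_strings, symbol="+"):
    """Return the subset of {code name: query string} whose query contains `symbol`."""
    return {name: q for name, q in query_strings.items() if symbol in q}


# "DFTB+" is the query string mentioned above; the other entry is a placeholder.
query_strings = {
    "DFTB+": "DFTB+",
    "example-code": "example-code",
}
print(queries_containing(query_strings))
```

The same helper can be reused for other characters that a search engine might treat specially, by passing a different `symbol`.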

ltalirz / atomistic-software, issue #48