ltalirz / atomistic-software

Tracking citations of atomistic simulation engines
https://atomistic.software
GNU Affero General Public License v3.0

Adding new simulation engines #21

Status: Open. ltalirz opened this issue 3 years ago.

ltalirz commented 3 years ago

This issue tracks information regarding the addition of new simulation engines.

Before suggesting a new engine, please make sure that

  1. it fits the scope of this list
  2. it has had at least one year with 50 citations or more.
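The second criterion can be expressed as a simple check over per-year citation counts. This is a minimal sketch that assumes the counts are available as a year-to-count mapping. The function name is hypothetical and is not part of the atomistic-software repository.

```python
def meets_relevance_criterion(citations_per_year: dict[int, int], threshold: int = 50) -> bool:
    """Return True if at least one year has `threshold` citations or more.

    Hypothetical helper illustrating criterion 2; not taken from the repository.
    """
    return any(count >= threshold for count in citations_per_year.values())


# EPW: 149 citations for 2020, as reported further down in this thread.
print(meets_relevance_criterion({2020: 149}))  # True
```

Because the check uses `>=`, a code with exactly 50 citations in a single year qualifies, matching "50 citations or more".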

Citations are queried on Google Scholar, typically using the name of the code plus the name of a key author as search terms (e.g. "VASP Kresse").
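The Google Scholar links quoted in this thread all share the same query-string shape (`q`, `hl`, `as_sdt`, `as_ylo`, `as_yhi`). The sketch below assembles such a link for one search term and one year. It only reproduces the parameters visible in those links. It is not an official Google Scholar API, and the function name is hypothetical.

```python
from urllib.parse import quote, urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_query_url(search_terms: str, year: int) -> str:
    """Build a Google Scholar search URL restricted to a single year.

    Parameter names are copied from the links in this thread; this is a sketch,
    not an official API.
    """
    params = {
        "q": search_terms,  # e.g. code name + key author
        "hl": "en",
        "as_sdt": "0,5",
        "as_ylo": year,     # first year (inclusive)
        "as_yhi": year,     # last year (inclusive)
    }
    # quote_via=quote encodes spaces as %20 (not '+'), matching the quoted links
    return SCHOLAR_BASE + "?" + urlencode(params, quote_via=quote)


print(scholar_query_url("VASP Kresse", 2020))
```

Calling `scholar_query_url("NWChem Valiev", 2020)` reproduces the NWChem link quoted later in this thread character for character.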

There is an actively maintained watchlist of codes that do not yet meet the relevance criterion.

sponce24 commented 3 years ago

Hello,

Very nice, Leopold. Maybe EPW could be added (the Google Scholar query 'EPW Giustino' returns 149 citations for 2020).

Thanks, Samuel

ltalirz commented 3 years ago

Thanks, Samuel! As you are one of the code authors, would you mind opening a PR to add EPW to https://github.com/ltalirz/atomistic-software/blob/master/src/data/codes.json? I think it would fit into the S=Spectroscopy category (which, by the way, is the least populated and least well-defined category and probably misses a couple of other codes). Or let me know which of the "tags" should apply in your view.

No need to add the citation numbers; I'll take care of that.

ltalirz commented 3 years ago

Below is a point raised by @jeffhammond via email, followed by my reply:

Your citation counter uses https://scholar.google.com/scholar?q=NWChem%20Valiev&hl=en&as_sdt=0%2C5&as_ylo=2020&as_yhi=2020, which maps to "NWChem Valiev". Is there any particular reason why you only count NWChem citations associated with Marat Valiev? Are there false positives otherwise?

If you use "NWChem" alone, https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&as_ylo=2020&as_yhi=2020&q=NWChem&btnG= returns 634 citations instead of 353. This is a nontrivial difference, and I would be really surprised if any of these were not associated with the NWChem of interest.

If you want to be explicit about authors, "NWChem Windus OR de-Jong OR Kowalski OR Bylaska OR Apra OR Valiev OR Govind OR Harrison OR Straatsma" will get you 426 citations, with relatively high specificity.
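The author-list query above follows a simple pattern: the code name followed by several author names joined with `OR`. A minimal sketch of building such a string is shown below. The function name is hypothetical, and the sketch makes no claim about how Google Scholar parses operator precedence; it only reproduces the query string as written.

```python
def code_author_query(code_name: str, authors: list[str]) -> str:
    """Join a code name with one or more author names separated by ' OR '.

    Hypothetical helper; with an empty author list it falls back to the
    "code name"-only query.
    """
    if not authors:
        return code_name
    return code_name + " " + " OR ".join(authors)


nwchem_authors = ["Windus", "de-Jong", "Kowalski", "Bylaska", "Apra",
                  "Valiev", "Govind", "Harrison", "Straatsma"]
print(code_author_query("NWChem", nwchem_authors))
```

With a single author, the function reduces to the original "code name" + "key author" form, e.g. `code_author_query("VASP", ["Kresse"])` gives `"VASP Kresse"`.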

The "code name" + "key author" approach is somewhat of a historical relic from the list created by Luca Ghiringhelli, but there is also some thought behind it. While the "code name"-only approach would work for certain engines with highly distinctive names (NWChem is one of them), it would not work for others (think of ORCA, etc.). Including author names in the queries for some codes but not for others would create a relative imbalance in the data set, e.g. in cases where the name of a code is mentioned in the text without a citation or author, or where Google Scholar hasn't fully indexed the references.

I think a first improvement over the "code name" + "key author" approach is indeed to include at least the first authors of recent review articles via OR, which I've now done for NWChem. However, I'm also open to discussing dropping the author from the query altogether, as well as any other suggestions for improving the general query approach.

P.S. One could argue that maintaining a list of review papers per code and simply summing up their citations would be superior to the current approach (even though some citations would then be double-counted). The main drawbacks of this approach are that it significantly increases the maintenance burden, and that it would no longer be possible to direct users to the single query used to obtain the citation counts (since multiple queries would be needed). If Google Scholar’s API ever adds support for combining multiple “cited by” queries into one, I will consider switching.
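The double-counting mentioned in the P.S. arises because a single paper can cite several review articles of the same code. The sketch below contrasts summing per-review counts with counting distinct citing documents. The paper identifiers are made up for illustration. If only the per-paper "cited by" counts are available, the sum is all one can compute; de-duplicating requires knowing which documents did the citing.

```python
def summed_citations(citing_docs_per_review: dict[str, set[str]]) -> int:
    """Sum the citation counts of all reviews (a paper citing two reviews counts twice)."""
    return sum(len(docs) for docs in citing_docs_per_review.values())


def distinct_citations(citing_docs_per_review: dict[str, set[str]]) -> int:
    """Count each citing document once, no matter how many reviews it cites."""
    distinct: set[str] = set()
    for docs in citing_docs_per_review.values():
        distinct |= docs
    return len(distinct)


# Hypothetical data: paper "p3" cites both reviews.
reviews = {"review_A": {"p1", "p2", "p3"}, "review_B": {"p3", "p4"}}
print(summed_citations(reviews), distinct_citations(reviews))  # 5 4
```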

brucefan1983 commented 2 months ago

Hello Leopold, GPUMD (https://github.com/brucefan1983/GPUMD) has 66 citations in 2023 according to a Google Scholar search for the highly specific term "GPUMD". GPUMD is a general-purpose molecular dynamics simulation package based on classical potentials (empirical and machine-learned). I wonder if it could be added to the list under type FF. Thanks!

ltalirz commented 2 months ago

Hi @brucefan1983, thanks for the suggestion! I can confirm your observation: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&as_ylo=2023&as_yhi=2023&as_vis=1&q=%22GPUMD%22&btnG=

Would you like to open a pull request? Adding the metadata to codes.json is enough; I will take care of collecting the citation data.

For an example, see https://github.com/ltalirz/atomistic-software/pull/169/files

brucefan1983 commented 2 months ago

Hello Leopold, thank you very much for your quick response and confirmation. I have created a PR to add GPUMD; see #186. I would also like to point out that GPUMD was published in 2017 and received 6 citations/applications that year, so any Google Scholar data from before 2017 is irrelevant.
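Discarding counts from before a code's first publication amounts to filtering the per-year data by a cutoff year. A minimal sketch follows, assuming the counts are stored as a year-to-count mapping. The function name is hypothetical, and the 2015 entry in the example is invented to show what gets dropped. The 2017 and 2023 values are the GPUMD figures quoted in this thread.

```python
def drop_pre_release_years(citations_per_year: dict[int, int],
                           first_release_year: int) -> dict[int, int]:
    """Keep only years on or after the code's first publication year."""
    return {year: count for year, count in citations_per_year.items()
            if year >= first_release_year}


# 2015 is a made-up pre-release entry; 2017 and 2023 are from this thread.
print(drop_pre_release_years({2015: 3, 2017: 6, 2023: 66}, 2017))  # {2017: 6, 2023: 66}
```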