zerothi / sisl

Electronic structure Python package for post analysis and large scale tight-binding DFT/NEGF calculations
https://zerothi.github.io/sisl
Mozilla Public License 2.0
181 stars 58 forks source link

Keyword arguments for atomic indices are not consistent #206

Closed zerothi closed 4 years ago

zerothi commented 4 years ago

Describe the issue

We need to create consistency throughout sisl (atom, idx and others are currently used)

We should unify

I dislike atom since it may indicate Atom for users, I dislike idx/index since it does not clarify index.

Possibly candidates:

The same applies to orbitals.

Generally I use ia for single atomic indices (and io for orbitals).

zerothi commented 4 years ago

@tfrederiksen I created an issue :)

tfrederiksen commented 4 years ago

I dislike the atom too for the same reason. idx is better, because I think what it refers to, is quite clear from its context.

In any case, for clarity, I think atom_index, orbital_index would be the best solution. Maybe one could also work internally with ia and io?

pfebrer commented 4 years ago

If this is an open poll, I like iatom :)

By the way, in some cases the atom keyword argument does not refer only to indices right? It can also be a string. Or do you plan to remove that?

zerothi commented 4 years ago

If this is an open poll, I like iatom :)

It is an open discussion ;)

By the way, in some cases the atom keyword argument does not refer only to indices right? It can also be a string. Or do you plan to remove that?

No,I plan to keep this, generally atomic indices should go through _sanitize_atom which converts str (Names), bool (to True indices) and slices to regular index arrays.

pfebrer commented 4 years ago

No,I plan to keep this, generally atomic indices should go through _sanitize_atom which converts str (Names), bool (to True indices) and slices to regular index arrays.

Then I like atom or atoms when the argument is going through _sanitize_atom because the other words make me think that I can only pass indices. _sanitize_atom could even accept Atom objects to get the indices of those atoms that are exactly like the passed Atom.

And in cases where only indices (or maybe boolean arrays too) are accepted then use a word like the ones you were proposing. This would indicate to the user that it is going directly into querying a numpy array or whatever.

Maybe I'm missing something, but I don't see how naming an argument indices when it accepts other things would be less confusing :)

zerothi commented 4 years ago

Very good point @pfebrer96, perhaps atom_id? For atom identification?

pfebrer commented 4 years ago

I think I wouldn't expect that atom_id accepts [1, 2, 3, 5]. Maybe in plural could be more self-explanatory. It's more like a query than an id though isn't it? Although atoms_query is a weird name also and too long probably.

I don't know, I've had to face this problem of accepting different data types several times and I'm never sure what name is more intuitive, most of the times I end up going for the most general word (atoms) but in this case I can see why you don't like it.

zerothi commented 4 years ago

What about atom_sites?

pfebrer commented 4 years ago

Sounds good to me!

zerothi commented 4 years ago

@tfrederiksen (I know you preferred the atom_index, but perhaps this would also be good?) @jonaslb your input would be valuable here, what/which do you think? :) Thanks!

tfrederiksen commented 4 years ago

What about selected_atom or selected_atoms?

zerothi commented 4 years ago

I prefer having atom first, so something like atom_select? For instance with methods having both orbitals and atoms as arguments it would be:

def method(atom_select, orbital_select):
    ...

That is also a good idea? @pfebrer96 @jonaslb ? Possibly adding the plural s, I tend to not use it, but probably the intent is clearer.

pfebrer commented 4 years ago

Yees starting with atom sounds better to me too, and select is a nice word also. It can be applied to anything throughout sisl that accepts multiple datatypes :)

zerothi commented 4 years ago

Yes, I also like this one :) Good suggestion Thomas, what do you say about the small change?

tfrederiksen commented 4 years ago

Yet another possibility: atom_selection. This has atom first and avoids a choice between singular or plural forms.

zerothi commented 4 years ago

Ok, so we are down to these (I think?):

I kind of like the first one better, simply because it is shorter (and shorter code is easier to read in if it still retains clarity). Another thing is that probably atoms_select could be used as atom_select where single atoms are required. But then, if indices are forced, then it should be atom_index? I.e. for Geometry.angle for instance. Hmm. what should we do there?

jonaslb commented 4 years ago

Well I think it is strongly implied that it is a selection, so it's kind of a wasted word to put in a keyword 😅

I would have preferred simply atoms (plural) it is short and descriptive. Regarding the implication that atoms would take Atom objects -- perhaps that could simply be allowed as well?

Then for orbital specification it could be either orbitals or the shortened orbs.

I would also be fine with variations over aidx, iatoms and such. I just don't like the very long ones (multi-word spelled out), especially for a 'key' argument like this. I do think plural should be used when words are spelled out, but it's not critical for me.

zerothi commented 4 years ago

Well I think it is strongly implied that it is a selection, so it's kind of a wasted word to put in a keyword

I believe it is only implied a selection for experienced users. This is to guide users by reading the argument name as an indication of allowed values. Very rarely would I expect users to pass index variables (first arguments) by keywords since they are required. This is different for methods which have several options (tbtrans objects for instance), in such cases it may be useful to retain some consistency for end-users. I don't know what is the best route because I agree that atom_* is also a bit weird and long... :(

tfrederiksen commented 4 years ago

What about atom_subset and orbital_subset? I tend to prefer a more explicit and precise keyword in this case over saving a few characters.

zerothi commented 4 years ago

Good idea @tfrederiksen what about atom_set, I would imagine set does not confuse users about intent (i.e. one would never force users to request all atoms)?

tfrederiksen commented 4 years ago

Yes, atom_set is shorter and also quite precise. I like that too. However, the word "set" just has many more meanings than the word "subset", including "the act or action of setting", so a little less concise. Further, this keyword can not be any arbitrary set, it has to be a subset of the atoms in the geometry. Hence I would argue atom_subset is most correct.

Could one imagine using just a keyword subset, both in context of atoms and orbitals?

zerothi commented 4 years ago

This is another thing, would subset be sufficiently descriptive for the method so that users knows it means atom_subset for Geometry objects? It probably is.

When both atoms and orbitals are necessary in the argument list, then I we do need atom/orbital to distinguish.

jonaslb commented 4 years ago

Already the name atoms implies you can pass a collection of atoms. What does adding _set (or subset) give in terms of information that isn't already present in the name atoms? I would argue it will only be misleading given that Set is a builtin class but you actually can't use it.

pfebrer commented 4 years ago

I agree with Jonas that atoms feels good already as I've commented somewhere in this thread. Just so that he doesn't feel alone hahaha.

tfrederiksen commented 4 years ago

You are right, atoms is also very good. But atom was always confusing to me.

zerothi commented 4 years ago

Ok, so we will go for atoms ? hehehe!

So atoms and orbitals, any objections?

jonaslb commented 4 years ago

Not an objection but maybe something to consider: Will there be a need for differentiating supercell indices from unitcell indices in keyword names?

zerothi commented 4 years ago

You mean stuff that allows geometry.na + 1 or not?

What about atoms_sc where atoms are allowed to be passed in any supercell? On the other hand probably it is implicit by the method and users who know they need supercell indices would know that it is allowed. Hmm.. Good point.

pfebrer commented 4 years ago

Maybe it does not make sense, but this could be a property of the Geometry. And sanitize_atoms would check what the current state is.

Let's say (being super explicit with names):

geom.work_with_unit_cell_inidices_only = True
# Would force _sanitize_atoms to convert all indices to supercell
geom.work_with_unit_cell_inidices_only = False
# Would use the indices as provided

As I was writing it I think it doesn't make sense because you want some methods to have one behavior and some another right?

zerothi commented 4 years ago

Maybe it does not make sense, but this could be a property of the Geometry. And sanitize_atoms would check what the current state is.

Let's say (being super explicit with names):

geom.work_with_unit_cell_inidices_only = True
# Would force _sanitize_atoms to convert all indices to supercell
geom.work_with_unit_cell_inidices_only = False
# Would use the indices as provided

As I was writing it I think it doesn't make sense because you want some methods to have one behavior and some another right?

No, this won't work, if anything it should be an argument for _sanitize_atom(atom, allow_supercell=True|False), because this is a per method option.

pfebrer commented 4 years ago

Yeah, what I thought, thanks.

jonaslb commented 4 years ago

It might be too hypothetical, I don't have a good example off the top of my head anyway where it would be relevant. Probably just atoms is fine and if specification is needed on a per-method basis (if multiple atoms-args must be specified), some appropriate _xxx could be added.