materialsproject / atomate2

atomate2 is a library of computational materials science workflows
https://materialsproject.github.io/atomate2/
Other
169 stars 96 forks source link

Brainstorm: More transparency with regards to default parameters #1038

Open Andrew-S-Rosen opened 3 weeks ago

Andrew-S-Rosen commented 3 weeks ago

I have a big picture question I wanted to kick off here. One of the challenges I have with using atomate2, particularly with new students who are still learning DFT, is that the default parameters for a given job are not particularly clear. The user has to dig through several subclasses to eventually arrive at an answer, which can be daunting. At the same time, hard-coding these into the documentation isn't ideal because then there is likely to be a mismatch if one is changed but not the other.

I find it pretty critical to know, in advance, what parameters are being specified. Does anyone here have suggestions on how to better improve this kind of user experience? Of course, one of the challenges is that some settings may be dynamic, depending on the structure itself. Maybe adding a section to the docs on how to inspect the instantiated class for its parameters is enough?

JaGeo commented 3 weeks ago

Yes, I agree with your statement but I don't have a clear way forward. It also gets very complicated in combination with custodian. We should definitely think about it.

Andrew-S-Rosen commented 3 weeks ago

I think we can exclude Custodian from this part of the discussion because, at least for me, I am mostly interested in knowing how a given job is set up to begin with. For instance, "what is the force tolerance used in this job? Is it too low for me? Too high?" is a question based only on atomate2 logic.

That said, I also don't have a clear way forward... :(

JaGeo commented 3 weeks ago

Mh, not sure about custodian. I am always wondering about consistent calculations (e.g., for phonons, elastic constants). Algo/sigma changes can make this hard. Thus, we should maybe at least mention the logic there as well.

JaGeo commented 3 weeks ago

Could we show for one input set/generator how it works and add documentation on how to get computations compatible with MP Inputs? I feel this would already cover a big part.

Or can we add something to the generators/makers that somehow allows to get a documentation of the input setttings automatically? (E.g., this will be the incar, this will happen to metals specifically etc)

utf commented 3 weeks ago

Agreed this is a problem. I'd hoped this would be made easier by having input sets that are not subclassed but what's happened is we now just have loads of makers that override the input sets there instead.

One technological solution would be to add a custom pipeline into building the docs that takes a job maker, initialises it, generates an input set with a dummy structure, and somehow inserts this into the website documentation. That said, it's not clear to me how much this would help. Personally, I rarely look at the built docs just the source code.

Edit: should have read @JaGeo message properly first - seems like we're on the same page.

JaGeo commented 3 weeks ago

@utf Can we maybe also make it obvious how to get the info from the makers directly?

Andrew-S-Rosen commented 3 weeks ago

One technological solution would be to add a custom pipeline into building the docs that takes a job maker, initialises it, generates an input set with a dummy structure, and somehow inserts this into the website documentation. That said, it's not clear to me how much this would help. Personally, I rarely look at the built docs just the source code.

I think this would help a lot for many users. Sure, I also rarely look at the built docs right now, but that's because it doesn't have this kind of information in it. If it did have this information, I imagine I would rely on it quite a bit. Also, we are all very Python-proficient and have no problem digging through source code, but that's not the case for many. I'm thinking about this from the perspective of users who are not as code-savvy as us three.

It doesn't seem trivial to implement. That said, I think something along these lines is a reasonable idea!

Can we maybe also make it obvious how to get the info from the makers directly?

The exact details would depend on the code (e.g. VASP vs. cp2k) of course, but I think in the short-term this is probably the wisest thing to do. Maybe we can even have a single class method that compiles all the parameters together that way the user doesn't need to fetch different attributes?

QuantumChemist commented 3 weeks ago

I would say that the best solution to providing students the necessary means of understanding what exactly is going on and what impact each parameter has is to improve and add more tutorials. Maybe even some guidance on how to dig through the source code?

JaGeo commented 3 weeks ago

I personally think that it is even a problem if you know how to go through the code as one can easily overlook something. Having this in the documentation would surely help already. I would then also refer people to the documentation if they had questions about it.

JaGeo commented 3 weeks ago

@QuantumChemist of course, in an ideal world everyone should be perfectly trained. I am also sure that everyone involved in this thread is training their students as coworkers as much as they can. However, there will always be a learning curve. And being more clear about these settings will definitely help. Also for the reason mentioned above. With growing pupularity of automation codes, we will also get many less experienced Python users. And we cannot train everyone on IDEs.

hongyi-zhao commented 3 weeks ago

As an example, I've dug into ElasticMaker for several days to understand it deeply. Based on my current understanding, it seems that the code setting snippet given here is invalid:

# only update VASP jobs which have "deformation" in the job name.
new_flow = update_user_incar_settings(
    elastic_flow, {"ENCUT": 200}, name_filter="deformation"
)

In ElasticMaker, the two places for revising the input set are the bulk_relax_maker and elastic_relax_maker, the former including two tight relax jobs, aka, tight relax 1 and tight relax 2, the latter will dynamically create six jobs named from elastic relax 1 to elastic relax 6 during the running process.

To summarize, some jobs in a flow are not real computations and there is no need to revise/update input set settings for them. If I'm wrong, feel free to correct me.

BTW, I go to ElasticMaker from the ElasticMaker listed here. From the results below, there are several ElasticMaker classes defined in atomate2, and in this case, it should be pointed to the first one shown below:

$ ug 'class[ ]+ElasticMaker' -g '*.py'
src/atomate2/vasp/flows/elastic.py:class ElasticMaker(BaseElasticMaker):
src/atomate2/forcefields/flows/elastic.py:class ElasticMaker(BaseElasticMaker):
src/atomate2/aims/flows/elastic.py:class ElasticMaker(BaseElasticMaker):

See https://github.com/Neraaz/HTESP/issues/1#issuecomment-2451363600 for the related discussion.

JaGeo commented 3 weeks ago

@hongyi-zhao thanks for pointing this out.

I would like to add to this thread that we are all doing as much as we can. However, as far as I know, no one of us has explicit funding for atomate2 support in general. Thus, we always have to connect our contributions to a research project. There are many different obligations arising from such research projects and documenting code is only one of them.

Andrew-S-Rosen commented 3 weeks ago

@hongyi-zhao: Yes, this thread is for brainstorming new ideas (without any guarantee of action in the immediate future). Fur bug reports, a new issue should be opened dedicated to that topic.

hongyi-zhao commented 2 weeks ago

I opened a new issue for my above posting.

gpetretto commented 2 weeks ago

I have also got similar questions from atomate's users, regarding how to know beforehand which inputs are going to be used at runtime. To address this I have provided a function that uses the inputset in the Maker to generate the inputs. In this way a user can verify which parameters are effectively going to be used, even those generated dynamically based on the input structure. One option could be to add a similar function or even a method to BaseVaspMaker that takes a Structure as an input and generates the input set. In this way an inexperienced user would have a relatively quick way of generating the actual inputs and inspect them before running the Flow.

Of course this would not be a replacement for a more detailed documentation, as already suggested.

To also add one more point of view, I should say that often final users (e.g. people that are not going to collaborate on the development at any stage and are not necessarily python experts) do not want to be bothered at all with the details of how the whole machinery works. After getting a basic understanding, they would rather just know the inputs that will be used for their calculations. So, while the training of students and collaborators is definitely valuable, there is also a whole class of users that will not consider inspecting the code or referring to a very technical documentation as a viable option.

JaGeo commented 2 weeks ago

@gpetretto this function/method would be extremely valuable for atomate2. I would be very happy if you could provide the function at least within a PR. We might want to include its usage in the documentation and also use this to document the settings for some standard structures.

gpetretto commented 2 weeks ago

To be fair it is nothing fancy. Basically just the initial part of BaseVaspMaker.make(), that returns the input set instead of writing it to file. I will try to contribute this.