materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.
https://pymatgen.org

Problems of new MPRester #2680

Closed Jeff-oakley closed 1 year ago

Jeff-oakley commented 2 years ago

The old MPRester is disabled (I guess because the legacy MP is offline) so I have to use the new MPRester (mp_api.client.MPRester). However, it seems that it is not as convenient as the old one. There are two major issues I have so far:

  1. There is no "inc_structure=True" option for get_entries_in_chemsys. I checked the source code in mp_api, and querying entries together with their structures seems unstraightforward. Can you let me know if there is a function I can use to do the same thing as get_entries_in_chemsys(['A','B','C'], inc_structure=True)?
  2. A "can't start new thread" error is raised, as if there is a limit on the number of queries. I was querying about 1000 compositional systems, and on the 935th query it shows:
stack trace

```
Traceback (most recent call last):
  File "/gpfs/research/ouyanggroup/shared_by_group/Bin_test/query_entries_for_echem_win.py", line 31, in <module>
    mp_entries = mpr.get_entries_in_chemsys(ele_lst)
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/mprester.py", line 833, in get_entries_in_chemsys
    entries.extend(self.get_entries(all_chemsyses))
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/mprester.py", line 479, in get_entries
    for doc in self.thermo.search(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/routes/thermo.py", line 136, in search
    return super()._search(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 920, in _search
    return self._get_all_documents(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 969, in _get_all_documents
    results = self._query_resource(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 294, in _query_resource
    data = self._submit_requests(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 404, in _submit_requests
    initial_data_tuples = self._multi_thread(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 609, in _multi_thread
    future = executor.submit(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/concurrent/futures/thread.py", line 176, in submit
    self._adjust_thread_count()
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/concurrent/futures/thread.py", line 199, in _adjust_thread_count
    t.start()
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
```

Can you give some suggestions on these two issues? Meanwhile, do we lose MP legacy forever or just temporarily?

shyuep commented 2 years ago

@munrojm @mkhorton I will let you answer these. For inc_structure, all new MPRester entries include structures by default. The only problem is that the inc_structure kwarg was dropped; the mp-api code just needs to add it back in. As for legacy, it should be up until at least the end of this year.

Note that I have had feedback from other users as well that the new MPRester is not as convenient as the old one. I think it is mainly because feature parity is not there yet. Once that is done, I think users should be ok with it.
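To illustrate the point above (a minimal sketch; get_structures_in_chemsys is a hypothetical helper and "YOUR_API_KEY" is a placeholder): with the new client, entries come back as ComputedStructureEntry objects that already carry their structures, so no inc_structure flag is needed.

```python
def get_structures_in_chemsys(api_key, elements):
    # Import inside the function so this sketch stays importable
    # even without mp_api installed.
    from mp_api.client import MPRester

    with MPRester(api_key) as mpr:
        # Entries are returned as ComputedStructureEntry objects,
        # so the old inc_structure=True flag is unnecessary.
        entries = mpr.get_entries_in_chemsys(elements)
    return [entry.structure for entry in entries]
```

Calling get_structures_in_chemsys("YOUR_API_KEY", ["Li", "Mn", "O"]) would then mirror the old get_entries_in_chemsys(['Li','Mn','O'], inc_structure=True) behavior.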

munrojm commented 2 years ago

@shyuep @Jeff-oakley The argument is now added back for compatibility, but does not affect the data returned as ComputedStructureEntry objects are always provided by the API.

Could you provide me the query that gave you the threading issue @Jeff-oakley? Also, what sort of machine are you running on?

Jeff-oakley commented 2 years ago

You can reproduce it with the following code:

```python
from mp_api.client import MPRester

USER_API_KEY = 'apikey'
mpr = MPRester(USER_API_KEY)
mp_entries = mpr.get_entries_in_chemsys(['Li', 'Mn', 'O'])
```

Can you also take a look at my second question? A "can't start new thread" error is raised; it seems that the MPRester limits the number of queries. I was querying about 1000 compositional systems, and on the 935th query it shows the error message in the stack trace above.

munrojm commented 2 years ago

Yes, I am looking into the threading issue. What machine did you run the query on?

Jeff-oakley commented 2 years ago

I am running this in FSU computing cluster https://rcc.fsu.edu/

munrojm commented 2 years ago

Okay, can you test again with the latest client release and tell me if you are having the same issue @Jeff-oakley?

Also, you might want to edit the comment above to remove your API key.

Jeff-oakley commented 2 years ago

It seems that this is not resolved. I was parsing 1000 compositional spaces and got to the 933rd before it gave me the following:

```
Traceback (most recent call last):
  File "/gpfs/research/ouyanggroup/shared_by_group/Bin_test/query_entries_for_echem_win.py", line 31, in <module>
    mp_entries = mpr.get_entries_in_chemsys(ele_lst)
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/mprester.py", line 833, in get_entries_in_chemsys
    entries.extend(self.get_entries(all_chemsyses))
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/mprester.py", line 479, in get_entries
    for doc in self.thermo.search(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/routes/thermo.py", line 136, in search
    return super()._search(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 920, in _search
    return self._get_all_documents(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 969, in _get_all_documents
    results = self._query_resource(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 294, in _query_resource
    data = self._submit_requests(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 404, in _submit_requests
    initial_data_tuples = self._multi_thread(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/site-packages/mp_api/client/core/client.py", line 609, in _multi_thread
    future = executor.submit(
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/concurrent/futures/thread.py", line 176, in submit
    self._adjust_thread_count()
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/concurrent/futures/thread.py", line 199, in _adjust_thread_count
    t.start()
  File "/gpfs/home/bo22b/miniconda3/envs/work/lib/python3.10/threading.py", line 935, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
```

munrojm commented 2 years ago

So, a single client instance shouldn't be producing a lot of active threads. In fact, I have it wait for threads to finish before starting new ones to avoid issues like this. The max number of threads is set either to the value of multiprocessing.cpu_count(), or 8 if that cannot be determined. Both of these are very modest values.
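The thread-capping behavior described above can be sketched with the standard library (this is an illustration under those stated assumptions, not the actual mp_api code; max_client_threads is a hypothetical name):

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor

def max_client_threads(default=8):
    # Cap at multiprocessing.cpu_count(), falling back to a modest
    # default if the CPU count cannot be determined.
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        return default

# A pool of this size reuses a handful of worker threads rather than
# spawning one per request, which is why "can't start new thread"
# would be unexpected from a single client instance.
with ThreadPoolExecutor(max_workers=max_client_threads()) as pool:
    squares = list(pool.map(lambda x: x * x, range(10)))
print(squares)  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```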

The one query you provide does not cause any issues for me. In order for me to diagnose this, it may be helpful to have the full script you are using to query the 1000 compositional systems.

Jeff-oakley commented 2 years ago

I guess one easy test is to do the following:

```python
from mp_api.client import MPRester

USER_API_KEY = '[redacted]'
mpr = MPRester(USER_API_KEY)

for i in range(10000):
    mp_entries = mpr.get_entries_in_chemsys(['Li', 'Mn', 'O'])
```

munrojm commented 2 years ago

I am not able to recreate this issue.

shyuep commented 2 years ago

@Jeff-oakley I think the correct way is not to make 10000 get_entries calls, but rather to use MPRester().get_entries(["Li-Fe-O", "Li-O", "Fe-O", ....]). In other words, do a single query.

I am not sure what the threading issue is, but it might just be your OS running out of memory.
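The single-query approach suggested above could be sketched like this (build_chemsyses and fetch_all_entries are hypothetical helper names; the element list is illustrative):

```python
from itertools import combinations

def build_chemsyses(elements):
    # Every non-empty subset of the elements, joined as "A-B-C" strings,
    # covering all sub-chemical-systems in one list.
    return [
        "-".join(sorted(combo))
        for n in range(1, len(elements) + 1)
        for combo in combinations(sorted(elements), n)
    ]

def fetch_all_entries(api_key, elements):
    # Import here so build_chemsyses stays usable without mp_api installed.
    from mp_api.client import MPRester

    with MPRester(api_key) as mpr:
        # One batched request instead of thousands of small ones.
        return mpr.get_entries(build_chemsyses(elements))

print(build_chemsyses(["Li", "Mn", "O"]))
# → ['Li', 'Mn', 'O', 'Li-Mn', 'Li-O', 'Mn-O', 'Li-Mn-O']
```

For the full seven-element space from earlier in the thread this produces 127 chemsys strings, all covered by a single get_entries call.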

Jeff-oakley commented 2 years ago

Thanks for the suggestion. I agree it could be due to memory issues. I will run it on a different cluster with more memory.

Jeff-oakley commented 2 years ago

Just to follow up on this: I realize it is a systematic error that happens on many supercomputers. It generally works well on a PC with a regular university internet connection. However, whenever I run the following code on supercomputers, e.g. RCC at FSU and PSC at Pittsburgh (Bridges-2 allocation from ACCESS), it consistently gives this error message:

```
mp_api.client.core.client.MPRestError: HTTPSConnectionPool(host='api.materialsproject.org', port=443): Max retries exceeded with url: /thermo/?is_stable=True&_fields=entries&chemsys=Br-Cl-F-I-Mn&_limit=7 (Caused by ResponseError('too many 429 error responses'))
```

I suppose this kind of query is blocked or rate-limited on many supercomputers?

Here is the code I used:

```python
from mp_api.client import MPRester

mpr = MPRester('[redacted]')
ele_lst = ['Li', 'Mn', 'O', 'F', 'Cl', 'Br', 'I']
entries = mpr.get_entries_in_chemsys(ele_lst, additional_criteria={'is_stable': True})
```
janosh commented 2 years ago

@Jeff-oakley Be aware the snippet you posted includes your API key.

mkhorton commented 2 years ago

> I suppose such query method is blocked or limited in many supercomputers?

There is no intentional blocking or limiting of supercomputers. We would definitely like people to use the API, including on supercomputers. Without additional context, it's unclear why you're getting the 429 responses.
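One generic client-side workaround for intermittent 429 (rate-limit) responses is a retry with exponential backoff. This is a sketch, not a built-in mp_api feature; with_backoff and flaky are hypothetical names, and the demo callable stands in for a real API query:

```python
import time

def with_backoff(fn, retries=5, base_delay=1.0, retry_on=(Exception,)):
    # Retry fn() up to `retries` times, doubling the delay each attempt.
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Demo: a flaky callable that fails twice before succeeding,
# simulating a rate-limited endpoint.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated 429")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # → ok
```

In practice one would wrap the mpr.get_entries_in_chemsys(...) call in with_backoff, narrowing retry_on to the client's rate-limit exception rather than catching all exceptions.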