yoshida-lab / XenonPy

XenonPy is a Python library for materials informatics
http://xenonpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
133 stars 57 forks

Cannot download pre-trained models #129

Closed: yutaoki closed this issue 5 years ago

yutaoki commented 5 years ago

I found that I can download the models for "stable inorganic compounds for materials project", but I cannot download those for the "QM9 Dataset" or the "PolymerGenome Dataset".

When I tried mdl.pull(urls), it raised the following error:

FileNotFoundError: [Errno 2] No such file or directory: '~\S3\organic.nonpolymer.mu_debye\rcdk.fp.fingerprint\mxnet.nn.neural_network\shotgun_mu_Debye_randFP4975_corr-0.7528_mxnet_294-101-28-1_2018-06-13\model-175724d\shotgun_mu_Debye_randFP4975_corr-0.7528_mxnet_294-101-28-1_2018-06-13-045255-symbol.json'

Does anyone have any idea about this problem?

TsumiNa commented 5 years ago

Could you show us the script that reproduces the error?

The error looks like something went wrong when unzipping the downloaded files.

yutaoki commented 5 years ago

The following is my script:

=================================

from xenonpy.datatools import MDL

mdl = MDL()

# query the summary of models in the QM9 model set
summary = mdl(modelset_has="QM9", property_has="", save_to=False)

# take the URLs of the five dipole-moment models with the lowest MAE
mdl_mu = summary[summary["property"] == "organic.nonpolymer.mu_debye"].sort_values(by="mae", ascending=True)
urls = mdl_mu["url"].iloc[:5]

results = mdl.pull(urls=urls)


FileNotFoundError                         Traceback (most recent call last)
in
----> 1 results = mdl.pull(urls=urls)
      2 results

~\Anaconda3\lib\site-packages\xenonpy\datatools\mdl.py in pull(cls, urls, save_to)
    205                     f.write(r.content)
    206                 path = filename[:-7]
--> 207                 tarfile.open(filename).extractall(path=path)
    208                 os.remove(filename)
    209                 path_list.append(path)

~\Anaconda3\lib\tarfile.py in extractall(self, path, members, numeric_owner)
   2000             # Do not set_attrs directories, as we will do that further down
   2001             self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
-> 2002                          numeric_owner=numeric_owner)
   2003
   2004         # Reverse sort directories.

~\Anaconda3\lib\tarfile.py in extract(self, member, path, set_attrs, numeric_owner)
   2042             self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
   2043                                  set_attrs=set_attrs,
-> 2044                                  numeric_owner=numeric_owner)
   2045         except OSError as e:
   2046             if self.errorlevel > 0:

~\Anaconda3\lib\tarfile.py in _extract_member(self, tarinfo, targetpath, set_attrs, numeric_owner)
   2112
   2113         if tarinfo.isreg():
-> 2114             self.makefile(tarinfo, targetpath)
   2115         elif tarinfo.isdir():
   2116             self.makedir(tarinfo, targetpath)

~\Anaconda3\lib\tarfile.py in makefile(self, tarinfo, targetpath)
   2153         source.seek(tarinfo.offset_data)
   2154         bufsize = self.copybufsize
-> 2155         with bltn_open(targetpath, "wb") as target:
   2156             if tarinfo.sparse is not None:
   2157                 for offset, size in tarinfo.sparse:

FileNotFoundError: [Errno 2] No such file or directory: '\\S3\\organic.nonpolymer.mu_debye\\rcdk.fp.fingerprint\\mxnet.nn.neural_network\\shotgun_mu_Debye_randFP4975_corr-0.7528_mxnet_294-101-28-1_2018-06-13\\model-175724d\\shotgun_mu_Debye_randFP4975_corr-0.7528_mxnet_294-101-28-1_2018-06-13-045255-symbol.json'
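A minimal way to check whether the downloaded archive itself is readable, independent of mdl.pull (a sketch, assuming urls is the pandas Series built in the script above; the checks themselves are not part of the XenonPy API):

=================================

import io
import tarfile

import requests

# fetch the first model archive directly, bypassing mdl.pull
r = requests.get(urls.iloc[0])
print(r.status_code, r.headers.get("Content-Type"))

# try to open the downloaded bytes as a gzipped tar archive
try:
    with tarfile.open(fileobj=io.BytesIO(r.content), mode="r:gz") as tar:
        print(tar.getnames()[:5])  # readable archive: list a few members
except tarfile.TarError as e:
    print("not a readable tar.gz archive:", e)

=================================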
yutaoki commented 5 years ago

Here is another example of this error:

===================================================

from xenonpy.datatools import MDL

mdl = MDL()

# query the summary of polymer density models
summary = mdl(modelset_has="Polymer", property_has="density", save_to=False)

# take the URLs of the five density models with the lowest MAE
mdl_dens = summary[summary["property"] == "organic.polymer.density"].sort_values(by="mae", ascending=True)
urls = mdl_dens["url"].iloc[:5]

results = mdl.pull(urls=urls)


FileNotFoundError                         Traceback (most recent call last)
in
----> 1 results = mdl.pull(urls)
      2 results

~\Anaconda3\lib\site-packages\xenonpy\datatools\mdl.py in pull(cls, urls, save_to)
    205                     f.write(r.content)
    206                 path = filename[:-7]
--> 207                 tarfile.open(filename).extractall(path=path)
    208                 os.remove(filename)
    209                 path_list.append(path)

~\Anaconda3\lib\tarfile.py in extractall(self, path, members, numeric_owner)
   2000             # Do not set_attrs directories, as we will do that further down
   2001             self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
-> 2002                          numeric_owner=numeric_owner)
   2003
   2004         # Reverse sort directories.

~\Anaconda3\lib\tarfile.py in extract(self, member, path, set_attrs, numeric_owner)
   2042             self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
   2043                                  set_attrs=set_attrs,
-> 2044                                  numeric_owner=numeric_owner)
   2045         except OSError as e:
   2046             if self.errorlevel > 0:

~\Anaconda3\lib\tarfile.py in _extract_member(self, tarinfo, targetpath, set_attrs, numeric_owner)
   2112
   2113         if tarinfo.isreg():
-> 2114             self.makefile(tarinfo, targetpath)
   2115         elif tarinfo.isdir():
   2116             self.makedir(tarinfo, targetpath)

~\Anaconda3\lib\tarfile.py in makefile(self, tarinfo, targetpath)
   2153         source.seek(tarinfo.offset_data)
   2154         bufsize = self.copybufsize
-> 2155         with bltn_open(targetpath, "wb") as target:
   2156             if tarinfo.sparse is not None:
   2157                 for offset, size in tarinfo.sparse:

FileNotFoundError: [Errno 2] No such file or directory: '\\S6\\organic.polymer.density\\rcdk.fp.fingerprint\\ranger.rf.random_forest\\shotgun_Density_randFP330_corr-0.981_RF_num.mtry-947_num.trees-989_2018-06-21\\model-165323v\\shotgun_Density_randFP330_corr-0.981_RF_num.mtry-947_num.trees-989_2018-06-21-122107.Rdat'
TsumiNa commented 5 years ago

Thank you. We will check that.

yutaoki commented 5 years ago

Thank you so much.

I think the models for the S1 and S2 datasets are fine. The URLs for those models end with ".tar.gz", for example: http://xenon.ism.ac.jp/mdl/S1/inorganic.crystal.volume/xenonpy.composition/pytorch.nn.neural_network/04cd-290-281-153-75-21@1.tar.gz

But the URLs for the other models (for example, the S3 dataset) do not end with ".tar.gz", for example: http://xenon.ism.ac.jp/mdl/S3/organic.nonpolymer.cv_calmol-1k-1/rcdk.fp.fingerprint/mxnet.nn.neural_network/shotgun_Cv_calmol-1K-1_randFP1000_corr-0.8172_mxnet_186-102-15-1_2018-04-11-032416

Maybe this is related to the problem?
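A quick way to see this pattern in the query results (a sketch, assuming summary is the DataFrame returned by the mdl(...) query in the scripts above):

=================================

# count how many of the returned model URLs carry the .tar.gz suffix
has_ext = summary["url"].str.endswith(".tar.gz")
print(has_ext.value_counts())

# show a few URLs that lack the extension
print(summary.loc[~has_ext, "url"].head())

=================================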

TsumiNa commented 5 years ago

@yutaoki We have confirmed that all the non-crystal model sets (S3 ~ S6) are stored without the '.tar.gz' extension, which is unexpected behavior. However, we also cannot reproduce your error: using the same query parameters, mdl.pull works like a charm.
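In the meantime, a possible manual workaround, as a rough sketch rather than the official API, assuming each model URL still returns the gzipped tar archive even without the '.tar.gz' suffix, and with urls being the Series of model URLs from one of the scripts above:

=================================

# Hypothetical manual pull; the directory and file names here are illustrative,
# not part of XenonPy. Assumes each URL returns the archive bytes directly.
import os
import tarfile

import requests

save_dir = "manual_models"  # local directory to hold the downloads
os.makedirs(save_dir, exist_ok=True)

for i, url in enumerate(urls):
    archive = os.path.join(save_dir, "model_{}.tar.gz".format(i))
    r = requests.get(url)
    r.raise_for_status()
    with open(archive, "wb") as f:
        f.write(r.content)
    target = archive[:-7]  # strip ".tar.gz" to get the extraction directory, as mdl.pull does
    tarfile.open(archive).extractall(path=target)
    os.remove(archive)

=================================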

Anyway, all R models will be retrained in Python, and the MDL API will undergo a major refactor so that we can open it up for public uploading. See #138 and #65.

We will close this issue. Thanks again.