Open lcnittl opened 5 years ago
A short update: some structures are still found without ChemExper, so it seems to be coincidental that the first 150 were not :) I have also realized that ChemExper is still being consulted (and timing out).
When reading log files, I often had the impression that ChemExper blocks requests when it detects repeated access. However, I hoped that the other suppliers would still be sufficient in such a case.
MOLfiles can be loaded from Acros, Cactus, chemicalbook, Fluorochem, NIST and Pubchem, so there are multiple sources.
MSDS can be loaded from Acros, Activate, Alfa, Apollo, Biosolve, carbolution, Carl Roth, Cayman, Fisher, Fluorochem, ITW/Applichem, Merck, Oakwood and Strem. The blockings are a bit sad as Acros has many substances, good quality data, MOLfiles and MSDS...
In a different case of IP address blockings, proxy services like http://anonymouse.org/cgi-bin/anon-www_de.cgi/http://sciformation.com may help, but I am not sure if we should get into this.
Thanks for your answer! I got the same impression: the first few requests go fine, then blocking starts.
Indeed, they are sufficient for most of the molecules. A downside, however, is the long time it takes when being blocked, as several hundred (or thousand) timeouts do add up after all.
I tried to deactivate Acros in the `Internet data retrieval` tab from the `global setting`, but seemingly without success. Is there an easy way to deactivate Acros temporarily (removing the php file)?
As a side question: We just see the following suppliers in our `global settings` (the files are in place):
Is there a setting we are overlooking to also have the others in the list?
I had the same experience: ChemExper was temporarily blocked after several attempts. As Felix said, there are also many other sources for structures and SDS.
@lcnittl : deactivating a supplier inside OE in `Global settings` only removes it from being accessed during Search Chemical in Supplier mode. I believe it does not stop OE from accessing those suppliers during import from a tab-separated text file, as in your case? For this issue, I think there are 2 things you can do:
1. In `import.php`, on line ~323 (the line number might not be exact because there might be modifications; I posted the snippet below), shorten the time limit. This will reduce the wait time for nonresponding suppliers. In my experience, if a supplier works, it takes less than 30 s; I used to set this setting to 60.
2. In `lib_supplier_scraping.php`, in function `getAddInfo()`, you can do:
a) Change `set_time_limit` on line ~168 to a shorter value; again, not sure if this is redundant.
b) Right before the `foreach` statement after the `set_time_limit` on line ~168, you can add something like this:
// Khoi: removing Sigma and Acros because the scraping scripts for these 2 sites do not work and just take time
unset($addInfo[1]); // removing Acros
// unset($addInfo[4]); // removing Sigma; update 2019-07-26, Sigma search is working on A2hosting server now
// unset($addInfo[6]); // removing chemicalBook
The `$addInfo[x]` array index that corresponds to each supplier can be found in the same file, `lib_supplier_scraping.php`, on line ~88; the index starts from 0.
This has worked for me but @rudolphi can tell you the best way.
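As a rough illustration of the two ideas above (switching a supplier off by hand, and not waiting on a supplier that keeps timing out), here is a minimal Python sketch. It is not OE code: `fetch_with_skips`, the `suppliers` mapping, `disabled` and `max_timeouts` are all hypothetical names, and each lookup function is assumed to raise `TimeoutError` when the supplier does not answer.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def fetch_with_skips(
    cas_numbers: Iterable[str],
    suppliers: Dict[str, Callable[[str], Optional[str]]],
    disabled: Set[str],
    max_timeouts: int = 3,
) -> Dict[str, Optional[str]]:
    """Return the first result found per CAS number, skipping blocked suppliers."""
    # Hypothetical bookkeeping: timeouts in a row, per supplier.
    consecutive_timeouts = {name: 0 for name in suppliers}
    results: Dict[str, Optional[str]] = {}
    for cas in cas_numbers:
        results[cas] = None
        for name, lookup in suppliers.items():
            # Skip suppliers switched off by hand (the unset($addInfo[1]) idea)
            # and suppliers that timed out max_timeouts times in a row.
            if name in disabled or consecutive_timeouts[name] >= max_timeouts:
                continue
            try:
                found = lookup(cas)
            except TimeoutError:
                consecutive_timeouts[name] += 1
                continue
            consecutive_timeouts[name] = 0
            if found is not None:
                results[cas] = found
                break
    return results
```

With this shape, a supplier that blocks the IP costs at most `max_timeouts` waits per run instead of one wait per inventory entry.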
@lcnittl : I also wrote a couple of Python scripts to scrape structures and SDS from the internet and add the info into OE. They basically look into your OE database of interest, find the molecules (by CAS#) with a missing structure or SDS, and then scrape that info from the internet. You would need Python on your hosting server and root access (on the host server). If you are interested, please let me know and I can share those scripts with you.
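The "find the molecules with a missing structure" step could look roughly like the sketch below. This is only an illustration, not the actual script: it uses an in-memory `sqlite3` database so that it runs on its own, and the table name `molecule` and the columns `cas_nr` and `molfile` are assumptions made for the example.

```python
import sqlite3
from typing import List


def find_cas_missing_structure(conn: sqlite3.Connection) -> List[str]:
    """Return CAS numbers of rows that have a CAS number but no MOLfile."""
    rows = conn.execute(
        "SELECT cas_nr FROM molecule "
        "WHERE cas_nr IS NOT NULL AND cas_nr <> '' "
        "AND (molfile IS NULL OR molfile = '') "
        "ORDER BY cas_nr"
    ).fetchall()
    return [row[0] for row in rows]


# Small demo with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE molecule (cas_nr TEXT, molfile TEXT)")
conn.executemany(
    "INSERT INTO molecule VALUES (?, ?)",
    [
        ("64-17-5", "existing molfile"),  # has a structure, not reported
        ("67-64-1", None),                # structure missing
        ("", None),                       # no CAS number, cannot be looked up
        ("71-43-2", ""),                  # empty structure counts as missing
    ],
)
missing = find_cas_missing_structure(conn)
print(missing)
```

Only the entries returned here would then need to be looked up online, which keeps the number of supplier requests low.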
@khoivan88 Thanks for your input. I think I will indeed go with option 2b.
Concerning the python scripts: If you are willing to give them away I would certainly not say no :)
@lcnittl : Here is the link to my Python script to search for missing structures. You can install the required packages in `requirements.txt`. You need to change to the root user on your server first by running `su` in the terminal. You can then use the Python file inside `update_sql_mol_v6` by running something like `python3 update_sql_mol.py` in the terminal. It will ask you if you are using the root user (answer `y`) and then ask for the name of the database you want to affect. You will have to type in the name of the database twice (I designed it that way to make sure that the user is sure of what they want to do). After the program is done, you won't see the structure yet. You will have to log into OE on the webpage as root user, go to `Settings/Batch Processing`, choose the database that you just ran the Python script on and check all of the following: "MOLECULE", "EMPIRICAL FORMULA", "MW", "DEG. OF UNSAT.", "STRUCTURE", and "SMILES", and then let OE run to generate the structure images. (I wrote this script a while ago, and at that time I did not know how to include the generation of structure images in OE; I now have an idea of how to incorporate it into the Python script, but I have not had time to go back and add it.) Sorry for the inconvenience. See update note below. Let me know if you have any issue and I can walk you through it more.
https://github.com/khoivan88/update_sql_mol
I have another script to update SDS, but I have not uploaded it to GitHub yet. I will do that and then give you the link later.
Update (2020-01-18): the newest version of this script should work without the extra manual Batch Processing step. I have updated the instructions in the repo as well.
@lcnittl : here is the link to the script for updating missing SDS. It runs very similarly to the Python script for updating MOLfiles. However, you just need to run this script and you are done; no second step is required. As usual, if there is any problem, please let me know. https://github.com/khoivan88/find_missing_sds-public
PS: I forgot to say that both Python scripts are made for OE hosted on Linux (specifically CentOS 7). If you host it on a different system like Mac or Windows, you might want to change the `download_path` variable in both files to someplace else on your system!
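One way to avoid hard-coding an OS-specific location is to derive the path from the system's temporary directory. The sketch below only illustrates that idea; `pick_download_path` and the folder name `oe_downloads` are hypothetical and not taken from either script.

```python
import tempfile
from pathlib import Path


def pick_download_path(subdir: str = "oe_downloads") -> Path:
    """Create (if needed) and return a download folder that exists on any OS."""
    # tempfile.gettempdir() resolves to a suitable temp folder on Linux, macOS and Windows.
    path = Path(tempfile.gettempdir()) / subdir
    path.mkdir(parents=True, exist_ok=True)
    return path


download_path = pick_download_path()
print(download_path)
```

Whether a temporary folder is appropriate depends on how long the downloaded files need to be kept.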
@khoivan88 Thanks for the scripts - they are very much appreciated. I will have a look at them within the next days.
For the OS - no problem, we are running on Debian (containerized, so I will still have a look) :)
Is it possible that ChemExper blocks an IP that sends too many requests? We were running an inventory `Batch processing` with `Read data from suppliers`. The first few entries go fine, then requests sent to ChemExper give timeouts. To probe whether ChemExper was down or not, we cURLed from another host: no problem reaching it. After waiting some hours the blocking seems to be reset. I guess there is no possible workaround for this? And did I deduce correctly that structures are fetched from ChemExper (at least no structures were generated if we deactivated the use of ChemExper by setting `$GLOBALS["suppliers"]["acros"]["alwaysProcDetail"]` to `false`)? https://github.com/rudolphi/open_enventory/blob/61983563e7c916f00db197ecc06a058d47fa4241/suppliers/Acros.php#L29-L38
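The reachability check described above (requesting the site from a second host to tell "site down" from "our IP is blocked") can be sketched in Python as follows. `probe` is a hypothetical helper; the `opener` parameter exists only so the function can be tried without network access, and any URL passed to it is up to the user.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 10.0,
          opener: Callable = urllib.request.urlopen) -> str:
    """Classify a single request as reachable, timeout, or unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"reachable (HTTP {response.status})"
    except urllib.error.HTTPError as err:
        # The server answered, even if with an error status.
        return f"reachable (HTTP {err.code})"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except urllib.error.URLError as err:
        # urllib can wrap a timeout inside URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable ({err.reason})"
```

If the same URL reports "timeout" from the OE host but "reachable" from another host, that points to blocking on the supplier's side rather than an outage.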