Closed angellr closed 11 years ago
UMLS2RDF is meant to run in a server setting. UMLS2RDF trades memory for speed loading each ontologies at once in memory. We run the mysql server and UMLS2RDF in separate boxes both with 32G RAM.
6G RAM for both mysql and umls2rdf seems not sufficient.
Mr. Salvadores:
Is there a work around to the memory issue? As this is a one-time data dump, I would rather not have to improve my hardware to make this work. Without digging through the code, could the ontology be built in stages? I just need RXNorm and NDFRT from the UMLS. Appreciate all your help in this matter. Thank you.
-Bob-
Bob Angell 1238 Fenway Avenue Salt Lake City, Utah 84102-3212 SLC (801) 583-8544 / NYC (347) 788-8544 cell: (801) 706-2520 regular email: bangell_AT_gmail.com
LinkedIn: http://linkd.in/bmS7G8
In an effort to reduce email volume, no need to reply to this message.
LinkedIn: http://linkd.in/bmS7G8
“Those who would trade essential liberty for some temporary security; deserve neither.” AND "Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote." -- Benjamin Franklin "Malo Periculosam Libertatem Quam Quietum Servitium" "Non Illegitimus Carborundum Est"; "Had Momma Cass and Karen Carpenter shared that ham sandwich, they might both be with us today!"; "Failure is only an event" - Zig Ziglar
On Oct 5, 2013, at 11:41 AM, Manuel Salvadores notifications@github.com wrote:
UMLS2RDF is meant to run in a server setting. UMLS2RDF trades memory for speed loading each ontologies at once in memory. We run the mysql server and UMLS2RDF in separate boxes both with 32G RAM.
6G RAM for both mysql and umls2rdf seems not sufficient.
— Reply to this email directly or view it on GitHub.
In BioPortal we use UMLS2RDF internally to load ontologies in our backend.
Most ontologies are avalaible for download as well. Some of them aren't due to licensing issues. You can contact the BioPortal support list if you run into issues downloading the ontologies you need.
http://bioportal.bioontology.org/ontologies/ http://bioportal.bioontology.org/ontologies/RXNORM http://bioportal.bioontology.org/ontologies/NDFRT
This is the error one receives when visiting these links ….
Thank you again for all your help.
-Bob-
Bob Angell 1238 Fenway Avenue Salt Lake City, Utah 84102-3212 SLC (801) 583-8544 / NYC (347) 788-8544 cell: (801) 706-2520 regular email: bangell_AT_gmail.com
LinkedIn: http://linkd.in/bmS7G8
In an effort to reduce email volume, no need to reply to this message.
LinkedIn: http://linkd.in/bmS7G8
“Those who would trade essential liberty for some temporary security; deserve neither.” AND "Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote." -- Benjamin Franklin "Malo Periculosam Libertatem Quam Quietum Servitium" "Non Illegitimus Carborundum Est"; "Had Momma Cass and Karen Carpenter shared that ham sandwich, they might both be with us today!"; "Failure is only an event" - Zig Ziglar
On Oct 5, 2013, at 12:22 PM, Manuel Salvadores notifications@github.com wrote:
In BioPortal we use UMLS2RDF internally to load ontologies in our backend.
Most ontologies are avalaible for download as well. Some of them aren't due to licensing issues. You can contact the BioPortal support list if you run into issues downloading the ontologies you need.
http://bioportal.bioontology.org/ontologies/ http://bioportal.bioontology.org/ontologies/RXNORM http://bioportal.bioontology.org/ontologies/NDFRT
— Reply to this email directly or view it on GitHub.
I cannot reproduce those errors. Try again please.
Manuel,
The purl finally worked … must have been the server upgrade or something on their end.
-Bob-
Bob Angell 1238 Fenway Avenue Salt Lake City, Utah 84102-3212 SLC (801) 583-8544 / NYC (347) 788-8544 cell: (801) 706-2520 regular email: bangell_AT_gmail.com
LinkedIn: http://linkd.in/bmS7G8
In an effort to reduce email volume, no need to reply to this message.
LinkedIn: http://linkd.in/bmS7G8
“Those who would trade essential liberty for some temporary security; deserve neither.” AND "Democracy is two wolves and a lamb voting on what to have for lunch. Liberty is a well-armed lamb contesting the vote." -- Benjamin Franklin "Malo Periculosam Libertatem Quam Quietum Servitium" "Non Illegitimus Carborundum Est"; "Had Momma Cass and Karen Carpenter shared that ham sandwich, they might both be with us today!"; "Failure is only an event" - Zig Ziglar
On Oct 5, 2013, at 1:55 PM, Manuel Salvadores notifications@github.com wrote:
I cannot reproduce those errors. Try again please.
— Reply to this email directly or view it on GitHub.
System: Ubuntu 12.04/6GB Ram; MySQL 5.5; UMLS 2013AA and have the entire UMLS database loaded. I had gone into the code to see if I could tweak and could not see anything easy to fix (yet). I did change the page size to 250000 to see if this helped with the "out of memory" errors, but they persisted. Not sure if connect -> fetch smaller chunks data -> disconnect would help this. Am at a loss to figure out how to proceed. I have changed as many MySQL parameters as I can think of ... also, I my system resources only exhaust about 4GB of memory when this runs. I am providing the output to the error message when I try the program against RXNORM and my MySQL my.cnf file located in /etc/mysql on the box. Any/all help in this process would be greatly appreciated. Thank you in advance for your help.
-Bob-
Here is output when trying to extract RXNORM ONLY as follows:
angellr@dDenali:~/sandbox/ontology/umls2rdf-master$ ./umls2rdf.py Generating RXNORM (virtual_id: 1423, using 'load_on_codes') [UMLS-Query] SELECT * FROM MRCONSO WHERE SAB = 'RXNORM' AND lat = 'ENG' LIMIT 500000 OFFSET 0 [UMLS-Query] SELECT * FROM MRCONSO WHERE SAB = 'RXNORM' AND lat = 'ENG' LIMIT 500000 OFFSET 500000 [UMLS-Query] SELECT * FROM MRCONSO WHERE SAB = 'RXNORM' AND lat = 'ENG' LIMIT 500000 OFFSET 1000000 [UMLS-Query] SELECT * FROM MRCONSO WHERE SAB = 'SRC' AND CODE = 'V-RXNORM' LIMIT 500000 OFFSET 0 [UMLS-Query] SELECT * FROM MRREL WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 0 [UMLS-Query] SELECT * FROM MRREL WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 500000 [UMLS-Query] SELECT * FROM MRREL WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 1000000 [UMLS-Query] SELECT * FROM MRREL WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 1500000 [UMLS-Query] SELECT * FROM MRREL WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 2000000 [UMLS-Query] SELECT * FROM MRREL WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 2500000 [UMLS-Query] SELECT * FROM MRDEF WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 0 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 0 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 500000 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 1000000 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 1500000 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 2000000 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 2500000 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 3000000 [UMLS-Query] SELECT * FROM MRSAT WHERE SAB = 'RXNORM' AND CODE IS NOT NULL LIMIT 500000 OFFSET 3500000 Out of memory (Needed 571480 bytes) [UMLS-Query] SELECT * FROM MRRANK WHERE SAB = 'RXNORM' LIMIT 500000 OFFSET 0 Traceback (most recent call last): File "./umls2rdf.py", line 587, in
ont.load_tables()
File "./umls2rdf.py", line 456, in load_tables
for rank in mrrank.scan(filt=mrrank_filt):
File "./umls2rdf.py", line 187, in scan
cursor.execute(q)
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
self.errorhandler(self, exc, value)
File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
angellr@dDenali:~/sandbox/ontology/umls2rdf-master$
My MySQL config file (with comments on some of the parameters that I have changed) as follows:
The MySQL database server configuration file.
You can copy this to one of:
- "/etc/mysql/my.cnf" to set global options,
- "~/.my.cnf" to set user-specific options.
One can use all long options that the program supports.
Run program with --help to get a list of available options and with
--print-defaults to see which it would actually understand and use.
For explanations see
http://dev.mysql.com/doc/mysql/en/server-system-variables.html
This will be passed to all mysql clients
It has been reported that passwords should be enclosed with ticks/quotes
escpecially if they contain "#" chars...
Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client] port = 3306 socket = /var/run/mysqld/mysqld.sock loose-local-infile = 1 user = XXXXXXXX
password = XXXXXXXX
database = umls
Here is entries for some specific programs
The following values assume you have at least 32M ram
This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0
[mysqld]
* Basic Settings
user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp lc-messages-dir = /usr/share/mysql skip-external-locking
Instead of skip-networking the default is now to listen only on
localhost which is more compatible and is not less secure.
bind-address = 127.0.0.1
* Fine Tuning
key_buffer = 600M
table_cache = 300
table_cache = 900 (increased this from 600 -> 900)
sort_buffer_cache = 20M
read_buffer_cache = 200M
query_cache_limit = 3M
query_cache_size = 100M
query_cache_limit = 5M (increased from 3 -> 5M) query_cache_size = 300M (increased from 100 -> 300) myisam_sort_buffer_size = 200M bulk_insert_buffer_size = 100M join_buffer_size = 100M
key_buffer = 16M
max_allowed_packet = 64M (was 16M ... increased to 32, then 64) thread_stack = 192K thread_cache_size = 8
This replaces the startup script and checks MyISAM tables if needed
the first time they are touched
myisam-recover = BACKUP
max_connections = 100
table_cache = 64
thread_concurrency = 10
* Query Cache Configuration
query_cache_limit = 1M
query_cache_size = 16M
* Logging and Replication
Both location gets rotated by the cronjob.
Be aware that this log type is a performance killer.
As of 5.1 you can enable the log at runtime!
general_log_file = /var/log/mysql/mysql.log
general_log = 1
Error log - should be very few entries.
log_error = /var/log/mysql/error.log
Here you can see queries with especially long duration
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 2
log-queries-not-using-indexes
The following can be used as easy to replay backup logs or for replication.
note: if you are setting up a replication slave, see README.Debian about
other settings you may need to change.
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10 max_binlog_size = 100M
binlog_do_db = include_database_name
binlog_ignore_db = include_database_name
* InnoDB
InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
Read the manual for more InnoDB related options. There are many!
* Security Features
Read the manual, too, if you want chroot!
chroot = /var/lib/mysql/
For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
ssl-ca=/etc/mysql/cacert.pem
ssl-cert=/etc/mysql/server-cert.pem
ssl-key=/etc/mysql/server-key.pem
[mysqldump] quick quote-names max_allowed_packet = 16M
[mysql]
no-auto-rehash # faster start of mysql but no tab completition
[isamchk] key_buffer = 16M
* IMPORTANT: Additional settings that can override those from this file!
The files must end with '.cnf', otherwise they'll be ignored.
!includedir /etc/mysql/conf.d/