Closed: pjvandehaar closed this 8 years ago
psycopg2 looks like the best Python Postgres library. It has parameterized queries with support for hstore, JSON, dates, etc., but no magic. Docs at http://initd.org/psycopg/docs/usage.html.
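For reference, a minimal sketch of what a parameterized insert could look like. The table `pheno`, its columns, and the helper `insert_pheno` are made up for illustration and are not from the actual schema; the only thing relied on is the DB-API `cursor.execute(sql, params)` call with `%s` placeholders, which is how psycopg2 takes parameters.

```python
import datetime
import json

# Hypothetical table and columns, used only for illustration.
INSERT_SQL = (
    "INSERT INTO pheno (phewas_code, info, loaded_on) "
    "VALUES (%s, %s, %s)"
)


def insert_pheno(cur, phewas_code, info, loaded_on):
    """Run one parameterized INSERT on a DB-API cursor.

    Values are passed separately from the SQL string, so the driver
    does the quoting. The dict is serialized to a JSON string here;
    the date object is handed to the driver as-is.
    """
    params = (phewas_code, json.dumps(info, sort_keys=True), loaded_on)
    cur.execute(INSERT_SQL, params)
    return params


# Usage with psycopg2 (assumes a reachable database named "phewas"):
#   import psycopg2
#   conn = psycopg2.connect(dbname="phewas")
#   with conn, conn.cursor() as cur:
#       insert_pheno(cur, "008", {"category": "infectious diseases"},
#                    datetime.date.today())
```

The point is that the SQL text never contains the values themselves, so there is no string formatting or manual escaping anywhere.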
It currently loads 18k variants x 1500 phenos in 10 minutes.
For a while, Python keeps one CPU at 100% while Postgres holds one at 50%. Then Python exits and Postgres sits at 100%. Figure out the relative lengths of these two phases and the reason for this.
At that rate, 30M variants should take about 12 days (30M / 18k x 10 minutes is roughly 16,700 minutes). Can we parallelize it?
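A rough sketch of the extrapolation, plus one possible way to parallelize. It assumes load time scales linearly with variant count, and that the work can be split by input file with each worker process opening its own database connection. `load_file` is a hypothetical placeholder for the real per-file loader, and the worker count is arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor


def estimated_days(n_variants, variants_per_batch=18_000, minutes_per_batch=10):
    """Linearly extrapolate total load time (in days) from the measured batch."""
    minutes = n_variants / variants_per_batch * minutes_per_batch
    return minutes / (60 * 24)


def load_file(path):
    """Hypothetical stand-in for the real loader of a single input file."""
    return path


def load_all(paths, workers=4, loader=load_file):
    """Run the loader over all files using a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(loader, paths))


if __name__ == "__main__":
    # Serial estimate for 30M variants at the measured rate.
    print(round(estimated_days(30_000_000), 1))
```

Whether this actually helps depends on where the time goes: if Postgres is the bottleneck (the second phase, where it sits at 100%), more Python workers will not buy much.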
Data to import
/net/dumbo/home/larsf/PheWAS/PheWAS_code_translation_v1_2.txt (15k lines)
/net/dumbo/home/larsf/PheWAS/PheWAS_code_v1_2.txt (1488+1 lines)
/net/dumbo/home/larsf/PheWAS/MATCHED/PheWAS_${NR}_MATCHED.epacts: What shall we do about those NAs? Not insert them?
Where should we get colors for each category? I can't find them online. We either make them up, or pull them out of Vanderbilt's code.
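On the NA question, here is a sketch of the "don't insert them" option. It assumes the .epacts files are tab-separated with a header row; the column names `MARKER_ID`, `PVALUE`, and `BETA` and the sample rows are invented for illustration and should be checked against the real files.

```python
import csv
import io


def rows_without_na(lines, value_column="PVALUE", na="NA"):
    """Yield rows (as dicts) whose value_column is not the NA marker."""
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if row[value_column] != na:
            yield row


# Made-up sample data standing in for one .epacts file.
sample = io.StringIO(
    "MARKER_ID\tPVALUE\tBETA\n"
    "1:100_A/G\t0.03\t0.12\n"
    "1:200_C/T\tNA\tNA\n"
)
kept = list(rows_without_na(sample))
```

The alternative is to keep every row and map NA to None, which psycopg2 sends as NULL. That preserves the fact that the variant was tested at the cost of more rows.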