tanghaibao / goatools

Python library to handle Gene Ontology (GO) terms
BSD 2-Clause "Simplified" License

How to get Association file? #144

Closed susanGhaderi closed 4 years ago

susanGhaderi commented 4 years ago

To whom it may concern,

I have installed your package and tested it, and it works perfectly. But when I run it on my own data, replacing the study file with my sample data and the population file with my population data, I get the following error:

"""
go-basic.obo: fmt(1.2) rel(2019-11-02) 47,242 GO Terms
HMS:0:00:00.884119 128,975 annotations READ: data/association
Traceback (most recent call last):
  File "scripts/find_enrichment.py", line 44, in <module>
    main()
  File "scripts/find_enrichment.py", line 31, in main
    obj = GoeaCliFnc(GoeaCliArgs().args)
  File "/anaconda3/lib/python3.6/site-packages/goatools/cli/find_enrichment.py", line 224, in __init__
    _study, _pop = self.rd_files(*self.args.filenames[:2])
  File "/anaconda3/lib/python3.6/site-packages/goatools/cli/find_enrichment.py", line 405, in rd_files
    study, pop = self._read_geneset(study_fn, pop_fn)
  File "/anaconda3/lib/python3.6/site-packages/goatools/cli/find_enrichment.py", line 413, in _read_geneset
    if next(iter(pop)).isdigit():
StopIteration
"""

I would be very grateful if you could show me how to solve this problem. I am completely new to GO annotation.

Best regards, Susan
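
For readers hitting the same traceback: the last frame calls `next(iter(pop))`, and in Python that raises `StopIteration` when `pop` is empty. A likely cause is therefore that no gene IDs were read from the population file, although the thread does not confirm this. A minimal sketch of the behaviour follows; `first_id_is_numeric` and `first_id_is_numeric_safe` are hypothetical helpers for illustration, not GOATOOLS functions.

```python
def first_id_is_numeric(ids):
    """Mimic the failing check: look at one arbitrary ID from the collection."""
    # next() on an exhausted iterator raises StopIteration.
    return next(iter(ids)).isdigit()


def first_id_is_numeric_safe(ids):
    """Same check, but an empty collection produces a readable error."""
    first = next(iter(ids), None)  # default value avoids StopIteration
    if first is None:
        raise ValueError("gene set is empty; check the file path and its contents")
    return first.isdigit()


if __name__ == "__main__":
    print(first_id_is_numeric({"1850"}))   # numeric IDs
    print(first_id_is_numeric({"DUSP8"}))  # gene symbols
    try:
        first_id_is_numeric(set())
    except StopIteration:
        print("empty set raises StopIteration")
```

If the population set is empty, it is worth checking that the file path is correct and that the file holds one gene ID per line.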

dvklopfenstein commented 4 years ago

Susan,

Thank you so much for your interest in GOATOOLS and taking the time to write us.

Can you send the first 20 or so lines of your data/association file?

Regards

susanGhaderi commented 4 years ago

Dear DV,

These are my study data:

DUSP8 RP11-298I3.4 CNNM3 RP11-118B22.4 RNU6-56P LPIN1 FICD RP11-101P17.11 RBFADN ALKBH3-AS1 CORT MAST4 MARVELD3 CHST12 RP11-192P3.5 PLEKHH2 EBF1 RP11-202D18.2 ACPL2 CCDC173 PRKD3 ALDH1A2 MARK1 CYP2J2 RHOF RNU6-1025P MAPK9 AC011558.5 CEP85L AC090559.1 RP1-117B12.4 RP11-183I6.2 RP11-14N7.2 CYP3A5 ATP6V1H FAAH2 RP11-108K14.8 C4orf6 C20orf166-AS1 ROPN1B PMM2 BROX MAPKAP1 RNU6-198P KPNA5 ASCL3 DNAJB14 EPHA4 LARS2 CTD-2147F2.2 PLAA CMYA5 RN7SL804P CTD-2066L21.3 RMDN2 GPR87 HIST1H2AK NDUFAF5 RN7SL431P LTK RNU6-248P MARVELD1 ENDOG RNA5SP476 RP11-126K1.2 C2orf44 CRYL1 MBIP HES7 RNU6-1306P FANCL C14orf28 RP11-102L12.2 FAM13A C10orf112 AMOTL1 HMGA1 ACN9 BNIP2 LINC01023 HIST1H4K CKAP4 RNU6-1099P RHEBL1 RBM27 ADAM23 LCP1 PIAS2 RP11-314P15.2 LRP1 C1orf106 ARAF ELP5 FOXP1-IT1 RP1-265C24.8 RP11-116N8.1 CTD-2020K17.4 FSCN2 RN7SL587P ATP5S KIAA1468 AL645728.1 ANGPTL5 H1F0 RNU6-419P RP11-1084E5.1 ATP6V1G1 EVL COL1A1 RN7SL34P CLU ACTR3 AMDHD1 CSNK2B-LY6G5B-1181 RP11-297N6.4 LACTB DDX60 RP11-152N13.16 PDZD7 BTBD7 CTD-2541J13.1 CTD-3094K11.1 DDX19B AP005482.1 ACVR2B-AS1 LDHD RP11-30L3.2 RP1-16A9.1 AFF2 JAM3 CTD-2085J24.4 RNU6-883P C3orf58 C10orf67 ENPP4 AC127904.2 RP1-92O14.3 CCDC36 RP1-178F10.1 HOOK1 RAD50 KLHL9 RP11-242D8.1 HOXD3 MIR3155A C15orf62 KLF7-IT1 RNU4-34P RP11-216L13.19 HSF2

But I do not know how to get an association file for them. Is there a command that generates it, or is it available somewhere in a biological database?

Best regards, Susan

susanGhaderi commented 4 years ago

Dear DV,

I have attached both my study set, which has 150 genes, and my population, which has 2,000 genes. Since I am going to use your package for more study sets, I would be very grateful if you could let me know how to get an association file for them, so that I can use the package in other cases as well.

I am looking forward to hearing from you.

Best regards, Susan

DUSP8 RP11-298I3.4 CNNM3 RP11-118B22.4 RNU6-56P LPIN1 FICD RP11-101P17.11 RBFADN ALKBH3-AS1 CORT MAST4 MARVELD3 CHST12 RP11-192P3.5 PLEKHH2 EBF1 RP11-202D18.2 ACPL2 CCDC173 PRKD3 ALDH1A2 MARK1 CYP2J2 RHOF RNU6-1025P MAPK9 AC011558.5 CEP85L AC090559.1 RP1-117B12.4 RP11-183I6.2 RP11-14N7.2 CYP3A5 ATP6V1H FAAH2 RP11-108K14.8 C4orf6 C20orf166-AS1 ROPN1B PMM2 BROX MAPKAP1 RNU6-198P KPNA5 ASCL3 DNAJB14 EPHA4 LARS2 CTD-2147F2.2 PLAA CMYA5 RN7SL804P CTD-2066L21.3 RMDN2 GPR87 HIST1H2AK NDUFAF5 RN7SL431P LTK RNU6-248P MARVELD1 ENDOG RNA5SP476 RP11-126K1.2 C2orf44 CRYL1 MBIP HES7 RNU6-1306P FANCL C14orf28 RP11-102L12.2 FAM13A C10orf112 AMOTL1 HMGA1 ACN9 BNIP2 LINC01023 HIST1H4K CKAP4 RNU6-1099P RHEBL1 RBM27 ADAM23 LCP1 PIAS2 RP11-314P15.2 LRP1 C1orf106 ARAF ELP5 FOXP1-IT1 RP1-265C24.8 RP11-116N8.1 CTD-2020K17.4 FSCN2 RN7SL587P ATP5S KIAA1468 AL645728.1 ANGPTL5 H1F0 RNU6-419P RP11-1084E5.1 ATP6V1G1 EVL COL1A1 RN7SL34P CLU ACTR3 AMDHD1 CSNK2B-LY6G5B-1181 RP11-297N6.4 LACTB DDX60 RP11-152N13.16 PDZD7 BTBD7 CTD-2541J13.1 CTD-3094K11.1 DDX19B AP005482.1 ACVR2B-AS1 LDHD RP11-30L3.2 RP1-16A9.1 AFF2 JAM3 CTD-2085J24.4 RNU6-883P C3orf58 C10orf67 ENPP4 AC127904.2 RP1-92O14.3 CCDC36 RP1-178F10.1 HOOK1 RAD50 KLHL9 RP11-242D8.1 HOXD3 MIR3155A C15orf62 KLF7-IT1 RNU4-34P RP11-216L13.19 HSF2

AAR2 ABCC5 ABHD15 ABHD17C AC005863.1 AC009495.3 AC009506.1 AC009994.2 AC010127.3 AC010226.4 AC010729.1 AC010884.1 AC010982.1 AC011196.3 AC011484.1 AC011524.2 AC011526.1 AC011558.5 AC015933.2 AC016735.3 AC016745.1 AC016747.3 AC017076.5 AC018737.1 AC021860.1 AC022007.5 AC023490.1 AC025280.1 AC037445.1 AC055811.1 AC066593.1 AC067945.4 AC067961.1 AC068282.3 AC068489.1 AC068499.10 AC079807.4 AC083843.3 AC090559.1 AC091878.1 AC092755.4 AC092757.1 AC092835.2 AC093157.1 AC096546.1 AC096772.6 AC097103.1 AC097500.2 AC097724.3 AC099850.1 AC105339.2 AC105402.4 AC105760.3 AC107072.2 AC124997.1 AC127904.2 AC130469.2 AC130710.1 AC131097.3 AC131971.1 AC138655.6 AC139100.3 AC142528.1 AC226150.4 ACAA2 ACAP1 ACAP2-IT1 ACAT1 ACBD6 ACKR2 ACN9 ACOT2 ACPL2 ACRV1 ACSL6 ACTR3 ACTR3B ACVR1 ACVR2B ACVR2B-AS1 ADA ADAD1 ADAM12 ADAM15 ADAM19 ADAM20P1 ADAM23 ADAM32 ADAMTSL1 ADAMTSL2 ADAP1 ADAR ADK ADM5 ADRA2A ADRA2C AEBP2 AF001548.5 AF127936.7 AF129408.17 AF131215.3 AF131215.6 AF178030.2 AFF2 AFF2-IT1 AFG3L2 AGAP5 AGBL4 AGBL5-AS1 AGBL5-IT1 AGK AGL AGMAT AGO1 AGO2 AGO4 AGPAT1 AGPAT2 AGPAT3 AGPAT4 AGPAT5 AGPAT6 AGPAT9 AGR2 AGTPBP1 AHCYL1 AHI1 AIFM1 AIM1 AIM1L AJ271736.10 AJUBA AK7 AK9 AKAP10 AKAP3 AKAP9 AKR1E2 AKR7A2 AL132989.1 AL137229.1 AL138963.1 AL139319.1 AL139396.1 AL353147.1 AL353626.2 AL354808.2 AL391803.1 AL590085.1 AL627309.1 AL645728.1 AL662800.1 ALDH1A2 ALDH3B1 ALDH6A1 ALG6 ALKBH3-AS1 ALKBH6 ALKBH7 ALS2CR12 AMD1 AMDHD1 AMDHD2 AMELX AMER1 AMER2 AMH AMMECR1L AMN AMOTL1 AMPH ANGPTL3 ANGPTL4 ANGPTL5 ANK1 ANKHD1-EIF4EBP3 ANKLE1 ANKRD10 ANKRD31 ANKRD36 ANKRD42 ANLN ANXA1 ANXA2 ANXA3 AP000322.53 AP000487.6 AP000560.1 AP000807.2 AP000974.1 AP001412.1 AP001623.1 AP005482.1 AP006216.11 AP006621.6 AP2S1 AP4B1-AS1 APH1B APOOL APP AQP10 AQP5 ARAF ARAP1-AS2 ARFIP2 ARHGAP19-SLIT1 ARHGAP21 ARHGAP25 ARHGAP27 ARHGAP4 ARHGEF26 ARL4D ARL5B ARMC2 ARMC3 ARMC4 ARMC5 ARMC8 ARMC9 ARPC4 ARPC5 ARPP19 ARR3 ARRDC1 ARRDC2 ART4 ARVCF AS3MT ASAH1 ASB2 ASB5 ASCL3 ASTE1 ASTN2 ASXL1 ATAD3C ATF4 ATG16L2 ATL2 ATP13A1 
ATP2A1 ATP2A3 ATP5D ATP5E ATP5EP2 ATP5S ATP6V0E2-AS1 ATP6V1D ATP6V1G1 ATP6V1H ATRIP ATXN3 ATXN7L3 ATXN7L3B ATXN8OS AUNIP AVPR1A AWAT1 AXDND1 AXIN1 AXIN2 AZGP1 AZI1 B3GNTL1 B4GALT1 BACH1-IT2 BAG4 BAG6 BAI1 BAI3 BAIAP2 BANK1 BARD1 BBS10 BCCIP BCL2L13 BCL2L14 BEND7 BEX2 BEX4 BHLHA15 BLOC1S1 BMF BMP3 BMP4 BMP8A BMPR1A BNIP2 BPGM BPHL BPIFB4 BPNT1 BPTF BRCA1 BRDT BRINP2 BRIX1 BRMS1 BROX BRPF1 BRPF3 BRSK1 BTBD18 BTBD2 BTBD6 BTBD7 C10orf112 C10orf115 C10orf35 C10orf40 C10orf67 C11orf34 C11orf42 C11orf44 C11orf48 C11orf82 C12orf65 C12orf66 C12orf68 C14orf164 C14orf28 C14orf39 C14orf93 C15orf26 C15orf57 C15orf62 C16orf3 C16orf46 C16orf82 C16orf91 C16orf95 C17orf49 C17orf67 C17orf77 C17orf96 C19orf45 C1orf106 C1orf186 C1orf198 C1orf226 C1orf227 C1orf228 C1orf27 C1orf51 C1orf54 C1orf56 C20orf166-AS1 C20orf202 C21orf119 C21orf49 C21orf58 C21orf59 C21orf62 C21orf67 C21orf90 C21orf91 C21orf91-OT1 C22orf15 C22orf23 C22orf24 C22orf29 C2CD2L C2orf42 C2orf44 C2orf73 C2orf81 C3orf58 C3orf65 C3orf83 C4orf6 C6orf120 C6orf211 C6orf25 C6orf47-AS1 C6orf48 C6orf58 C6orf89 C7orf31 C7orf41 C7orf55 CCDC155 CCDC163P CCDC17 CCDC173 CCDC178 CCDC36 CCDC50 CCDC51 CCDC58 CCDC60 CCDC62 CCDC64 CCDC64B CCDC66 CCDC69 CCDC71 CCDC84 CCDC96 CCNB1IP1 CCPG1 CCT4 CD101 CD164L2 CD247 CD37 CD44 CD47 CDC14A CDC14B CDC27 CDC42-IT1 CDC6 CDC73 CDCA5 CDCA7 CDCA7L CDCP2 CDH10 CDH15 CDH22 CDH24 CDH26 CDH6 CDH7 CDH9 CDIPT CDK17 CDK5RAP3 CDKN2A CEACAM19 CELF1 CELF4 CELSR1 CENPB CENPC CENPE CENPI CENPN CEP192 CEP57 CEP85L CGGBP1 CHD1 CHD1L CHD4 CHDC2 CHERP CHKA CHKB CHMP6 CHMP7 CHN1 CHRM2 CHRNA1 CHRNA7 CHST12 CIDEA CILP2 CIZ1 CKAP2L CKAP4 CKMT1A CLCNKA CLCNKB CLDN10 CLIP3 CLMN CLN5 CLN6 CLN8 CLP1 CLPP CLPSL2 CLPTM1L CLTC CLU CMP21-97G8.2 CMTM3 CMYA5 CNBD1 CNBP CNN1 CNNM3 CNOT1 CNOT10-AS1 CNOT11 CNOT3 COG7 COL10A1 COL13A1 COL15A1 COL16A1 COL17A1 COL18A1 COL18A1-AS1 COL1A1 COL22A1 COL25A1 COL27A1 COL28A1 COL4A1 COL4A3BP COL4A4 COL4A5 COL5A1 COL5A2 COL5A3 COL6A2 COL6A3 COL6A5 COL6A6 COL7A1 COL8A1 COL9A1 COL9A3 COLGALT2 
COLQ COMMD1 COMMD2 COMMD6 COQ6 CORT COTL1 COX8C CPD CPNE5 CPSF1 CPSF3 CPXM1 CREBL2 CRELD1 CRK CRLF3 CRY2 CRYBA1 CRYL1 CSF1R CSNK1A1 CSNK1A1L CSNK2A3 CSNK2B CSNK2B-LY6G5B-1181 CTA-204B4.2 CTA-223H9.9 CTA-256D12.11 CTA-481E9.4 CTA-992D9.7 CTAGE1 CTB-113D17.1 CTB-113P19.1 CTB-113P19.3 CTB-118P15.2 CTB-13F3.1 CTB-77H17.1 CTC-1337H24.2 CTC-244M17.1 CTC-273B12.5 CTC-313D10.1 CTC-379B2.4 CTC-429P9.3 CTC-507E2.1 CTC-510F12.4 CTC-512J12.4 CTC-542B22.2 CTC-550B14.6 CTCFL CTD-2006C1.12 CTD-2017C7.2 CTD-2020K17.4 CTD-2026D20.2 CTD-2066L21.3 CTD-2085J24.4 CTD-2126E3.3 CTD-2126E3.4 CTD-2147F2.2 CTD-2154I11.2 CTD-2162K18.4 CTD-2184C24.2 CTD-2199O4.3 CTD-2199O4.6 CTD-2201E18.5 CTD-2207O23.11 CTD-2215L10.1 CTD-2245F17.6 CTD-2267D19.3 CTD-2267D19.6 CTD-2313J17.5 CTD-2319I12.2 CTD-2319I12.4 CTD-2320G14.2 CTD-2323K18.1 CTD-2383M3.1 CTD-2501E16.2 CTD-2525I3.6 CTD-2530N21.4 CTD-2531D15.5 CTD-2537I9.16 CTD-2538C1.2 CTD-2538G9.5 CTD-2541J13.1 CTD-2541M15.4 CTD-2553L13.4 CTD-2554C21.3 CTD-2555K7.2 CTD-2555K7.4 CTD-2561B21.11 CTD-2568P8.1 CTD-2571L23.6 CTD-2587H19.3 CTD-2619J13.17 CTD-2626G11.2 CTD-2659N19.10 CTD-3018O17.3 CTD-3020H12.4 CTD-3032H12.2 CTD-3060P21.1 CTD-3064C13.1 CTD-3074O7.12 CTD-3094K11.1 CTD-3148I10.1 CTDSP1 CTDSP2 CTH CTNNBIP1 CTXN2 CUL4B CUL5 CWC22 CXCL12 CXCL16 CXCR4 CXorf64 CYB561 CYB5A CYB5D2 CYB5R2 CYB5R3 CYHR1 CYP1A2 CYP1B1 CYP24A1 CYP2J2 CYP2U1 CYP3A5 CYP4F2 CYP4F3 CYP4F35P CYP7B1 CYR61 CYSTM1 D2HGDH DACH1 DACT2 DACT3 DAPK3 DCTN3 DCTN6 DDAH2 DDX19B DDX39B-AS1 DDX60 DECR2 DEDD DENND2C DENND4C DENND5A DENND6B DENR DEPTOR DERA DESI1 DESI2 DFNA5 DGCR10 DGKD DHDDS DHFRL1 DHODH DHRS1 DHRS12 DHRS9 DHX36 DIAPH3-AS1 DIRC2 DISP1 DIXDC1 DKFZP434E1119 DKFZP779L1853 DLEU1 DLEU2L DLGAP2 DNAH10OS DNAH17 DNAH2 DNAH6 DNAJA2 DNAJB14 DNAJC18 DNAL4 DNALI1 DNM1 DNM1P35 DNM3OS DNMBP DNMBP-AS1 DNMT1 DNMT3A DOC2B DOCK3 DOCK4 DOCK6 DOCK7 DOCK8 DOCK9 DOK1 DOK3 DOK4 DOK5 DOK6 DOK7 DOLK DOPEY1 DOPEY2 DOT1L DPCD DPH6 DPP8 DRG2 DSCC1 DSCR9 DTD2 DTL DTNA DTX2P1-UPK3BP1-PMS2P11 DTX3L DTX4 DTYMK 
DUS2 DUS3L DUSP1 DUSP11 DUSP22 DUSP8 DXO DYNC1LI2 DYNLL1-AS1 DYRK1A EBF1 ECHDC1 ECI1 ECM1 EDARADD EDEM1 EEFSEC EFNB2 EFNB3 EGFL6 EGLN3 EHD2 EIF3G EIF3L EIF3M EIF4A2 ELOVL1 ELP2 ELP5 EME2 EML4 ENC1 ENDOG ENG ENO4 ENPP4 ENPP5 EP400 EPG5 EPHA4 EPHB3 ETFA ETHE1 ETV3 ETV6 EVL EVPL EXOC3L1 EXOC7 EXOSC4 EXOSC9 EZH2 EZR F2RL2 F2RL3 FAAH2 FAHD2B FAM105B FAM120AOS FAM127C FAM129A FAM131B FAM135B FAM13A FAM13A-AS1 FAM150A FAM150B FAM160B1 FAM184B FAM196B FAM19A3 FAM19A5 FAM200B FAM208B FAM209B FAM20A FAM210A FAM211A FAM215B FAM217A FAM220A FAM58A FAM63B FAM64A FAM66E FAM69C FAM71F2 FAM72A FAM81B FAM84B FAM87A FANCD2OS FANCL FCER1G FGFR1 FGFR1OP FGFR1OP2 FICD FN3K FOXP1-IT1 FOXP2 FRG1B FRMD5 FRMPD1 FRMPD2 FRMPD3 FRMPD4 FRRS1 FSCN2 FSD1 FSD1L FUK FUT8 FYN G3BP2 G6PC3 GAB1 GADD45A GALK2 GALNT13 GALNT15 GALNT2 GALNT6 GAR1 GARNL3 GAS2 GAS2L1 GAS5-AS1 GATC GATSL1 GBF1 GCA GCLM GCM2 GCNT2 GCNT3 GCOM1 GCSH GDPD2 GEMIN8 GGA3 GHRH GJE1 GK-AS1 GK-IT1 GKAP1 GLA GLB1L2 GLCE GLT8D1 GMPR2 GNAI1 GNG13 GNG5 GNL2 GNPDA1 GOLT1B GOSR2 GOT1 GP1BA GPATCH1 GPC1 GPD1L GPR37L1 GPR61 GPR64 GPR87 GPS2 GREB1 GRIK5 GRIN1 GRM2 GRM4 GRM5 GRM5-AS1 GS1-166A23.2 GS1-259H13.2 GS1-279B7.1 GS1-39E22.1 GS1-600G8.5 GSDMB GSN GSPT1 GSR GSTM3 GTDC1 GTF2F1 GTF2H2C GTF2H3 GTF3C3 GTF3C5 GTF3C6 GTPBP10 GUCA1B GUCY1A2 GUCY1A3 GUCY2F GUF1 GYG2 GYLTL1B GZF1 H1F0 H1FX H2AFV HADH HADHA HADHB HAGH HAGHL HAL HAMP HAND2-AS1 HAO2-IT1 HAP1 HAPLN3 HAR1A HARBI1 HARS HARS2 HAVCR2 HCG18 HCG20 HCG27 HDAC1 HDGFRP2 HECTD2 HES7 HIATL2 HIF1A-AS1 HIF1A-AS2 HIF1AN HINFP HINT1 HINT3 HIP1 HIST1H2AK HIST1H2BB HIST1H2BL HIST1H3D HIST1H3E HIST1H3I HIST1H4D HIST1H4K HIVEP3 HLA-B HLX HM13-IT1 HMGA1 HMGB3 HMGCLL1 HN1L HNF1A HNRNPDL HNRNPUL1 HOOK1 HOTAIRM1 HOXA3 HOXB-AS3 HOXD3 HP HPS6 HPSE HPX HRAS HRASLS HRASLS2 HRC HRH1 HRH3 HRK HS3ST1 HSBP1 HSBP1L1 HSCB HSD11B1L HSD17B1 HSD17B10 HSD17B11 HSD17B12 HSD17B13 HSD17B14 HSD17B2 HSD17B3 HSD17B4 HSD17B6 HSD17B7 HSD17B8 HSD3B7 HSDL1 HSF1 HSF2 HSF4 HSPA4 HTR3A HTR3B HTRA1 HTRA4 HYDIN IARS IDI2-AS1 IER5L 
IFFO1 IFI44L IFITM10 IFITM2 IFT52 IGF1 IGF2BP2 IGIP IKBKAP IKBKB IKBKG IL10RB IL10RB-AS1 IL13 IL13RA1 IL2RG IMMP2L INPP5D INPP5J INPPL1 INTS1 IP6K2 IP6K3 IPPK IRF2BPL IRF9 IRGM IRGQ ISLR2 ISM1-AS1 IST1 ITCH ITCH-AS1 ITIH3 ITPRIPL1 ITPRIPL2 IZUMO4 JAK1 JAK3 JAM3 JHDM1D-AS1 JKAMP JPH1 KANSL1L KARS KATNB1 KATNBL1 KB-1254G8.1 KB-1410C5.5 KCMF1 KCNA3 KCNA5 KCNAB1 KCND2 KCND3 KCNG2 KCNG3 KCNH1 KCNH7 KCNJ10 KCNJ9 KCNMA1 KCNN2 KCNV1 KCTD10 KCTD17 KDM2B KDM5A KHDC1 KHDRBS2 KIAA1324L KIAA1328 KIAA1430 KIAA1468 KIAA1715 KIAA1875 KIAA1919 KIAA2018 KIAA2026 KIF4A KIF5A KIF6 KIF7 KIF9-AS1 KLC3 KLC4 KLF10 KLF7-IT1 KLHDC1 KLHDC4 KLHL6 KLHL7 KLHL7-AS1 KLHL8 KLHL9 KLLN KLRB1 KNG1 KNTC1 KPNA5 KRAS KRBA1 KREMEN2 KRI1 KRT79 KRTCAP2 KRTCAP3 KTN1 KXD1 L1TD1 LA16c-316G12.2 LA16c-325D7.2 LACTB LARP1 LARS2 LARS2-AS1 LAYN LCP1 LDHD LECT1 LEFTY1 LHX6 LIFR-AS1 LINC00116 LINC00314 LINC00323 LINC00337 LINC00403 LINC00444 LINC00533 LINC00598 LINC00607 LINC00612 LINC00613 LINC00616 LINC00617 LINC00622 LINC00623 LINC00663 LINC00669 LINC00852 LINC00853 LINC00867 LINC00919 LINC00920 LINC00937 LINC00938 LINC00939 LINC00941 LINC00957 LINC00964 LINC00969 LINC01004 LINC01023 LINC01043 LINC01053 LINC01059 LL0XNC01-7P3.1 LL22NC03-2H8.4 LL22NC03-32F9.1 LL22NC03-N27C7.1 LLPH LMAN2L LMBR1L LMF1 LMLN-AS1 LMOD1 LOXL3 LPAR4 LPAR6 LPCAT3 LPCAT4 LPHN1 LPIN1 LRFN3 LRP1 LRP4 LRP5 LRP5L LRRC1 LRRC37A LRRC37A2 LRRC39 LRRC3DN LRRC4 LRRC71 LRRC73 LRSAM1 LSM11 LSM14A LSM14B LSM2 LSM3 LSM4 LSM5 LSM7 LSMD1 LSMEM1 LSMEM2 LSR LSS LTB4R LTBP2 LTBP3 LTBP4 LTK LTN1 LTV1 LUC7L LUC7L2 LUC7L3 LUM LURAP1 LUZP2 LY6E LY6G5B LY6G6C LY6H LYNX1 LYRM5 LZTR1 MAB21L2 MACROD1 MACROD2 MAD2L1 MAD2L2 MADCAM1 MAF1 MAFG-AS1 MAFK MAGEA6 MAGEH1 MAMDC2 MAMSTR MAN1C1 MAP1LC3A MAP2K1 MAP3K6 MAP4K2 MAP9 MAPK1 MAPK6 MAPK8IP1 MAPK9 MAPKAP1 MAPRE2 MARCO MARK1 MARVELD1 MARVELD3 MAST4 MAST4-IT1 MATK MATN2 MBD1 MBD3 MBIP MBLAC2 MBNL1-AS1 MBNL2 MBNL3 MCF2L MCF2L2 MCL1 MCM4 MCMBP MCMDC2 MCOLN3 MCUR1 MECOM MECP2 MED4 MED7 MED9 MEDAG MEGF10 MEIS1-AS2 MELK MEMO1 
MEOX1 MEP1B MERTK MESDC1 MESDC2 MESP1 MET METTL13 METTL14 METTL16 METTL21A METTL25 METTL3 MFAP3L MFAP4 MFN1 MFSD11 MGP MIA3 MIAT MICAL1 MIR148B MIR155HG MIR17HG MIR23B MIR29B1 MIR3155A MIR3192 MIR548AT MIR550A3 MIR567 MIR590 MIR609 MIR620 MIR98 MIRLET7I MITD1 MOGS MOK MON2 NDUFAF5 NDUFS1 NUTM2A PDXDC1 PDXK PDXP PDZD3 PDZD7 PDZK1 PDZRN3 PDZRN4 PEA15 PEAK1 PEBP1 PEBP4 PECR PEG3 PELI2 PELO PEMT PENK PEPD PER1 PER2 PER3 PERP PES1 PET100 PET112 PET117 PEX1 PEX10 PEX11B PEX11G PEX12 PEX14 PEX16 PEX19 PEX2 PEX26 PEX5L PEX6 PEX7 PFAS PFDN2 PFDN4 PFDN5 PFDN6 PFKFB1 PFKFB2 PFKFB3 PFKL PGBD2 PGBD4 PGBD5 PGD PGF PGGT1B PGK1 PGK2 PGLS PGM1 PGM2 PGM2L1 PGM3 PGM5 PGM5-AS1 PGPEP1 PGR PGRMC1 PGRMC2 PGS1 PHACTR1 PHACTR2 PHACTR3 PHACTR4 PHAX PHF13 PHF14 PHF15 PHF16 PHYH PHYHIPL PHYKPL PIAS2 PIAS3 PIAS4 PIDD PIEZO1 PIGB PIGC PIGM PIGN PIGO PIGW PIGY PINK1 PINLYP PINX1 PITX1 PIWIL2 PKD1 PKD2L2 PKLR PKNOX1 PLA2G15 PLA2G4C PLAA PLAC9 PLCD4 PLCE1-AS1 PLEKHA3 PLEKHA8 PLEKHB1 PLEKHF2 PLEKHH2 PLEKHJ1 PLK1S1 PLRG1 PLS1 PLS1-AS1 PM20D2 PMEPA1 PMF1 PMFBP1 PML PMM2 PMS2 PNRC1 POC5 PODXL POLA1 POLB POLD1 POLDIP2 POLE2 POLE4 POLG2 POLR1B POLR3E POLR3H POM121 POMZP3 POP4 POPDC2 POPDC3 POR PORCN POTEF POU3F1 POU4F1 POU5F2 PPIB PPID PPL PPP1R12C PPP3CB PPP4R4 PPP5C PPP6R2 PRKCG PRKCSH PRKD3 PRKRIR PRLR PRPF4B PRRG3 PRRX2 PRSS27 PRSS33 PSMA6 PTBP1 PTCHD3P1 PTEN PTER PTGFR PTK2 PTK6 PTMA PTMS PTOV1 PTP4A1 PTP4A2 PTP4A3 PTPLAD2 PTPLB PTPN1 PTPN12 PTPN14 PTPN18 PTPN3 PTPN4 PTPN5 PTPRG-AS1 PTPRT PTTG2 PUM1 PXMP4 PYCR2 PYCRL RAB21 RAB23 RAB27B RAB2A RAB2B RAB30 RAB30-AS1 RAB31 RAB32 RAB33A RAB33B RAB34 RAB35 RAB36 RAB3C RAB3D RAB43 RAB6C RAB8B RAB9B RABGGTA RABL2B RABL3 RAC1 RAD50 RAD51C RAD54B RADIL RALGAPA1 RALGAPB RAMP1 RAMP2-AS1 RAN RANBP10 RAPGEF5 RAPGEF6 RAPGEFL1 RARRES1 RASA2 RASA3 RASAL2-AS1 RASGEF1A RASGRP4 RBFADN RBFOX1 RBM12B-AS1 RBM26-AS1 RBM27 RBM38 RBM39 RBM41 RBM44 RBM46 RBM48 RBM5 RBMS2 RBMS3 RBMXL3 RBP2 RBPJ RBPMS RBPMS2 RBX1 RC3H1-IT1 RCAN3 RCBTB1 RCOR2 RCOR3 RCSD1 RCVRN RD3 RDH13 RDH5 
REST RFC3 RFK RFWD3 RFX1 RFXAP RGS14 RHBDD2 RHBDD3 RHBDL2 RHEB RHEBL1 RHOF RIC8B RIIAD1 RILPL1 RIMKLA RIMKLB RINL RINT1 RIPPLY2 RIPPLY3 RLBP1 RLF RMDN2 RMND5B RMRPP1 RN7SKP110 RN7SKP16 RN7SKP243 RN7SKP266 RN7SKP80 RN7SL100P RN7SL177P RN7SL181P RN7SL19P RN7SL21P RN7SL226P RN7SL242P RN7SL268P RN7SL285P RN7SL293P RN7SL329P RN7SL34P RN7SL398P RN7SL39P RN7SL413P RN7SL431P RN7SL452P RN7SL473P RN7SL47P RN7SL485P RN7SL487P RN7SL48P RN7SL491P RN7SL494P RN7SL500P RN7SL503P RN7SL525P RN7SL587P RN7SL653P RN7SL65P RN7SL721P RN7SL754P RN7SL804P RN7SL805P RN7SL94P RNA28S5 RNA5SP107 RNA5SP129 RNA5SP141 RNA5SP146 RNA5SP151 RNA5SP229 RNA5SP265 RNA5SP288 RNA5SP294 RNA5SP340 RNA5SP35 RNA5SP366 RNA5SP401 RNA5SP425 RNA5SP476 RNA5SP477 RNA5SP493 RNA5SP500 RNA5SP82 RNASEH1 RNASEH2A RNASEH2B-AS1 RNASEL RNF168 RNF170 RNF175 RNF213 RNF214 RNU1-8P RNU2-37P RNU2-59P RNU2-5P RNU4-34P RNU4-9P RNU5F-1 RNU6-1025P RNU6-1035P RNU6-1037P RNU6-1046P RNU6-1058P RNU6-1099P RNU6-1170P RNU6-1176P RNU6-118P RNU6-1201P RNU6-1203P RNU6-125P RNU6-1279P RNU6-1285P RNU6-1292P RNU6-1302P RNU6-1306P RNU6-1318P RNU6-1333P RNU6-137P RNU6-13P RNU6-190P RNU6-195P RNU6-198P RNU6-19P RNU6-204P RNU6-213P RNU6-234P RNU6-248P RNU6-249P RNU6-321P RNU6-341P RNU6-3P RNU6-419P RNU6-431P RNU6-437P RNU6-44P RNU6-45P RNU6-476P RNU6-48P RNU6-56P RNU6-574P RNU6-59P RNU6-606P RNU6-652P RNU6-705P RNU6-733P RNU6-74P RNU6-795P RNU6-82P RNU6-831P RNU6-877P RNU6-883P RNU6-885P RNU6-888P RNU6-908P RNU6-936P RNU7-108P RNU7-115P RNU7-117P RNU7-128P RNU7-135P RNU7-140P RNU7-179P RNU7-194P RNU7-19P RNU7-47P RNU7-90P RNVU1-13 RNVU1-14 RNVU1-20 RNY1P4 RNY3P2 RNY3P6 RNY3P8 RNY4P10 RNY5P8 ROBO3 ROPN1B ROPN1L ROR1 ROR2 RORA RP1-111C20.4 RP1-117B12.4 RP1-118J21.5 RP1-122K4.2 RP1-122O8.7 RP1-122P22.2 RP1-130H16.16 RP1-134E15.3 RP1-137K2.2 RP1-140K8.5 RP1-149C7.1 RP1-154K9.2 RP1-155D22.2 RP1-161P9.5 RP1-166H1.2 RP1-168L15.5 RP1-16A9.1 RP1-178F10.1 RP1-178F15.4 RP1-184J9.2 RP1-193H18.2 RP1-20B11.2 RP1-212G6.7 RP1-224A6.3 RP1-239B22.5 RP1-249H1.4 
RP1-257A7.5 RP1-257I20.14 RP1-265C24.8 RP1-266L20.4 RP1-267L14.3 RP1-283E3.8 RP1-286D6.5 RP1-309F20.4 RP1-30G7.2 RP1-30M3.6 RP1-310O13.7 RP1-34H18.1 RP1-35C21.1 RP1-86D1.4 RP1-8B1.4 RP1-90J20.12 RP1-90J20.8 RP1-90L6.2 RP1-91J24.3 RP1-92O14.3 RP1-93H18.1 RP1-93H18.7 RP1-93I3.1 RP11-1002K11.1 RP11-1007O24.2 RP11-1018N14.5 RP11-101E13.5 RP11-101E7.2 RP11-101O21.1 RP11-101P17.11 RP11-1026M7.2 RP11-102L12.2 RP11-102M11.2 RP11-1038A11.3 RP11-1057B6.1 RP11-105C19.1 RP11-1072C15.2 RP11-1072C15.4 RP11-1081M5.1 RP11-1081M5.2 RP11-1084E5.1 RP11-1085N6.3 RP11-108K14.8 RP11-109N23.4 RP11-109P14.10 RP11-10N23.4 RP11-10O17.3 RP11-1101H11.1 RP11-1105G2.4 RP11-1105O14.1 RP11-1109F11.3 RP11-110I1.11 RP11-110I1.12 RP11-111A22.1 RP11-111J6.2 RP11-112H10.4 RP11-1143G9.4 RP11-114F10.3 RP11-114N19.3 RP11-115C21.2 RP11-115D19.1 RP11-116N8.1 RP11-116O11.1 RP11-118B22.4 RP11-120A1.1 RP11-120K24.3 RP11-122G18.5 RP11-122K13.12 RP11-124N14.3 RP11-124N2.1 RP11-126K1.2 RP11-126K1.6 RP11-126O1.4 RP11-132A1.3 RP11-1336O20.2 RP11-133K1.7 RP11-134N1.2 RP11-137L10.6 RP11-138E16.1 RP11-13K12.1 RP11-13N13.2 RP11-141M1.3 RP11-145M9.4 RP11-14N7.2 RP11-152N13.16 RP11-152N13.5 RP11-153M7.5 RP11-157F20.3 RP11-157I4.4 RP11-158D2.2 RP11-15B17.1 RP11-15H20.7 RP11-160H22.5 RP11-162A12.2 RP11-162D9.3 RP11-163N6.2 RP11-164C12.2 RP11-165P7.1 RP11-166B2.1 RP11-166B2.5 RP11-166N17.1 RP11-166N17.3 RP11-166O4.5 RP11-166O4.6 RP11-166P13.3 RP11-166P13.4 RP11-167H9.4 RP11-169K17.4 RP11-176N18.2 RP11-177G23.2 RP11-177J6.1 RP11-178C3.2 RP11-178L8.5 RP11-17E13.2 RP11-17L5.4 RP11-181E10.3 RP11-183I6.2 RP11-18F14.1 RP11-192H23.4 RP11-192H23.6 RP11-192H23.7 RP11-192P3.5 RP11-197M22.2 RP11-199F11.2 RP11-1C8.5 RP11-202D18.2 RP11-203M5.8 RP11-206L10.3 RP11-20B24.6 RP11-20E24.1 RP11-210M15.1 RP11-211N11.5 RP11-212I21.4 RP11-214K3.20 RP11-216B9.6 RP11-216L13.19 RP11-21L23.2 RP11-228B15.4 RP11-23J9.5 RP11-242D8.1 RP11-243J18.2 RP11-245G13.2 RP11-255C15.4 RP11-259P20.1 RP11-25D3.1 RP11-265E18.1 RP11-267M23.4 RP11-288G11.3 
RP11-288L9.4 RP11-297D21.4 RP11-297N6.4 RP11-298I3.4 RP11-2H3.6 RP11-30L3.2 RP11-313P13.5 RP11-313P22.1 RP11-314P15.2 RP11-315D16.2

dvklopfenstein commented 4 years ago

Those look like human genes. The annotations for human genes are found here:

$ wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz
$ gunzip gene2go.gz

But in NCBI's gene2go association file, the genes are referred to by their NCBI Entrez ID, rather than their symbol, which is what you are using. So you will need to convert your gene symbols to NCBI Entrez IDs.
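
To make the ID mismatch concrete, here is a minimal sketch of reading `gene2go` in plain Python. It assumes the file is tab-separated with the columns `tax_id`, `GeneID`, `GO_ID`, and so on, and it keeps only human rows (taxid 9606). The function name `read_gene2go` and the sample rows are illustrative assumptions, not GOATOOLS code or real NCBI records.

```python
import io
from collections import defaultdict


def read_gene2go(lines, taxid="9606"):
    """Return {GeneID: set of GO IDs} for one taxon from gene2go-style lines."""
    gene2gos = defaultdict(set)
    for line in lines:
        if line.startswith("#"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != taxid:
            continue
        gene2gos[fields[1]].add(fields[2])  # GeneID -> GO_ID
    return dict(gene2gos)


# Made-up rows in the gene2go layout, for illustration only.
SAMPLE = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t1001\tGO:0000001\tIDA\t-\texample term A\t-\tProcess\n"
    "9606\t1001\tGO:0000002\tIEA\t-\texample term B\t-\tFunction\n"
    "9606\t1002\tGO:0000001\tIDA\t-\texample term A\t-\tProcess\n"
    "10090\t2001\tGO:0000003\tIDA\t-\texample term C\t-\tComponent\n"
)

if __name__ == "__main__":
    assoc = read_gene2go(io.StringIO(SAMPLE))
    print(assoc)
    # The keys are numeric GeneIDs, so a lookup by gene symbol finds nothing.
    print("DUSP8" in assoc)
```

With the real file, the same function could be called on an open file handle, for example `read_gene2go(open("gene2go"))`.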

Download this file:

$ wget ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
$ gunzip Homo_sapiens.gene_info.gz

I wrote a script that converts your gene symbols to NCBI Entrez GeneIDs, so that they match the ID format used in the association file. The script is attached, and it prints this when run:

$ python3 convert_symbol_to_entrez.py
     150 READ: study_symbols.txt
   2,000 READ: pop_symbols.txt
  124,943 READ: Homo_sapiens.gene_info
     115 GeneIDs in study
   1,626 GeneIDs in population
     115 WROTE: study_geneids.txt
   1,626 WROTE: pop_geneids.txt

You can see that you lose a number of genes: 35 in the study (150 - 115) and 374 in the population (2,000 - 1,626). The lost genes have names like: RP11-126K1.2

You will need to look into those.
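
For readers without the attachment, the conversion can be sketched as below. This is not the attached script, only an illustration. It assumes `Homo_sapiens.gene_info` is tab-separated with `GeneID` in the second column and `Symbol` in the third. The helper names and the GeneIDs in the sample are placeholders. The sketch also lists the symbols that could not be mapped, which are the ones that need a closer look.

```python
import io


def read_symbol2geneid(lines):
    """Map Symbol -> GeneID from gene_info-style lines."""
    sym2id = {}
    for line in lines:
        if line.startswith("#"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            sym2id[fields[2]] = fields[1]
    return sym2id


def convert(symbols, sym2id):
    """Return (sorted GeneIDs, sorted symbols with no GeneID)."""
    geneids = sorted({sym2id[s] for s in symbols if s in sym2id}, key=int)
    missing = sorted(s for s in symbols if s not in sym2id)
    return geneids, missing


# Placeholder rows: the GeneIDs here are invented for the example.
GENE_INFO = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\n"
    "9606\t1001\tDUSP8\t-\n"
    "9606\t1002\tLPIN1\t-\n"
)

if __name__ == "__main__":
    sym2id = read_symbol2geneid(io.StringIO(GENE_INFO))
    geneids, missing = convert(["LPIN1", "DUSP8", "RP11-126K1.2"], sym2id)
    print(geneids)   # GeneIDs to write to study_geneids.txt
    print(missing)   # symbols with no match in gene_info
```

Only the official `Symbol` column is checked here. Whether the unmapped names can be recovered from another column or another annotation source is left open.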

Then you run an enrichment like this, which for your study set unfortunately returns no significant results:

$ scripts/find_enrichment.py study_geneids.txt pop_geneids.txt gene2go --method=fdr_bh --outfile results.tsv
go-basic.obo: fmt(1.2) rel(2019-10-07) 47,285 GO Terms
HMS:0:00:07.365907 323,107 annotations READ: gene2go
1 taxids stored: 9606
Study: 115 vs. Population 1626

Load BP Gene Ontology Analysis ...
Propagating term counts up: is_a
 74%  1,202 of  1,626 population items found in association

Load CC Gene Ontology Analysis ...
Propagating term counts up: is_a
 78%  1,267 of  1,626 population items found in association

Load MF Gene Ontology Analysis ...
Propagating term counts up: is_a
 74%  1,205 of  1,626 population items found in association

Run BP Gene Ontology Analysis: current study set of 115 IDs ...
 74%     85 of    115 study items found in association
100%    115 of    115 study items found in population(1626)
Calculating 7,464 uncorrected p-values using fisher
   7,464 GO terms are associated with  1,202 of  1,626 population items
   1,928 GO terms are associated with     85 of    115 study items
  METHOD fdr_bh:
       0 GO terms found significant (< 0.05=alpha) (  0 enriched +   0 purified): statsmodels fdr_bh
       0 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)

Run CC Gene Ontology Analysis: current study set of 115 IDs ...
 77%     89 of    115 study items found in association
100%    115 of    115 study items found in population(1626)
Calculating 962 uncorrected p-values using fisher
     962 GO terms are associated with  1,267 of  1,626 population items
     280 GO terms are associated with     89 of    115 study items
  METHOD fdr_bh:
       0 GO terms found significant (< 0.05=alpha) (  0 enriched +   0 purified): statsmodels fdr_bh
       0 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)

Run MF Gene Ontology Analysis: current study set of 115 IDs ...
 73%     84 of    115 study items found in association
100%    115 of    115 study items found in population(1626)
Calculating 1,616 uncorrected p-values using fisher
   1,616 GO terms are associated with  1,205 of  1,626 population items
     372 GO terms are associated with     84 of    115 study items
  METHOD fdr_bh:
       0 GO terms found significant (< 0.05=alpha) (  0 enriched +   0 purified): statsmodels fdr_bh
       0 study items associated with significant GO IDs (enriched)
       0 study items associated with significant GO IDs (purified)
    127 of 10,042 results have uncorrected P-values <= 0.05=pval

      0 items. NOT WRITING results.tsv
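
The log above shows 127 uncorrected p-values at or below 0.05 but no significant GO terms after `fdr_bh`. The sketch below shows how that can happen. It is not GOATOOLS code: it uses a one-sided hypergeometric enrichment p-value as a simplified stand-in for the Fisher test named in the log (which may differ, for example by being two-sided), plus a plain Benjamini-Hochberg adjustment. All names and input numbers are illustrative.

```python
from math import comb


def enrichment_pvalue(study_hits, study_n, pop_hits, pop_n):
    """One-sided hypergeometric P(X >= study_hits) for a single GO term."""
    total = comb(pop_n, study_n)
    upper = min(study_n, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_n - pop_hits, study_n - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative counts: 5 of 115 study genes vs. 20 of 1,626 population genes.
    print(enrichment_pvalue(5, 115, 20, 1626))
    # One small p-value among many tests can fail to pass after adjustment.
    pvals = [0.004] + [0.5] * 99
    print(fdr_bh(pvals)[0])
```

In the second example, an uncorrected p-value of 0.004 among 100 tests is adjusted to roughly 0.4, so it would not pass a 0.05 threshold.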

Here is the script. You will need to rename it, because GitHub would not accept a file with a .py extension:

$ mv convert_symbol_to_entrez.py.txt convert_symbol_to_entrez.py

convert_symbol_to_entrez.py.txt

susanGhaderi commented 4 years ago

Thank you so much. Now it works perfectly.

Best regards, Susan

dvklopfenstein commented 4 years ago

It is so good to hear you are off and running.

Thank you again for your interest in GOATOOLS and taking the time to contact us.