related-sciences / nxontology-ml

Machine learning to classify ontology nodes
Apache License 2.0

Run model on new ontology version and export labels and features #31

Closed yonromai closed 11 months ago

yonromai commented 11 months ago

@dhimmel @eric-czech This PR adds logic to take the top-performing model (CatBoost with topological features + PCA64), trained on all the labeled data, and run inference on the latest version of the ontology. It exports both precisions and features as TSV files.

Since the files are large, here are the first 10 lines of each:

(nxontology-ml-py3.10) ~/d/nxontology-ml ❯❯❯ head data/efo_otar_slim_v3.57.0_precisions.tsv
identifier  precision   rs_classification   probas
DOID:0050890    low 03-disease-area [0.0014407906509522866, 0.0006906724155990632, 0.9978685369334486]
DOID:10113  medium  02-disease-root [0.0003328976881470062, 0.9979934832943456, 0.0016736190175072594]
DOID:10718  medium  02-disease-root [0.0010466938649724062, 0.9989473243160014, 5.9818190262785945e-06]
DOID:13406  high    01-disease-subtype  [0.9996304204594423, 0.0003694981992668543, 8.134129095350279e-08]
DOID:1947   medium  02-disease-root [0.0002773156934075245, 0.9995922153107071, 0.0001304689958854303]
DOID:7551   medium  02-disease-root [0.001249121647451493, 0.9987508722666981, 6.0858504399845155e-09]
EFO:0000094 high    01-disease-subtype  [0.9998862271438443, 0.00011044795806137522, 3.324898094334069e-06]
EFO:0000095 medium  02-disease-root [0.00144716558257674, 0.9985525698367905, 2.645806325921759e-07]
EFO:0000096 medium  02-disease-root [0.00013778211201424632, 0.999038185154067, 0.0008240327339188221]
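
For readers of the precisions file, here is a minimal sketch of how the `probas` column can be split into named probabilities. It assumes the list order is [high, medium, low]; that order is inferred from the sample rows above (and from the `proba_high`, `proba_medium`, `proba_low` columns in the later output), not stated in the PR. The helper names `split_probas` and `most_likely_precision` are hypothetical.

```python
import ast

# Assumed class order, inferred from the sample rows (not confirmed by the PR).
PRECISION_ORDER = ("high", "medium", "low")


def split_probas(probas_field: str) -> dict[str, float]:
    """Parse a stringified list such as '[0.1, 0.2, 0.7]' into named probabilities."""
    values = ast.literal_eval(probas_field)
    if len(values) != len(PRECISION_ORDER):
        raise ValueError(
            f"expected {len(PRECISION_ORDER)} probabilities, got {len(values)}"
        )
    return dict(zip(PRECISION_ORDER, values))


def most_likely_precision(probas_field: str) -> str:
    """Return the precision label with the largest probability."""
    named = split_probas(probas_field)
    return max(named, key=named.get)


# `probas` value of the DOID:13406 row shown above.
row = "[0.9996304204594423, 0.0003694981992668543, 8.134129095350279e-08]"
print(most_likely_precision(row))  # high
```

Under that assumed order, the label with the largest probability matches the `precision` column for every sample row shown here.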

(nxontology-ml-py3.10) ~/d/nxontology-ml ❯❯❯ head data/efo_otar_slim_v3.57.0_features.tsv
identifier  prefix  is_gwas_trait   depth   n_ancestors n_descendants   intrinsic_ic    intrinsic_ic_scaled intrinsic_ic_sanchez    intrinsic_ic_sanchez_scaled n_parents   n_roots n_children  n_leaves    xref__doid__count   xref__gard__count   xref__icd10__count  xref__icd9__count   xref__meddra__count xref__mesh__count   xref__mondo__count  xref__ncit__count   xref__omim__count   xref__omimps__count xref__orphanet__count   xref__snomedct__count   xref__umls__count   pca_te_0    pca_te_1    pca_te_2    pca_te_3    pca_te_4    pca_te_5    pca_te_6    pca_te_7    pca_te_8    pca_te_9    pca_te_10   pca_te_11   pca_te_12   pca_te_13   pca_te_14   pca_te_15   pca_te_16   pca_te_17   pca_te_18   pca_te_19   pca_te_20   pca_te_21   pca_te_22   pca_te_23   pca_te_24   pca_te_25   pca_te_26   pca_te_27   pca_te_28   pca_te_29   pca_te_30   pca_te_31   pca_te_32   pca_te_33   pca_te_34   pca_te_35   pca_te_36   pca_te_37   pca_te_38   pca_te_39   pca_te_40   pca_te_41   pca_te_42   pca_te_43   pca_te_44   pca_te_45   pca_te_46   pca_te_47   pca_te_48   pca_te_49   pca_te_50   pca_te_51   pca_te_52   pca_te_53   pca_te_54   pca_te_55   pca_te_56   pca_te_57   pca_te_58   pca_te_59   pca_te_60   pca_te_61   pca_te_62   pca_te_63
DOID:0050890    doid    False   3.0 7.0 8.0 8.055514335632324   0.7948248386383057  9.40687084197998    0.9541544318199158  2.0 2.0 2.0 4.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 -1.5259478092193604 -0.2341548651456833 -1.2303617000579834 -0.832548201084137  -0.25779837369918823    1.1451114416122437  1.3971139192581177  0.4552288353443146  -0.8365649580955505 0.035401538014411926    -1.1192947626113892 0.937861442565918   0.18755336105823517 -0.7050053477287292 -0.30983951687812805    -0.18338429927825928    1.1776199340820312  -0.07195413112640381    -0.4185033142566681 0.22025127708911896 0.5185481905937195  -0.6280885338783264 0.19802698493003845 0.0736229419708252  -0.25899800658226013    0.14767473936080933 0.004609033465385437    0.4775564670562744  -0.06753147393465042    0.24132663011550903 0.17380720376968384 -0.44883906841278076    0.28417322039604187 0.6669296026229858  0.1758679747581482  0.6660868525505066  0.25597524642944336 -0.05610096454620361    0.19686423242092133 0.09164335578680038 0.1921607255935669  -0.05914386734366417    0.12003134191036224 -0.2936997413635254 0.22609537839889526 -0.17498813569545746    -0.38900136947631836    -0.1787423938512802 -0.45468437671661377    0.05296766757965088 0.1699647307395935  0.020933806896209717    -0.40780705213546753    -0.10191752761602402    0.04412025213241577 -0.19061969220638275    0.3419584631919861  0.0071566104888916016   -0.20711877942085266    -0.09473970532417297    0.29953664541244507 0.11031409353017807 -0.35995805263519287    -0.09684310853481293
DOID:10113  doid    False   3.0 4.0 5.0 8.525518417358398   0.8411993384361267  9.299240112304688   0.9432372450828552  1.0 1.0 3.0 3.0 0.0 0.0 0.0 2.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 -4.292662143707275  -0.012413062155246735   1.8242411613464355  1.935800313949585   0.062275536358356476    0.2074545919895172  -1.377741813659668  -0.17499811947345734    0.15652386844158173 -0.457312673330307  0.010869739577174187    0.2978377640247345  -0.43391093611717224    -0.6581587791442871 -0.13335560262203217    0.017675038427114487    0.05749521031975746 -0.5611655712127686 0.00945916399359703 0.2780573070049286  0.1190844178199768  -0.5299111008644104 -0.42235124111175537    0.1749494969844818  0.24978283047676086 0.13225528597831726 0.21575835347175598 0.30982422828674316 -0.15367944538593292    -0.3770029842853546 -0.04630254954099655    0.43010520935058594 0.37766826152801514 -0.19248032569885254    -0.01778222993016243    0.4670114517211914  -0.36021170020103455    -0.4248619079589844 -0.25018757581710815    -0.2650498151779175 -0.17504751682281494    -0.08408670127391815    -0.23379001021385193    0.34530067443847656 -0.2463696151971817 0.001638755202293396    0.02773713506758213 -0.19763803482055664    -0.15010368824005127    0.20505161583423615 -0.031186744570732117   0.01480342447757721 0.0931900218129158  -0.4487842321395874 -0.11721226572990417    -0.08542083203792572    0.05278177186846733 -0.08753959834575653    -0.15053139626979828    0.05876559019088745 -0.08218029141426086    0.03510241210460663 0.20120134949684143 -0.14923544228076935
DOID:10718  doid    False   3.0 8.0 1.0 10.134956359863281  1.0 9.741073608398438   0.9880530834197998  2.0 2.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 2.0 1.0 -2.1711864471435547 2.1476502418518066  0.30292946100234985 2.231956958770752   0.3125019669532776  -1.3254121541976929 -0.5455307960510254 0.024792592972517014    -1.3186321258544922 0.6500191688537598  0.10564020276069641 0.3039182722568512  0.4161166250705719  -0.09473530203104019    0.052719246596097946    0.15168651938438416 0.5001009106636047  -0.5584765672683716 1.0360757112503052  0.10004833340644836 0.5725390315055847  -0.3290960192680359 0.41821566224098206 0.33021053671836853 0.35321861505508423 0.7132424712181091  0.7660133242607117  -0.06765903532505035    -0.32752496004104614    -0.5805914402008057 -0.03971387445926666    -0.24081814289093018    0.3084395229816437  -0.024337418377399445   -0.2978067994117737 0.2931032180786133  -0.12783996760845184    -0.003569338470697403   0.3298843502998352  0.23493990302085876 -0.20128723978996277    -0.03845130652189255    0.26117897033691406 -0.23844003677368164    0.08660334348678589 0.11214448511600494 -0.26975250244140625    -0.037004198879003525   -0.11059493571519852    -0.17546650767326355    -0.11586276441812515    -0.05466967821121216    -0.0943177193403244 -0.007878683507442474   -0.2720102071762085 -0.037379004061222076   -0.1026504635810852 -0.14410409331321716    -0.04642030596733093    0.0017204582691192627   -0.21737045049667358    0.21506285667419434 -0.24276939034461975    -0.5471525192260742
DOID:13406  doid    False   6.0 15.0    1.0 10.134956359863281  1.0 9.794317245483398   0.9934537410736084  2.0 2.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 2.0 1.0 -3.053466320037842  1.9353597164154053  0.7435950636863708  1.2261637449264526  0.3460189998149872  -1.334153413772583  -0.2896428108215332 -1.2940924167633057 -0.9193424582481384 0.28507938981056213 0.5618616342544556  -0.6974371671676636 0.3673700988292694  -0.10952587425708771    0.5655399560928345  0.5718689560890198  -0.04231610894203186    -0.2792418599128723 -0.04745841771364212    -0.5458612442016602 -0.09988245368003845    -0.5230515599250793 0.43793410062789917 0.0662643238902092  0.03926403820514679 -0.017390422523021698   -0.1571381688117981 0.05093233287334442 -0.22762751579284668    0.3091743290424347  0.044325679540634155    0.06438940763473511 0.19343236088752747 0.09189716726541519 0.05267491191625595 0.3791850209236145  -0.2742235064506531 -0.06510186195373535    0.15433621406555176 0.23736712336540222 -0.11754099279642105    0.175717294216156   -0.10771786421537399    -0.13069868087768555    0.020481083542108536    -0.08485361933708191    -0.11563052237033844    -0.2963845133781433 -0.32183653116226196    -0.005954951047897339   0.19083400070667267 0.10765231400728226 -0.03544756770133972    0.00040738843381404877  0.02075100690126419 0.16240248084068298 0.0342591255903244  0.013932555913925171    0.029837846755981445    0.06804774701595306 0.07053413987159729 -0.14218221604824066    -0.11206778883934021    -0.24400101602077484
DOID:1947   doid    False   3.0 4.0 2.0 9.441808700561523   0.931608259677887   9.635712623596191   0.9773662090301514  1.0 1.0 1.0 1.0 0.0 0.0 0.0 3.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 2.0 1.0 -4.358615398406982  0.6257389187812805  0.5706757307052612  0.24123895168304443 -0.042636267840862274   0.4549815058708191  1.0822473764419556  -0.1843743622303009 -0.4928692579269409 -0.9325785636901855 -0.17555435001850128    0.5865316390991211  -0.6012794375419617 0.27047789096832275 -0.05116800218820572    0.2683687210083008  0.7495378255844116  -0.23735180497169495    0.19494569301605225 0.04614569991827011 0.3087760806083679  -0.16968181729316711    -0.32098525762557983    0.09971165657043457 -0.036619894206523895   0.6205041408538818  0.41573765873908997 -0.01911889761686325    -0.07496368139982224    -0.8846541047096252 -0.21723976731300354    0.17012223601341248 -0.050977520644664764   -0.25960153341293335    -0.12032841145992279    0.3294741213321686  -0.3576832711696625 -0.31260448694229126    -0.17089523375034332    -0.0837959498167038 0.07278472930192947 -0.0882568359375    -0.09882014989852905    -0.3583656847476959 -0.39493122696876526    0.2186741977930069  -0.13182365894317627    -0.030524559319019318   -0.2784685492515564 0.2335917055606842  -0.08077721297740936    -0.12147897481918335    -0.283833384513855  -0.033535994589328766   -0.1372685581445694 0.16253313422203064 -0.21930746734142303    -0.10494700819253922    -0.14673498272895813    0.025501996278762817    -0.20294858515262604    -0.0908549576997757 0.07397913187742233 -0.23007738590240479
DOID:7551   doid    False   3.0 10.0    1.0 10.134956359863281  1.0 9.763545989990234   0.9903325438499451  2.0 2.0 0.0 1.0 0.0 1.0 0.0 4.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 3.0 2.0 7.6529765129089355  4.843364715576172   1.202488899230957   3.0342798233032227  1.6187490224838257  -1.618166208267212  -0.6018086075782776 0.13195116817951202 -1.0106501579284668 -0.8107810020446777 -0.47343242168426514    0.7407431602478027  0.327420175075531   0.2433404177427292  -1.2468856573104858 -0.9833382368087769 0.2333955466747284  0.14592911303043365 0.9679390788078308  0.46444687247276306 -0.5032159090042114 -0.825147271156311  0.5328531265258789  0.8762033581733704  0.35657715797424316 0.8831196427345276  0.9349611401557922  0.5793198943138123  -0.12492191791534424    -1.096081256866455  -0.5771995186805725 0.36468735337257385 -0.9633459448814392 -0.22649505734443665    -0.17962542176246643    0.08630876243114471 -0.33165425062179565    -0.6272510290145874 0.3864843547344208  0.5837273597717285  -0.7332669496536255 -0.20665770769119263    0.0027946829795837402   -0.5891962051391602 -0.6142576932907104 0.29745712876319885 0.3065825402736664  -0.19772136211395264    0.5714584589004517  -0.17179793119430542    0.05529561638832092 0.6499223709106445  -0.44650235772132874    0.19575074315071106 0.024430692195892334    -0.42959991097450256    -0.4012017250061035 0.3081108033657074  0.4373941123485565  0.2289842665195465  0.1455610692501068  0.19333839416503906 -0.032675594091415405   -0.17066258192062378
EFO:0000094 efo True    3.0 28.0    2.0 9.441808700561523   0.931608259677887   9.82376480102539    0.9964406490325928  2.0 4.0 1.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 11.128449440002441  1.0209848880767822  -0.8652999401092529 1.6890690326690674  -0.18352320790290833    0.744799017906189   2.517777681350708   0.6461104154586792  -0.03241172432899475    0.2827366590499878  0.08010733127593994 -0.8458065986633301 -0.02400149405002594    -2.1516690254211426 1.517308235168457   -1.6414929628372192 0.41878214478492737 -0.5412118434906006 -1.0516648292541504 -0.4427112936973572 -0.29992619156837463    -0.7069032788276672 -0.2726719379425049 -0.8692108988761902 0.646533191204071   0.6723355054855347  -0.48478418588638306    0.4093706011772156  1.053286075592041   0.19103139638900757 -0.35528627038002014    -0.0010965019464492798  -0.07091647386550903    0.007891163229942322    -0.0543951578438282 0.5829113125801086  0.3379814624786377  -0.07267516851425171    -0.7161793112754822 1.0698957443237305  -0.014646857976913452   0.25807949900627136 -0.3661832809448242 -0.18554449081420898    -0.4328343868255615 -0.17104116082191467    0.39311569929122925 0.03580936789512634 0.5006163120269775  -0.11892920732498169    -0.12251430004835129    -0.2492581605911255 0.08950775861740112 0.08261418342590332 -0.3407742977142334 0.0736057460308075  -0.08493512868881226    -0.037307098507881165   -0.241159588098526  0.026623010635375977    -0.23924008011817932    -0.058519989252090454   -0.19711123406887054    -0.2002945840358734
EFO:0000095 efo True    4.0 31.0    2.0 9.441808700561523   0.931608259677887   9.827107429504395   0.9967796802520752  4.0 4.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 2.0 4.585427284240723   5.451262474060059   -0.7093328237533569 0.0013941526412963867   -0.2116016447544098 -0.9329427480697632 -0.3998262286186218 -1.3621879816055298 -0.5016341209411621 -0.8915679454803467 -2.2134320735931396 -0.510587215423584  0.3028688430786133  0.7573547959327698  -0.21300050616264343    -0.6568917632102966 0.12794822454452515 -0.32726815342903137    -0.501696765422821  -0.63877272605896   -0.8080993294715881 0.03566819429397583 -0.6334696412086487 -0.3418427109718323 0.20179376006126404 -0.2649763226509094 0.00255424901843071 0.20200230181217194 0.3721511960029602  -0.17793355882167816    -0.47323447465896606    -0.4914860129356384 -0.10061060637235641    0.23275482654571533 -1.1204190254211426 0.5090923309326172  -0.1899888515472412 -0.22541162371635437    -0.23816274106502533    -0.257913738489151  -0.18111132085323334    0.0907231867313385  -0.13169464468955994    0.3287328779697418  -0.10841067135334015    -0.5685760378837585 0.4515128433704376  -0.3628005087375641 0.5605565905570984  -0.12469089776277542    0.15201766788959503 -0.3548508286476135 0.13091860711574554 -0.1744987666606903 -0.24738582968711853    -0.0824233889579773 -0.018049389123916626   0.07260237634181976 0.0790439248085022  0.1232566237449646  0.14359556138515472 -0.140633225440979  -0.007404893636703491   0.16132348775863647
EFO:0000096 efo True    3.0 18.0    39.0    6.4713945388793945  0.638522207736969   8.878026962280273   0.9005128741264343  1.0 4.0 8.0 30.0    1.0 0.0 0.0 1.0 0.0 0.0 1.0 2.0 0.0 0.0 0.0 1.0 1.0 4.7383012771606445  5.625391483306885   0.3204095959663391  -0.604826807975769  1.7347877025604248  -0.6747621893882751 0.4482678771018982  0.5333946347236633  0.5934857726097107  -0.3400038480758667 -2.774846076965332  0.001615479588508606    -0.09772847592830658    0.8044964075088501  0.011978432536125183    -0.18327993154525757    1.4177908897399902  -0.9073247909545898 -1.2891361713409424 -0.555657148361206  -0.9741580486297607 -0.2685509920120239 -0.5042699575424194 -0.695961058139801  0.06538707762956619 -0.6396059393882751 -0.5538358092308044 -0.2695912718772888 0.5331885814666748  -0.5415839552879333 -0.8517720699310303 -0.05365517735481262    0.8653401136398315  0.5677555203437805  -0.28002476692199707    0.6408845782279968  -0.12618784606456757    -0.21589970588684082    -0.10028059780597687    0.5043363571166992  -0.036070212721824646   0.15150779485702515 -0.17870552837848663    0.11406873166561127 -0.16667340695858002    -0.1580265611410141 0.4315616190433502  0.0422755591571331  0.22193358838558197 0.21625356376171112 -0.047561854124069214   -0.35924458503723145    0.3076460063457489  -0.1712191253900528 -0.3843381404876709 -0.43709805607795715    -0.02676708996295929    -0.22483819723129272    -0.020394423976540565   0.2604283392429352  0.2827552258968353  0.21180424094200134 0.3344354033470154  -0.3078382611274719

Here are the export file sizes:

(nxontology-ml-py3.10) ~/d/nxontology-ml ❯❯❯ ls -lah data                                                                                                      
drwxr-xr-x  10 romain  staff   320B Sep 20 19:50 .
drwxr-xr-x  30 romain  staff   960B Sep 20 20:32 ..
-rw-r--r--   1 romain  staff   1.5M Jul 24 15:23 efo_otar_slim_v3.43.0_rs_classification.tsv
-rw-r--r--   1 romain  staff    20M Sep 20 19:50 efo_otar_slim_v3.57.0_features.tsv
-rw-r--r--   1 romain  staff   1.5M Sep 20 19:50 efo_otar_slim_v3.57.0_precisions.tsv

@dhimmel can you confirm you'd like the feature file checked in?

yonromai commented 11 months ago

@dhimmel I added a new commit to address the changes that you asked for:

Here are the first 10 lines of the updated output:

identifier  precision   proba_high  proba_medium    proba_low   rs_classification   efo_label   prefix  is_gwas_trait   depth   n_ancestors n_descendants   intrinsic_ic    intrinsic_ic_scaled intrinsic_ic_sanchez    intrinsic_ic_sanchez_scaled n_parents   n_roots n_children  n_leaves    xref__doid__count   xref__gard__count   xref__icd10__count  xref__icd9__count   xref__meddra__count xref__mesh__count   xref__mondo__count  xref__ncit__count   xref__omim__count   xref__omimps__count xref__orphanet__count   xref__snomedct__count   xref__umls__count   pca_te_0    pca_te_1    pca_te_2    pca_te_3    pca_te_4    pca_te_5    pca_te_6    pca_te_7    pca_te_8    pca_te_9    pca_te_10   pca_te_11   pca_te_12   pca_te_13   pca_te_14   pca_te_15   pca_te_16   pca_te_17   pca_te_18   pca_te_19   pca_te_20   pca_te_21   pca_te_22   pca_te_23   pca_te_24   pca_te_25   pca_te_26   pca_te_27   pca_te_28   pca_te_29   pca_te_30   pca_te_31   pca_te_32   pca_te_33   pca_te_34   pca_te_35   pca_te_36   pca_te_37   pca_te_38   pca_te_39   pca_te_40   pca_te_41   pca_te_42   pca_te_43   pca_te_44   pca_te_45   pca_te_46   pca_te_47   pca_te_48   pca_te_49   pca_te_50   pca_te_51   pca_te_52   pca_te_53   pca_te_54   pca_te_55   pca_te_56   pca_te_57   pca_te_58   pca_te_59   pca_te_60   pca_te_61   pca_te_62   pca_te_63
DOID:0050890    low 0.00028 0.00063 0.99909 03-disease-area synucleinopathy doid    False   3.0 7.0 8.0 8.055514335632324   0.7948248386383057  9.40687084197998    0.9541544318199158  2.0 2.0 2.0 4.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 -1.5259480476379395 -0.2341545969247818 -1.2303614616394043 -0.8325576186180115 -0.25777679681777954    1.145113229751587   1.3971177339553833  0.4552121162414551  -0.836563766002655  0.0354347750544548  -1.119293451309204  0.9378551244735718  0.18755923211574554 -0.7050077319145203 -0.3098467290401459 -0.1833728551864624 1.1776245832443237  -0.071957528591156  -0.4185025095939636 0.22024321556091309 0.5185545682907104  -0.6280837059020996 0.19802966713905334 0.07362869381904602 -0.2590044438838959 0.14766231179237366 0.004614487290382385    0.47755593061447144 -0.06752994656562805    0.24133142828941345 0.17379988729953766 -0.4488271474838257 0.28419846296310425 0.6668939590454102  0.17590837180614471 0.6661109328269958  0.2558993995189667  -0.05613671988248825    0.19678303599357605 0.09166446328163147 0.19197209179401398 -0.05938752368092537    0.1201208233833313  -0.2938452363014221 0.22624200582504272 -0.17448432743549347    -0.3885856866836548 -0.17940837144851685    -0.4560741186141968 0.052791789174079895    0.17347118258476257 0.022671625018119812    -0.40328696370124817    -0.10279969871044159    0.03752683103084564 -0.19509345293045044    0.3398292362689972  -0.023118361830711365   -0.1852674037218094 -0.10801161825656891    0.298456072807312   0.16781677305698395 -0.33339065313339233    -0.04248553514480591
DOID:10113  medium  0.00058 0.99845 0.00098 02-disease-root trypanosomiasis doid    False   3.0 4.0 5.0 8.525518417358398   0.8411993384361267  9.299240112304688   0.9432372450828552  1.0 1.0 3.0 3.0 0.0 0.0 0.0 2.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 -4.292663097381592  -0.012410089373588562   1.8242411613464355  1.935802936553955   0.06222572177648544 0.20745515823364258 -1.3777449131011963 -0.17498718202114105    0.15651896595954895 -0.4573137164115906 0.010859286412596703    0.297838419675827   -0.43391120433807373    -0.6581568717956543 -0.13336409628391266    0.017675593495368958    0.05749864503741264 -0.5611717700958252 0.0094582699239254  0.27805519104003906 0.11908496916294098 -0.5299090147018433 -0.4223472476005554 0.17494449019432068 0.2497742772102356  0.13226579129695892 0.21576140820980072 0.3098200857639313  -0.1536838412284851 -0.3770142197608948 -0.0462801419198513 0.4301183521747589  0.37766098976135254 -0.19248944520950317    -0.01779669150710106    0.46696823835372925 -0.3602733016014099 -0.4248644709587097 -0.2502250075340271 -0.26503461599349976    -0.17517341673374176    -0.08410072326660156    -0.2338671088218689 0.3454975187778473  -0.24611784517765045    0.0016789063811302185   0.02755538374185562 -0.1982964724302292 -0.14963236451148987    0.2049160599708557  -0.02968670427799225    0.012656927108764648    0.09475623071193695 -0.44386792182922363    -0.125501811504364  -0.09200803190469742    0.04397750645875931 -0.09944383054971695    -0.14161255955696106    0.05635999143123627 -0.10539209842681885    0.02940986305475235 0.18866941332817078 -0.2048323154449463
DOID:10718  medium  0.00117 0.99883 0.00000 02-disease-root giardiasis  doid    False   3.0 8.0 1.0 10.134956359863281  1.0 9.741073608398438   0.9880530834197998  2.0 2.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 2.0 1.0 -2.1711859703063965 2.1476523876190186  0.3029290437698364  2.2319693565368652  0.3124423027038574  -1.3254098892211914 -0.5455265045166016 0.02479447051882744 -1.3186272382736206 0.6500306129455566  0.10565608739852905 0.30391302704811096 0.4161183536052704  -0.09474004805088043    0.05271919071674347 0.15169045329093933 0.5001036524772644  -0.558479905128479  1.0360777378082275  0.10004164278507233 0.5725386142730713  -0.3290843069553375 0.4182170033454895  0.33021101355552673 0.3531874418258667  0.7132461667060852  0.7660188674926758  -0.06766881793737411    -0.3275296688079834 -0.5805946588516235 -0.039685431867837906   -0.24081352353096008    0.30845287442207336 -0.024329818785190582   -0.29781514406204224    0.2931155264377594  -0.1278868317604065 -0.003562379628419876   0.32983118295669556 0.23495084047317505 -0.2014816403388977 -0.03824345022439957    0.26103734970092773 -0.23837420344352722    0.0867302343249321  0.11267700791358948 -0.2687276005744934 -0.03720066696405411    -0.11061225086450577    -0.1751476675271988 -0.1137891411781311 -0.05317603051662445    -0.09181027114391327    -0.005521506071090698   -0.27813923358917236    -0.03908546268939972    -0.10759022831916809    -0.14243505895137787    -0.007824696600437164   0.014894582331180573    -0.2522224187850952 0.17425143718719482 -0.3088277578353882 -0.4380866289138794
DOID:13406  high    0.99981 0.00019 0.00000 01-disease-subtype  pulmonary sarcoidosis   doid    False   6.0 15.0    1.0 10.134956359863281  1.0 9.794317245483398   0.9934537410736084  2.0 2.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 2.0 1.0 -3.053466320037842  1.9353625774383545  0.7435942888259888  1.2261756658554077  0.3459858298301697  -1.3341552019119263 -0.28965288400650024    -1.294089674949646  -0.9193390607833862 0.2850755453109741  0.5618708729743958  -0.6974412798881531 0.3673611581325531  -0.10952895134687424    0.5655360221862793  0.571872889995575   -0.04232296347618103    -0.2792437672615051 -0.04745875298976898    -0.5458601713180542 -0.09988430142402649    -0.5230472683906555 0.4379386007785797  0.06626390665769577 0.039266280829906464    -0.017388835549354553   -0.15713253617286682    0.0509331040084362  -0.2276216298341751 0.3091808557510376  0.04430576413869858 0.06439712643623352 0.19343256950378418 0.09192248433828354 0.052678607404232025    0.3791302442550659  -0.2742898166179657 -0.06509727239608765    0.15427455306053162 0.23731906712055206 -0.11749233305454254    0.17581462860107422 -0.10765983909368515    -0.13087287545204163    0.02043258398771286 -0.08507407456636429    -0.11625076085329056    -0.2965066730976105 -0.3207240104675293 -0.00482650101184845    0.1927568018436432  0.10474809259176254 -0.04025161266326904    -0.0002699242904782295  0.031086549162864685    0.16730231046676636 0.03486368805170059 0.03765247017145157 0.03559812903404236 0.07144510000944138 0.11908143758773804 -0.14880043268203735    -0.12750791013240814    -0.1889151930809021
DOID:1947   medium  0.00023 0.99966 0.00010 02-disease-root trichomoniasis  doid    False   3.0 4.0 2.0 9.441808700561523   0.931608259677887   9.635712623596191   0.9773662090301514  1.0 1.0 1.0 1.0 0.0 0.0 0.0 3.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0 2.0 1.0 -4.358615875244141  0.6257417798042297  0.5706757307052612  0.24123653769493103 -0.042641766369342804   0.45498359203338623 1.082244634628296   -0.1843864917755127 -0.4928770959377289 -0.9325709342956543 -0.17557352781295776    0.586540699005127   -0.601270854473114  0.2704795002937317  -0.05116390436887741    0.2683708667755127  0.7495367527008057  -0.23735424876213074    0.19494563341140747 0.04613890498876572 0.3087756931781769  -0.16967998445034027    -0.3209839165210724 0.09971511363983154 -0.03665013611316681    0.6205025315284729  0.4157359004020691  -0.01912776380777359    -0.07497791945934296    -0.8846695423126221 -0.21718677878379822    0.17011749744415283 -0.05097880959510803    -0.25961947441101074    -0.1203642189502716 0.32943180203437805 -0.3577299118041992 -0.31262314319610596    -0.1708638072013855 -0.08372326195240021    0.07257656753063202 -0.08842091262340546    -0.09863071143627167    -0.3582797646522522 -0.3948209285736084 0.2187349796295166  -0.13157141208648682    -0.03057733178138733    -0.27927345037460327    0.23326550424098969 -0.08040181547403336    -0.12052103877067566    -0.2778048813343048 -0.027752041816711426   -0.15173453092575073    0.1490785926580429  -0.2247905433177948 -0.1218884214758873 -0.14009954035282135    0.027499020099639893    -0.19819411635398865    -0.07656015455722809    0.0010087201371788979   -0.28624898195266724
DOID:7551   medium  0.00105 0.99895 0.00000 02-disease-root gonorrhea   doid    False   3.0 10.0    1.0 10.134956359863281  1.0 9.763545989990234   0.9903325438499451  2.0 2.0 0.0 1.0 0.0 1.0 0.0 4.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 3.0 2.0 7.652980804443359   4.843362808227539   1.2024866342544556  3.034327268600464   1.6186671257019043  -1.6181623935699463 -0.6018027663230896 0.13195493817329407 -1.0106589794158936 -0.8107593655586243 -0.4734506905078888 0.7407383918762207  0.32743096351623535 0.243345245718956   -1.2468758821487427 -0.9833461046218872 0.23341131210327148 0.14591965079307556 0.9679398536682129  0.4644564092159271  -0.5032130479812622 -0.8251446485519409 0.5328627228736877  0.8762052059173584  0.35654217004776    0.8831218481063843  0.9349535703659058  0.5793136954307556  -0.12494811415672302    -1.0961215496063232 -0.5771422982215881 0.3646371066570282  -0.9633585810661316 -0.22651579976081848    -0.17963209748268127    0.0863700583577156  -0.33167973160743713    -0.6272316575050354 0.3866034746170044  0.5838128328323364  -0.7333778142929077 -0.2068389654159546 0.0029905885457992554   -0.5885534286499023 -0.6146354079246521 0.2973923683166504  0.30839040875434875 -0.19612199068069458    0.5697736144065857  -0.1719053089618683 0.04697195440530777 0.6568117737770081  -0.43808215856552124    0.19207635521888733 -0.006294466555118561   -0.44620275497436523    -0.4011954367160797 0.32881081104278564 0.3663390278816223  0.2286163866519928  0.05143958330154419 0.3333771526813507  -0.08463650941848755    -0.16844049096107483
EFO:0000094 high    0.99944 0.00056 0.00000 01-disease-subtype  B-cell acute lymphoblastic leukemia efo True    3.0 28.0    2.0 9.441808700561523   0.931608259677887   9.82376480102539    0.9964406490325928  2.0 4.0 1.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 0.0 11.128454208374023  1.0209791660308838  -0.8652997016906738 1.689063549041748   -0.1835705041885376 0.7448073029518127  2.5177831649780273  0.6460842490196228  -0.03240668773651123    0.2827354073524475  0.0801127552986145  -0.8458119034767151 -0.02401795983314514    -2.1516900062561035 1.5172958374023438  -1.6414721012115479 0.41878771781921387 -0.5412120819091797 -1.051668643951416  -0.44271165132522583    -0.2999247908592224 -0.7069023847579956 -0.2726597785949707 -0.8692215085029602 0.6464934349060059  0.6723757982254028  -0.48477596044540405    0.4093879461288452  1.0532820224761963  0.19099655747413635 -0.355294406414032  -0.001095995306968689   -0.07091590762138367    0.007893219590187073    -0.05441289395093918    0.582928478717804   0.3379111886024475  -0.07260769605636597    -0.7162529230117798 1.0698399543762207  -0.014526605606079102   0.2582075893878937  -0.3666952848434448 -0.1856209933757782 -0.4324628412723541 -0.1706363558769226 0.39273953437805176 0.035026952624320984    0.5012767910957336  -0.1212281882762909 -0.1197638288140297 -0.24914872646331787    0.08537015318870544 0.0732237696647644  -0.3219563364982605 0.09170031547546387 -0.07978761196136475    -0.054105792194604874   -0.20619694888591766    0.02372048795223236 -0.17961224913597107    -0.15795671939849854    -0.2573228180408478 -0.20552240312099457
EFO:0000095 medium  0.00088 0.99912 0.00000 02-disease-root chronic lymphocytic leukemia    efo True    4.0 31.0    2.0 9.441808700561523   0.931608259677887   9.827107429504395   0.9967796802520752  4.0 4.0 1.0 1.0 1.0 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 2.0 4.58543062210083    5.451262474060059   -0.709334135055542  0.0013892650604248047   -0.2116018384695053 -0.9329459071159363 -0.39983832836151123    -1.362183928489685  -0.5016438364982605 -0.8915143609046936 -2.2134506702423096 -0.5105847120285034 0.3028690218925476  0.7573581337928772  -0.21298396587371826    -0.6568986773490906 0.12795089185237885 -0.32726234197616577    -0.5017027854919434 -0.6387689113616943 -0.808106005191803  0.035661935806274414    -0.6334644556045532 -0.341848760843277  0.20179647207260132 -0.2649608850479126 0.0025481805205345154   0.20200304687023163 0.37214332818984985 -0.177973672747612  -0.4732191562652588 -0.4914882779121399 -0.1005815863609314 0.23284175992012024 -1.1204171180725098 0.509072482585907   -0.19007283449172974    -0.2253798097372055 -0.23807118833065033    -0.2579413056373596 -0.18090595304965973    0.09062853455543518 -0.13168823719024658    0.32868361473083496 -0.10866905748844147    -0.5691341757774353 0.4499610960483551  -0.3624628484249115 0.5611351728439331  -0.1262049376964569 0.14938515424728394 -0.35689884424209595    0.13040611147880554 -0.17435169219970703    -0.24893386662006378    -0.07152017951011658    -0.007178366184234619   0.09548397362232208 0.045310989022254944    0.11406274139881134 0.19890078902244568 -0.08418260514736176    -0.042736977338790894   0.08541814982891083
EFO:0000096 medium  0.00054 0.99822 0.00125 02-disease-root neoplasm of mature B-cells  efo True    3.0 18.0    39.0    6.4713945388793945  0.638522207736969   8.878026962280273   0.9005128741264343  1.0 4.0 8.0 30.0    1.0 0.0 0.0 1.0 0.0 0.0 1.0 2.0 0.0 0.0 0.0 1.0 1.0 4.738305568695068   5.625391006469727   0.3204062581062317  -0.6047797203063965 1.7348039150238037  -0.6747610569000244 0.44827455282211304 0.5333905816078186  0.5934811234474182  -0.33994847536087036    -2.7748541831970215 0.0016219764947891235   -0.0977252647280693 0.80450040102005    0.01199561357498169 -0.1832766830921173 1.4177908897399902  -0.9073195457458496 -1.289145588874817  -0.5556608438491821 -0.9741578102111816 -0.2685505449771881 -0.504262387752533  -0.6959699988365173 0.065402090549469   -0.6395871043205261 -0.5538464784622192 -0.2695954740047455 0.5331799387931824  -0.5416421890258789 -0.8517335057258606 -0.05361565947532654    0.8653509616851807  0.5677962899208069  -0.27996766567230225    0.6408759355545044  -0.1262664794921875 -0.21582552790641785    -0.10029469430446625    0.5043128132820129  -0.03579314053058624    0.15132993459701538 -0.17875544726848602    0.11429625004529953 -0.16657105088233948    -0.15872012078762054    0.4313417971134186  0.04224897921085358 0.22212675213813782 0.2161094844341278  -0.051062680780887604   -0.36110615730285645    0.3048466742038727  -0.17278432846069336    -0.3914314806461334 -0.41992291808128357    -0.01219351589679718    -0.21843977272510529    -0.02015511691570282    0.23409664630889893 0.23764610290527344 0.2660714089870453  0.31361186504364014 -0.37614133954048157

Note: I'll squash the commit to erase history prior to merging

dhimmel commented 11 months ago

Here are the first 10 lines of the updated output

Interesting, some floats still look verbose. Not a huge deal if we can't get float_format to work.
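
For context, a minimal sketch of what a printf-style `float_format` does to the verbose values pasted above. The format string `"%.5g"` is an assumption (five significant digits, trailing zeros dropped); the thread does not show the value actually passed to `float_format`.

```python
# Values copied from the EFO:0000095 row pasted earlier in this thread.
verbose = [
    4.0,
    9.441808700561523,
    0.931608259677887,
    9.827107429504395,
    0.9967796802520752,
]

# "%.5g" is an assumed format string, not one confirmed by the PR.
short = ["%.5g" % value for value in verbose]

# Compare the total number of characters before and after formatting.
chars_before = sum(len(repr(value)) for value in verbose)
chars_after = sum(len(text) for text in short)

print(short)
print(chars_before, "->", chars_after)
```

Under that assumption, whole-number floats such as `4.0` shrink to `4` and long fractions are cut to five significant digits, which is where most of the per-row savings would come from.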

I'll squash the commit to erase history prior to merging

Squash merging should be sufficient I think, no need to do locally. Branch gets deleted after the merge.

yonromai commented 11 months ago

Interesting, some floats still look verbose. Not a huge deal if we can't get float_format to work.

Right, good catch. It's surprising that some floats got formatted but not all. I'll take a look, since it would save a little on file size.

Squash merging should be sufficient I think

:+1:

yonromai commented 11 months ago

Right, good catch. It's surprising that some floats got formatted but not all. I'll take a look, since it would save a little on file size.

Fixed ^ in the last push (using convert_dtypes, nice find @dhimmel)
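
A minimal sketch of how `convert_dtypes` and `float_format` can interact, on a hypothetical toy frame rather than the real export. It assumes a recent pandas version, an assumed format string of `"%.5g"`, and that the unformatted columns were stored as `object` dtype, which is one plausible cause and not confirmed by this PR.

```python
import io

import pandas as pd

# Hypothetical toy frame (not the real export).
df = pd.DataFrame(
    {
        "identifier": ["EFO:0000095", "EFO:0000096"],
        "depth": [4.0, 3.0],
        "intrinsic_ic": [9.441808700561523, 6.4713945388793945],
    }
)
# Stored as object dtype on purpose, to mimic a column that escapes float_format.
df["proba_medium"] = pd.Series([0.9991234567, 0.9982234567], dtype=object)


def to_tsv(frame: pd.DataFrame) -> str:
    """Serialize a frame to TSV text; "%.5g" is an assumed format string."""
    buf = io.StringIO()
    frame.to_csv(buf, sep="\t", index=False, float_format="%.5g")
    return buf.getvalue()


# float_format only applies to float columns, so the object column stays verbose.
before = to_tsv(df)
# convert_dtypes infers numeric dtypes first, so every float column is formatted.
after = to_tsv(df.convert_dtypes())

print(before)
print(after)
```

In this sketch `convert_dtypes` also turns whole-number float columns such as `depth` into nullable integers, which matches the `4.0` to `4` change visible in the output below.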

Top 10 lines now:

identifier  precision   proba_high  proba_medium    proba_low   rs_classification   efo_label   prefix  is_gwas_trait   depth   n_ancestors n_descendants   intrinsic_ic    intrinsic_ic_scaled intrinsic_ic_sanchez    intrinsic_ic_sanchez_scaled n_parents   n_roots n_children  n_leaves    xref__doid__count   xref__gard__count   xref__icd10__count  xref__icd9__count   xref__meddra__count xref__mesh__count   xref__mondo__count  xref__ncit__count   xref__omim__count   xref__omimps__count xref__orphanet__count   xref__snomedct__count   xref__umls__count   pca_te_0    pca_te_1    pca_te_2    pca_te_3    pca_te_4    pca_te_5    pca_te_6    pca_te_7    pca_te_8    pca_te_9    pca_te_10   pca_te_11   pca_te_12   pca_te_13   pca_te_14   pca_te_15   pca_te_16   pca_te_17   pca_te_18   pca_te_19   pca_te_20   pca_te_21   pca_te_22   pca_te_23   pca_te_24   pca_te_25   pca_te_26   pca_te_27   pca_te_28   pca_te_29   pca_te_30   pca_te_31   pca_te_32   pca_te_33   pca_te_34   pca_te_35   pca_te_36   pca_te_37   pca_te_38   pca_te_39   pca_te_40   pca_te_41   pca_te_42   pca_te_43   pca_te_44   pca_te_45   pca_te_46   pca_te_47   pca_te_48   pca_te_49   pca_te_50   pca_te_51   pca_te_52   pca_te_53   pca_te_54   pca_te_55   pca_te_56   pca_te_57   pca_te_58   pca_te_59   pca_te_60   pca_te_61   pca_te_62   pca_te_63
DOID:0050890    low 0.00093254  0.00035292  0.99871 03-disease-area synucleinopathy doid    False   3   7   8   8.0555  0.79482 9.4069  0.95415 2   2   2   4   0   0   0   0   0   1   1   0   0   0   0   0   1   -1.5259 -0.23416    -1.2304 -0.83256    -0.25777    1.1451  1.3971  0.45521 -0.83657    0.035402    -1.1193 0.93785 0.18754 -0.70501    -0.30986    -0.18339    1.1776  -0.071934   -0.4185 0.22026 0.51855 -0.62808    0.19805 0.073615    -0.25903    0.14761 0.004626    0.47755 -0.067554   0.24133 0.1738  -0.44884    0.28417 0.66689 0.17593 0.66609 0.25595 -0.056128   0.19684 0.091684    0.19224 -0.059334   0.12009 -0.29377    0.22616 -0.17504    -0.38881    -0.17914    -0.45509    0.052302    0.16953 0.020416    -0.40884    -0.10488    0.038316    -0.19511    0.34087 0.0041448   -0.2176 -0.10685    0.3011  0.19759 -0.33105    -0.10116
DOID:10113  medium  0.00056487  0.99829 0.001141    02-disease-root trypanosomiasis doid    False   3   4   5   8.5255  0.8412  9.2992  0.94324 1   1   3   3   0   0   0   2   0   1   1   0   0   0   0   1   1   -4.2927 -0.012409   1.8242  1.9358  0.06221 0.20745 -1.3777 -0.17499    0.15652 -0.45731    0.010871    0.29784 -0.43393    -0.65814    -0.13337    0.017674    0.057508    -0.56117    0.0094563   0.27806 0.11908 -0.52992    -0.42233    0.17494 0.24975 0.13232 0.21576 0.30981 -0.1537 -0.37701    -0.046296   0.4301  0.37768 -0.19247    -0.017783   0.46699 -0.36022    -0.42488    -0.25018    -0.26505    -0.17507    -0.083977   -0.23389    0.34542 -0.24645    0.0016976   0.028001    -0.19759    -0.15024    0.20434 -0.032703   0.014206    0.093413    -0.44536    -0.12246    -0.087519   0.053074    -0.08962    -0.15297    0.045897    -0.093867   0.039216    0.21087 -0.15763
DOID:10718  medium  0.0012826   0.99864 7.2945e-05  02-disease-root giardiasis  doid    False   3   8   1   10.135  1   9.7411  0.98805 2   2   0   1   0   0   0   1   0   1   1   0   0   0   0   2   1   -2.1712 2.1477  0.30292 2.232   0.31242 -1.3254 -0.54553    0.024786    -1.3186 0.65003 0.10564 0.30391 0.41612 -0.094753   0.052711    0.15168 0.50011 -0.55848    1.0361  0.10005 0.57254 -0.32908    0.41823 0.33019 0.35304 0.71332 0.76601 -0.067692   -0.32753    -0.5806 -0.039709   -0.24083    0.30844 -0.024331   -0.29778    0.2931  -0.12784    -0.0035215  0.32994 0.23504 -0.20127    -0.038089   0.26153 -0.2384 0.086324    0.11208 -0.26877    -0.037157   -0.11064    -0.17566    -0.11797    -0.054032   -0.092926   -0.0023528  -0.28574    -0.042844   -0.11242    -0.12217    -0.058572   0.003556    -0.23765    0.14262 -0.25846    -0.48124
DOID:13406  high    0.99984 0.00015597  1.0226e-07  01-disease-subtype  pulmonary sarcoidosis   doid    False   6   15  1   10.135  1   9.7943  0.99345 2   2   0   1   0   0   0   1   0   1   1   1   0   0   0   2   1   -3.0535 1.9354  0.74359 1.2262  0.34597 -1.3342 -0.28965    -1.2941 -0.91933    0.28509 0.56187 -0.69744    0.36737 -0.10955    0.56553 0.57187 -0.042317   -0.27924    -0.047463   -0.54586    -0.099872   -0.52304    0.43795 0.066247    0.039269    -0.017382   -0.15713    0.05092 -0.22763    0.30918 0.044318    0.06438 0.19343 0.091892    0.052718    0.37916 -0.27421    -0.06512    0.15434 0.23744 -0.11737    0.17613 -0.10756    -0.13096    0.020084    -0.084787   -0.11527    -0.29659    -0.32255    -0.0092712  0.19102 0.10606 -0.03741    0.011247    0.021643    0.16582 0.029547    0.030767    0.053846    0.06838 0.121   -0.13137    -0.11696    -0.18361
DOID:1947   medium  7.7861e-05  0.99985 7.066e-05   02-disease-root trichomoniasis  doid    False   3   4   2   9.4418  0.93161 9.6357  0.97737 1   1   1   1   0   0   0   3   0   1   1   1   0   0   0   2   1   -4.3586 0.62574 0.57067 0.24124 -0.042643   0.45498 1.0822  -0.18438    -0.49288    -0.93258    -0.17555    0.58654 -0.60126    0.27049 -0.05115    0.26838 0.74954 -0.23734    0.19494 0.046145    0.30878 -0.16969    -0.32098    0.099715    -0.036773   0.6205  0.41573 -0.01913    -0.074971   -0.88467    -0.21722    0.17013 -0.050962   -0.2596 -0.12037    0.32947 -0.3577 -0.31258    -0.17085    -0.083796   0.072687    -0.088522   -0.098938   -0.35801    -0.39478    0.21852 -0.13262    -0.030014   -0.27808    0.23588 -0.082757   -0.12044    -0.28082    -0.040695   -0.14838    0.15446 -0.21649    -0.11045    -0.16968    0.023799    -0.16846    -0.082827   0.066082    -0.25304
DOID:7551   medium  0.0012944   0.99871 5.4699e-09  02-disease-root gonorrhea   doid    False   3   10  1   10.135  1   9.7635  0.99033 2   2   0   1   0   1   0   4   0   1   1   1   0   0   1   3   2   7.653   4.8434  1.2025  3.0343  1.6186  -1.6182 -0.60181    0.13195 -1.0107 -0.81077    -0.47342    0.74074 0.32741 0.24337 -1.2469 -0.98335    0.23341 0.14592 0.96794 0.46444 -0.50321    -0.82514    0.5329  0.87618 0.35637 0.8832  0.93496 0.5793  -0.12497    -1.0961 -0.57719    0.3647  -0.96333    -0.22651    -0.17969    0.086365    -0.33169    -0.62713    0.3866  0.58373 -0.73356    -0.20722    0.0028579   -0.58841    -0.6139 0.29656 0.30643 -0.19723    0.5725  -0.16997    0.05256 0.65295 -0.4405 0.18417 0.0015717   -0.45146    -0.40878    0.32704 0.38567 0.21539 0.050819    0.33751 0.017643    -0.19848
EFO:0000094 high    0.99922 0.00073459  4.0677e-05  01-disease-subtype  B-cell acute lymphoblastic leukemia efo True    3   28  2   9.4418  0.93161 9.8238  0.99644 2   4   1   1   2   0   0   0   0   0   1   1   0   0   0   1   0   11.128  1.021   -0.8653 1.6891  -0.18358    0.7448  2.5178  0.64609 -0.032413   0.28274 0.080104    -0.84583    -0.024075   -2.1517 1.5172  -1.6415 0.4188  -0.5412 -1.0517 -0.44271    -0.29992    -0.7069 -0.27265    -0.86924    0.64635 0.67251 -0.48477    0.40944 1.0533  0.19102 -0.35529    -0.001094   -0.070926   0.0078777   -0.054386   0.58287 0.33798 -0.072675   -0.71621    1.0699  -0.014512   0.25846 -0.36599    -0.18599    -0.43291    -0.17089    0.39381 0.034798    0.50026 -0.12069    -0.12005    -0.25027    0.085056    0.077729    -0.33189    0.087196    -0.088243   -0.024413   -0.21181    0.041453    -0.16567    -0.18489    -0.24415    -0.24093
EFO:0000095 medium  0.00089337  0.99911 3.4966e-09  02-disease-root chronic lymphocytic leukemia    efo True    4   31  2   9.4418  0.93161 9.8271  0.99678 4   4   1   1   1   1   2   1   1   1   1   1   1   0   1   0   2   4.5854  5.4513  -0.70934    0.0013847   -0.2116 -0.93294    -0.39983    -1.3622 -0.50164    -0.89158    -2.2134 -0.51059    0.30288 0.75736 -0.21298    -0.65689    0.12796 -0.32726    -0.50171    -0.63878    -0.80809    0.035644    -0.63347    -0.34184    0.20185 -0.26492    0.0025463   0.20202 0.37214 -0.17795    -0.47322    -0.49149    -0.10062    0.23284 -1.1204 0.50912 -0.19001    -0.22539    -0.23815    -0.25802    -0.18117    0.09039 -0.13205    0.32876 -0.10839    -0.56848    0.45003 -0.36222    0.56082 -0.12581    0.15213 -0.35511    0.13484 -0.18291    -0.24112    -0.067584   -0.0060349  0.078428    0.084495    0.12397 0.17471 -0.064032   -0.011173   0.11933
EFO:0000096 medium  0.00066393  0.99833 0.0010048   02-disease-root neoplasm of mature B-cells  efo True    3   18  39  6.4714  0.63852 8.878   0.90051 1   4   8   30  1   0   0   1   0   0   1   2   0   0   0   1   1   4.7383  5.6254  0.3204  -0.60477    1.7348  -0.67476    0.44827 0.5334  0.59347 -0.34003    -2.7748 0.0016211   -0.097706   0.8045  0.012011    -0.18326    1.4178  -0.90728    -1.2892 -0.55566    -0.97415    -0.26856    -0.50427    -0.69597    0.065537    -0.63958    -0.55385    -0.26956    0.5332  -0.54162    -0.85174    -0.053674   0.86533 0.56777 -0.27995    0.64088 -0.12622    -0.21593    -0.10032    0.5044  -0.035795   0.1515  -0.17864    0.11424 -0.1668 -0.15824    0.43227 0.041674    0.22118 0.21532 -0.04722    -0.36042    0.30506 -0.17032    -0.39657    -0.42843    -0.024295   -0.22153    -0.024354   0.2483  0.21122 0.30704 0.28381 -0.38331

Also, the file went from 25MB down to 10MB 💯:

(nxontology-ml-py3.10) ~/d/nxontology-ml ❯❯❯ ls -lah data
drwxr-xr-x   9 romain  staff   288B Sep 21 15:12 .
drwxr-xr-x  30 romain  staff   960B Sep 21 15:15 ..
-rw-r--r--   1 romain  staff   1.5M Sep 21 14:38 efo_otar_slim_v3.43.0_rs_classification.tsv
-rw-r--r--   1 romain  staff    10M Sep 21 15:12 efo_otar_slim_v3.57.0_precisions.tsv