monarch-initiative / SLDBGen


Synonymous entries in column 'cell.line' #200

Closed: hansenp closed this issue 2 years ago

hansenp commented 2 years ago

There seem to be some synonymous entries in the column 'cell.line'. For example, there are 273 rows with HCT116 and 42 rows with HCT 116. There are similar cases for the HeLa and A549 cell lines.

Possibly the same standard identifiers as used here could be adopted.

cell.line                              rows
human mammary epithelial cells          383
A-498                                   361
HCT116                                  273   <-- HCT116 group
HeLa-Cells                              188   <-- HeLa group
HAP1                                    134
NCI-H82                                 104
HFF-Myc                                 100
786-0                                    89
DLD-1                                    86
A-431                                    59
293T                                     59
A549                                     57   <-- A549 group
CAL-51                                   51
U2OS                                     42
HCT 116                                  42   <-- HCT116 group
RCC4                                     40
HeLa                                     38   <-- HeLa group
K562 chronic myeloid leukemia cells      30
HCC1143                                  29
MKN45                                    24
102 cancer cell lines                    24
MDA-MB-231                               14
Hela                                     14   <-- HeLa group
HCC193                                    8
A375                                      8
HEC-59                                    8
H4                                        6
SW620                                     3
A-549                                     3   <-- A549 group
PEO1                                      3
MDA-MB-361                                2
primary human foreskin keratinocytes      2

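A minimal sketch of how synonyms like these could be collapsed before counting, assuming a simple heuristic: case-fold the label, drop everything except letters and digits, and drop a trailing "cells". The helpers `normalize_key` and `merge_counts` and the display names in `CANONICAL_NAMES` are hypothetical and not part of SLDBGen; they do not stand in for the standard identifiers mentioned in the issue.

```python
import re
from collections import Counter

# Hypothetical preferred spellings, keyed by normalized label.
# These are illustrative assumptions, not an official identifier scheme.
CANONICAL_NAMES = {
    "hct116": "HCT 116",
    "hela": "HeLa",
    "a549": "A-549",
}


def normalize_key(name: str) -> str:
    """Heuristic key: lowercase, keep only [0-9a-z], strip a trailing 'cells'."""
    key = re.sub(r"[^0-9a-z]", "", name.casefold())
    if key.endswith("cells") and len(key) > len("cells"):
        key = key[: -len("cells")]
    return key


def merge_counts(counts: dict[str, int]) -> Counter:
    """Sum row counts for labels that share a normalized key."""
    merged: Counter = Counter()
    display: dict[str, str] = {}  # normalized key -> label shown in output
    for label, n in counts.items():
        key = normalize_key(label)
        # Prefer a canonical spelling; otherwise keep the first label seen.
        name = display.setdefault(key, CANONICAL_NAMES.get(key, label))
        merged[name] += n
    return merged


# A subset of the counts from the table above.
counts = {"HCT116": 273, "HeLa-Cells": 188, "A549": 57, "HCT 116": 42,
          "HeLa": 38, "Hela": 14, "A-549": 3, "U2OS": 42}
merged = merge_counts(counts)
print(merged["HCT 116"], merged["HeLa"], merged["A-549"], merged["U2OS"])
# 315 240 60 42
```

With these counts, HCT116/HCT 116 merge into 315 rows, HeLa-Cells/HeLa/Hela into 240, and A549/A-549 into 60. Because the key ignores punctuation and case, any merge it proposes should still be checked against a curated cell-line reference before it is applied.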
pnrobinson commented 2 years ago

Thanks for picking this up. I have revised the parsing and think I have fixed all of this.