ikmckenz / target-pred-py

A simple machine learning model for small-molecule target prediction in Python.
GNU General Public License v3.0
18 stars, 8 forks

Group targets in data cleaning #8

ikmckenz closed 5 years ago

ikmckenz commented 5 years ago

The raw data pulled from ChEMBL contains many targets whose names are nearly identical. For example, each of the following is currently treated as a separate target:

Voltage-gated potassium channel subunit Kv7.2
Voltage-gated potassium channel subunit Kv7.1
Voltage-gated potassium channel subunit Kv7.5
Voltage-gated potassium channel subunit Kv7.4
Voltage-gated potassium channel subunit Kv7.3
GABA-A receptor; alpha-1/beta-3/gamma-2
GABA-A receptor; alpha-1/beta-2/gamma-2
GABA-A receptor; alpha-2/beta-3/gamma-2
GABA-A receptor; alpha-3/beta-3/gamma-2
GABA-A receptor; alpha-5/beta-3/gamma-2
GABA-A receptor; alpha-6/beta-3/gamma-2

Many of these near-duplicate target names should be merged into a single group during the data cleaning stage.
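A minimal sketch of one way such grouping could work, assuming the target names arrive as plain strings. The function names `group_name` and `group_targets` and the two regex rules are hypothetical and are not taken from this repository. They cover only the two naming patterns listed above and would likely over-merge on the full ChEMBL target list, where a hand-curated mapping may be safer.

```python
import re
from collections import defaultdict

# Hypothetical normalisation rules, written only for the two patterns in this issue.
GROUPING_RULES = [
    # Drop the subunit composition after a semicolon:
    # "GABA-A receptor; alpha-1/beta-3/gamma-2" -> "GABA-A receptor"
    (re.compile(r";.*$"), ""),
    # Drop a trailing subtype number:
    # "Voltage-gated potassium channel subunit Kv7.2" -> "... subunit Kv7"
    (re.compile(r"\.\d+$"), ""),
]


def group_name(target_name):
    """Return the group label for a single target name."""
    name = target_name.strip()
    for pattern, replacement in GROUPING_RULES:
        name = pattern.sub(replacement, name)
    return name.strip()


def group_targets(target_names):
    """Map each group label to the list of original target names it absorbs."""
    groups = defaultdict(list)
    for name in target_names:
        groups[group_name(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = [
        "Voltage-gated potassium channel subunit Kv7.2",
        "Voltage-gated potassium channel subunit Kv7.1",
        "GABA-A receptor; alpha-1/beta-3/gamma-2",
        "GABA-A receptor; alpha-1/beta-2/gamma-2",
    ]
    for label, members in group_targets(names).items():
        print(label, len(members))
```

Keeping the rules in a list makes it easy to review each merge separately and to add or remove a pattern once the resulting groups have been checked by hand.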