reimandlab / ActiveDriverDB

ActiveDriverDB
GNU Lesser General Public License v2.1

TCGA mutations duplicated due to semicolons #43

Closed reimand0 closed 7 years ago

reimand0 commented 7 years ago

Some TCGA mutations are currently duplicated. This is because Annovar aggregates the effects of mutations that are duplicated in the input and pastes them together with semicolons.

To solve this issue, we need to split the impacts on semicolons and keep only the first string.
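
A minimal sketch of that fix, assuming the aggregated impact arrives as a single string. The function name `first_impact` and the sample values are illustrative only and are not taken from the ActiveDriverDB code base:

```python
def first_impact(raw_impact: str) -> str:
    """Return the first semicolon-separated value of an aggregated impact string."""
    # str.split always returns at least one element, so [0] is safe
    # even when the string contains no semicolon at all.
    return raw_impact.split(';')[0].strip()


# Hypothetical sample values: one aggregated entry and one plain entry.
examples = ['nonsynonymous SNV;nonsynonymous SNV', 'stopgain']
cleaned = [first_impact(value) for value in examples]
```

Entries without a semicolon pass through unchanged, so the same function could be applied to every row of the import.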

Unique patient-mutation pairs will remain the same when considering the comments field with TCGA barcode, as far as I understand.

reimand0 commented 7 years ago

There is a strange value in the mutations table after this bug emerged:

<built-in method count of InstrumentedList object at 0x7fd9fc27f8b8>
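
A value of this shape is what Python produces when a list's `count` method is converted to a string without being called. Whether that is what happened here is an assumption, since the thread does not say. The sketch below reproduces the effect with a plain `list` subclass standing in for SQLAlchemy's `InstrumentedList`; the class name `InstrumentedListStandIn` is hypothetical:

```python
class InstrumentedListStandIn(list):
    """Hypothetical stand-in for SQLAlchemy's InstrumentedList (a list subclass)."""


samples = InstrumentedListStandIn(['a', 'b'])

# Referencing the bound method without calling it and then stringifying it
# yields the method's repr, not a number.
wrong = str(samples.count)

# The number of elements comes from len(); list.count(x) instead counts
# occurrences of one specific value x.
right = len(samples)
```

If the stored value was meant to be a number of related records, `len(...)` on the collection would be the likely intended call.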