quan-possible / med-text

Classifying medical text.
Apache License 2.0

hallmark of cancer #3

Closed shaoxiongji closed 3 years ago

shaoxiongji commented 3 years ago

Use this processed data for HoC (Hallmarks of Cancer) classification: https://github.com/cambridgeltl/cancer-hallmark-cnn

It converts multi-label classification into 10 binary classification sets.
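To make the conversion concrete, here is a minimal sketch of splitting a multi-label dataset into one binary dataset per label. The function name `to_binary_sets`, the integer label indices 0 to 9, and the toy sentences are all assumptions for illustration; they are not taken from the linked repository, whose actual file format may differ.

```python
from typing import List, Sequence, Set, Tuple


def to_binary_sets(
    examples: Sequence[Tuple[str, Set[int]]],
    num_labels: int = 10,
) -> List[List[Tuple[str, int]]]:
    """Split a multi-label dataset into one binary dataset per label.

    Each input example is (text, set_of_label_indices). The output has
    num_labels datasets; dataset k holds (text, 1) if label k applies
    to the text and (text, 0) otherwise.
    """
    binary_sets = []
    for label in range(num_labels):
        binary_sets.append(
            [(text, int(label in labels)) for text, labels in examples]
        )
    return binary_sets


# Toy data (hypothetical): label indices stand in for the 10 hallmarks.
examples = [
    ("sentence a", {0, 3}),
    ("sentence b", set()),
    ("sentence c", {3}),
]
sets = to_binary_sets(examples)
```

Every text appears in all 10 sets, so each binary set has the same size as the original data and differs only in its 0/1 targets.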

quan-possible commented 3 years ago

Is there anything else about it besides being 10 binary classification sets?

shaoxiongji commented 3 years ago

There are four subfolders in the data folder. doc-10-class should be the one to use. I don't fully understand the others, but the oversampled one should have been balanced using over-sampling.
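For reference, a rough sketch of what balancing by over-sampling can look like for one binary set. This assumes plain random oversampling with replacement; the thread does not say which over-sampling method the linked repository actually used, and `random_oversample` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]  # (text, 0/1 label)


def random_oversample(examples: Sequence[Example], seed: int = 0) -> List[Example]:
    """Duplicate random minority-class examples until both classes match in size."""
    rng = random.Random(seed)
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    if not positives or not negatives:
        # Only one class present: nothing to balance against.
        return list(examples)
    minority, majority = sorted((positives, negatives), key=len)
    # Draw with replacement from the minority class to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 1 positive versus 4 negatives.
data = [("pos", 1)] + [(f"neg{i}", 0) for i in range(4)]
balanced = random_oversample(data)
```

Oversampling should only be applied to the training split; duplicating examples in the evaluation data would distort the reported scores.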

quan-possible commented 3 years ago

But we should just treat everything as a single multilabel classification problem, right? No need to do anything else.

shaoxiongji commented 3 years ago

Defining the problem as either multilabel or as many binary classification tasks is okay. The multilabel setting applies a sigmoid activation to each output logit to get an independent per-label probability, so it is effectively a set of binary classifications that share one model.
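The link between the two formulations can be sketched in a few lines: with a sigmoid per output, the multilabel loss is just the average of one binary cross-entropy term per label. The helper names and the 0.5 threshold below are assumptions for illustration, not code from this repository, and the loss is written for clarity rather than numerical stability.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(logit: float, target: int) -> float:
    """Loss for a single label, treated as its own binary task."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def multilabel_loss(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean of the per-label binary losses over all labels."""
    per_label = [binary_cross_entropy(z, t) for z, t in zip(logits, targets)]
    return sum(per_label) / len(per_label)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Each label is decided independently of the others."""
    return [int(sigmoid(z) >= threshold) for z in logits]
```

The practical difference is that a multilabel model shares its encoder across all labels, while 10 separate binary classifiers each learn their own parameters.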

quan-possible commented 3 years ago

I just implemented code for processing the new data and .csv files that I got from it. Please check it out.

shaoxiongji commented 3 years ago

It looks good.

quan-possible commented 3 years ago

shaoxiongji commented 3 years ago

On the last bullet point, you're right, but that is not what my question was about. Anyway, it works. Please proceed to the modeling part.