keras-team / autokeras

AutoML library for deep learning
http://autokeras.com/
Apache License 2.0
9.13k stars, 1.4k forks

TextClassifier #35

Closed: secsilm closed this issue 5 years ago

secsilm commented 6 years ago

I see ImageClassifier in the docs. Is there something like a TextClassifier for processing text?

davidlenz commented 6 years ago

+1, I would also be interested in an automatic text classifier.

kt1004 commented 6 years ago

I am also interested in a text classifier.

HalaKuwatly commented 6 years ago

same here!

haifeng-jin commented 6 years ago

Thank you for the suggestion. We are working on this feature now, and it should be available soon.

Thanks.

haifeng-jin commented 6 years ago

Suggested Name

TextClassifier

Task Description

Text classification. The input is a set of strings, where each string is an English article, paragraph, or sentence. The output is a single class label for each string.

Evaluation Metrics

Accuracy
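The task contract and metric above can be illustrated with a minimal Python sketch: a list of strings goes in, one integer label per string comes out, and predictions are scored by plain accuracy. The example sentences, the 0/1 label encoding, and the `accuracy` helper are illustrative assumptions, not part of autokeras.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of strings whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Input: one English string per example (article, paragraph, or sentence).
texts = [
    "The movie was wonderful.",
    "A dull, overlong film.",
    "Great acting and a clever plot.",
    "I walked out halfway through.",
]
# Output: exactly one class label per string (here 1 = positive, 0 = negative).
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]  # hypothetical predictions from some classifier

print(accuracy(y_true, y_pred))  # 0.75
```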

Benchmark Datasets

Penn Treebank

Reason

NA

Solution

NA

Additional Context

NA

haifeng-jin commented 6 years ago

@boyuangong Please write a baseline for the TextClassifier. It needs to extend the base class in supervised.py and reuse the ModelTrainer class in utils.py. For now, it does not need to search over multiple architectures; just use a default architecture.

We can have a meeting to discuss if you have questions. Thanks.
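Here is a minimal sketch of the requested structure. It assumes a base class with `fit`/`predict` like the one in supervised.py, and the baseline subclass trains one fixed default model without any architecture search. The class names `Supervised` and `BaselineTextClassifier` are hypothetical stand-ins, the reuse of `ModelTrainer` is not shown, and the default model is a multinomial naive Bayes classifier. It is swapped in for a neural architecture only so the sketch runs with the standard library alone.

```python
import math
from abc import ABC, abstractmethod
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class Supervised(ABC):
    """Hypothetical stand-in for the base class in supervised.py."""

    @abstractmethod
    def fit(self, x: Sequence[str], y: Sequence[int]) -> None:
        """Train on strings x with one integer label per string in y."""

    @abstractmethod
    def predict(self, x: Sequence[str]) -> List[int]:
        """Return one predicted label per string."""

    def evaluate(self, x: Sequence[str], y: Sequence[int]) -> float:
        """Accuracy, the metric named in the feature request."""
        preds = self.predict(x)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


class BaselineTextClassifier(Supervised):
    """Trains a single default model; performs no architecture search."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        return text.lower().split()

    def fit(self, x: Sequence[str], y: Sequence[int]) -> None:
        self.classes = sorted(set(y))
        self.word_counts: Dict[int, Counter] = defaultdict(Counter)
        for text, label in zip(x, y):
            self.word_counts[label].update(self._tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        label_totals = Counter(y)
        self.log_prior = {c: math.log(label_totals[c] / len(y)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}

    def predict(self, x: Sequence[str]) -> List[int]:
        v = len(self.vocab)
        preds = []
        for text in x:
            scores = {}
            for c in self.classes:
                denom = self.total_words[c] + self.alpha * v
                score = self.log_prior[c]
                for w in self._tokenize(text):
                    if w in self.vocab:  # words never seen in training are skipped
                        score += math.log((self.word_counts[c][w] + self.alpha) / denom)
                scores[c] = score
            preds.append(max(scores, key=scores.get))
        return preds


# Usage: the baseline trains once with its default settings.
clf = BaselineTextClassifier()
clf.fit(["great fun movie", "wonderful acting", "boring plot", "awful and boring"], [1, 1, 0, 0])
print(clf.predict(["great acting", "boring movie"]))  # [1, 0]
```

Keeping `evaluate` on the base class means any later subclass that does search architectures is scored the same way as this baseline.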

ddofer commented 6 years ago

Re: TextClassifier benchmarks: I would suggest starting with a document/sentence-level classification model (not word-level/many-to-many), e.g. IMDB sentiment. It is a common benchmark for sentiment classification of short documents and was used to benchmark ULMFiT, among others. https://github.com/keras-team/keras/blob/master/keras/datasets/imdb.py http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html
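Array shapes make the document-level vs. many-to-many distinction concrete. In this NumPy sketch, which uses random weights and arbitrary sizes, a many-to-many model emits one prediction per token. A document-level model first pools the token vectors over time and then emits a single prediction for the whole text. Mean pooling is just one possible pooling choice and is an assumption of this sketch, not a description of how ULMFiT or autokeras pool.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, embed_dim, num_classes = 12, 8, 2

# Embedded tokens of ONE document: one vector per token.
tokens = rng.normal(size=(seq_len, embed_dim))
w = rng.normal(size=(embed_dim, num_classes))  # random classifier weights

# Many-to-many (word level): a score vector for every token.
token_logits = tokens @ w               # shape (seq_len, num_classes)

# Document level: pool over the time axis first, then classify once.
doc_vector = tokens.mean(axis=0)        # shape (embed_dim,)
doc_logits = doc_vector @ w             # shape (num_classes,)
doc_label = int(np.argmax(doc_logits))  # a single label for the whole text

print(token_logits.shape, doc_logits.shape, doc_label)
```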

haifeng-jin commented 6 years ago

@ddofer Thank you for your suggestions.

ziyadi commented 6 years ago

When do you expect to release the initial TextClassifier model?

boyuangong commented 6 years ago

Hi,

The model is expected to be released around next week. I am currently on vacation back in my home country. Sorry for any inconvenience.

Best, Boyuan Gong

On Aug 29, 2018 at 7:26 AM, ziyadi wrote:

When do you expect to release the initial TextClassifier model?


mukhal commented 6 years ago

@boyuangong I can help you out with that module if that's okay.

boyuangong commented 6 years ago

Hi there,

Thanks a lot. I will send a pull request with the newest version tonight when I get home. The model is nearly complete; the only remaining problem is that it runs out of memory. It would be great to get your help.

Best, Boyuan Gong

On Sep 15, 2018 at 1:56 PM, Muhammad wrote:

@boyuangong I can help you out with that module if that's okay.


DGaffney commented 6 years ago

@boyuangong is this live yet? Is there a branch where the code lives currently? Very excited to see your work!

boyuangong commented 6 years ago


Thanks for your interest in our package! The work has moved to #199, "[WIP] textClassifier". You can switch to the textClassifier branch for the latest code. It works now, but you may need to choose the input size carefully to avoid running out of memory (you can customize the input length in constant.py). I am currently waiting for code review and working on test coverage before merging it into the master branch.

Best, Boyuan
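A short Python sketch shows why the input length matters for memory. Every text is padded or truncated to a fixed length, and the activation tensor produced by an embedding layer grows linearly with that length. `MAX_SEQUENCE_LENGTH`, its value of 400, the batch size of 128, and the 300-dimensional embeddings are all hypothetical values, not the actual settings in constant.py.

```python
from typing import List

# Hypothetical name and value; the real constant in constant.py may differ.
MAX_SEQUENCE_LENGTH = 400
PAD_ID = 0  # hypothetical id reserved for padding


def pad_or_truncate(token_ids: List[int], max_len: int = MAX_SEQUENCE_LENGTH) -> List[int]:
    """Force every sequence to exactly max_len ids so batches have a fixed shape."""
    if len(token_ids) >= max_len:
        return token_ids[:max_len]  # long texts (e.g. long emails) lose their tail
    return token_ids + [PAD_ID] * (max_len - len(token_ids))


def embedding_activation_bytes(batch_size: int, max_len: int,
                               embed_dim: int, bytes_per_value: int = 4) -> int:
    """Size of one float32 (batch_size, max_len, embed_dim) activation tensor."""
    return batch_size * max_len * embed_dim * bytes_per_value


# Memory for this tensor grows linearly with max_len.
for length in (100, 400, 1600):
    mib = embedding_activation_bytes(batch_size=128, max_len=length, embed_dim=300) / 2**20
    print(f"max_len={length:5d}: {mib:8.1f} MiB")
```

A full model holds many such tensors, plus gradients during training, so the real footprint is a multiple of this figure. It still scales with the chosen length, which is why lowering the input length is a direct lever against out-of-memory errors.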

DGaffney commented 5 years ago

Woooooo!

renaudham commented 5 years ago

Hi. Great, what is the status?

Some questions:
- What is the maximum length of each text we want to classify? For example, can we process an average email?
- How many classes can we feed into the engine while keeping accuracy reasonable, without too many false positives? 20? 50? I know many older classifiers lose a lot of accuracy when there are too many classes, but recent ones, like the open-source one from Facebook, can handle many more. However, I don't want to use anything from Facebook.
- What format should we use to feed data for training and then testing? I haven't seen any docs or tutorials yet; are there any?

Thanks, great job.

boyuangong commented 5 years ago

@renaudham Hi, thanks for your interest in our work. We are preparing tutorial documents for several modules, including the TextClassifier. You can check them directly in the new pull request #341. The text tutorial is in text.md.

Also, if you want to try the TextClassifier, check the example in examples/text_cnn.
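The sketch below prepares data in the shape that the task description earlier in the thread calls for: one string per example and one class label per string. It maps raw emails and string folder names to integer labels, then holds out a test set. The helper names, the example emails, and the assumption that the classifier accepts a list of strings with integer labels are all hypothetical. text.md and examples/text_cnn remain the reference for the format the library actually expects.

```python
from typing import Dict, List, Sequence, Tuple


def encode_labels(label_names: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map arbitrary class names to integer ids 0..K-1 (K = number of classes)."""
    classes = sorted(set(label_names))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in label_names], index


def split_train_test(x: Sequence[str], y: Sequence[int], test_fraction: float = 0.25):
    """Deterministic split: the last test_fraction of the examples becomes the test set."""
    n_test = max(1, int(len(x) * test_fraction))
    return list(x[:-n_test]), list(y[:-n_test]), list(x[-n_test:]), list(y[-n_test:])


emails = [
    "Your invoice for March is attached.",
    "Team lunch moved to Friday.",
    "Reminder: payment due tomorrow.",
    "Can we reschedule the meeting?",
]
folders = ["billing", "social", "billing", "scheduling"]  # one class name per email

y, class_index = encode_labels(folders)
x_train, y_train, x_test, y_test = split_train_test(emails, y)
print(class_index)  # {'billing': 0, 'scheduling': 1, 'social': 2}
```

The encoding itself works for any number of classes. How accuracy holds up as the class count grows is a separate, empirical question that this sketch does not answer.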

Please feel free to let me know if you have further questions.