mratsim / Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
https://mratsim.github.io/Arraymancer/
Apache License 2.0

Add NLP dataset + NLP example #315

Open mratsim opened 5 years ago

mratsim commented 5 years ago

We have an embedding layer (#312), we have GRU with sequence support (#283).

We are missing a dataset and an NLP example. The IMDB dataset is probably the one to have first: http://ai.stanford.edu/~amaas/data/sentiment/

Alternatively, we can use a character-level RNN instead of a word-level RNN, which avoids the tokenizer issue (#316).

metasyn commented 5 years ago

Hi Mamy~!

What kind of example are you looking for? I'm pretty interested in helping with this. Could you provide any more details on what you envision out of this?

Relatedly, I made a naive hashing vectorizer implementation for a Nim demo at work, using Arraymancer of course. It might be useful here: https://github.com/metasyn/nim-vectorizer-splunk/tree/master/src
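For readers unfamiliar with the term, a hashing vectorizer maps each token to a slot in a fixed-size vector through a hash function, so no vocabulary has to be stored. The following is a minimal Python sketch of that general idea, not the code from the linked repository; the function name `hashing_vectorize`, the `n_features` parameter, and the choice of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib


def hashing_vectorize(text, n_features=16):
    """Count tokens into a fixed-size vector, indexing each token by its hash."""
    vec = [0] * n_features
    # Whitespace tokenization, lowercased so "Great" and "great" collide on purpose.
    for token in text.lower().split():
        # CRC32 is deterministic across runs (unlike Python's built-in hash()).
        index = zlib.crc32(token.encode("utf-8")) % n_features
        vec[index] += 1
    return vec


features = hashing_vectorize("a great great movie")
print(features)
```

Different tokens can land in the same slot; a larger `n_features` makes such collisions rarer at the cost of a wider input vector.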

mratsim commented 5 years ago

It can be sentiment analysis on IMDB (positive/negative), like https://www.kaggle.com/c/word2vec-nlp-tutorial.

Or, for example, identifying the author of a short snippet: https://www.kaggle.com/c/spooky-author-identification.

I.e. something short, where ideally the tokenizer can just be splitWhitespace.
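To make the "tokenizer is just a whitespace split" idea concrete, here is a small sketch of how review text could be turned into the integer indices an embedding layer expects. It is written in Python for brevity, with `str.split()` standing in for Nim's `splitWhitespace`; the helper names `build_vocab` and `encode` and the `<unk>` placeholder token are hypothetical and not part of Arraymancer.

```python
def build_vocab(texts):
    """Assign an integer id to every whitespace-separated token seen in texts."""
    vocab = {"<unk>": 0}  # id 0 is reserved for tokens not seen during building
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(text, vocab):
    """Map a text to a list of token ids, using 0 for unknown tokens."""
    return [vocab.get(token, 0) for token in text.lower().split()]


vocab = build_vocab(["a great movie", "a bad movie"])
print(encode("a terrible movie", vocab))  # prints [1, 0, 3]
```

A split like this leaves punctuation attached to words ("movie." and "movie" become different tokens), which is the kind of limitation a proper tokenizer would address.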

On the tasks to implement this:

metasyn commented 5 years ago

Dataset + Downloader = https://github.com/mratsim/Arraymancer/pull/317