jupyter-naas / awesome-notebooks

A powerful data & AI notebook templates catalog: prompts, plugins, models, workflow automation, analytics, code snippets - following the IMO framework to be searchable and reusable in any context.
https://naas.ai/search
BSD 3-Clause "New" or "Revised" License
2.69k stars, 453 forks

SpaCy - Tokenize a text corpus #1400

Open jravenel opened 1 year ago

jravenel commented 1 year ago

Input: plain text

Model: split text into chunks

Output: json
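A minimal sketch of the Input / Model / Output spec above, assuming spaCy v3 is installed (`pip install spacy`). It uses a blank English pipeline with the rule-based `sentencizer`, so no trained model has to be downloaded. The chunking rule (whole sentences, at most `max_tokens` tokens per chunk unless one sentence is longer) and the JSON field names `chunk_id`, `text`, `n_tokens` and `tokens` are illustrative assumptions, not the notebook's actual design.

```python
import json

import spacy


def chunk_text(text, max_tokens=50):
    """Tokenize plain text with spaCy and return the chunks as a JSON string.

    Assumption: a chunk is a run of whole sentences totalling at most
    `max_tokens` tokens; a single longer sentence becomes its own chunk.
    """
    # Blank pipeline = tokenizer only; the sentencizer adds rule-based
    # sentence boundaries (split after ".", "!", "?", etc.).
    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")
    doc = nlp(text)

    chunks = []  # each chunk is a list of sentence Spans
    current, current_len = [], 0
    for sent in doc.sents:
        # Close the current chunk if this sentence would overflow it.
        if current and current_len + len(sent) > max_tokens:
            chunks.append(current)
            current, current_len = [], 0
        current.append(sent)
        current_len += len(sent)
    if current:
        chunks.append(current)

    records = []
    for i, sents in enumerate(chunks):
        tokens = [tok.text for s in sents for tok in s]
        records.append({
            "chunk_id": i,
            "text": "".join(s.text_with_ws for s in sents).strip(),
            "n_tokens": len(tokens),
            "tokens": tokens,
        })
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(chunk_text("Hello world. This is a test.", max_tokens=4))
```

Keeping sentences whole is a design choice that avoids cutting a chunk mid-sentence; a fixed token window would be simpler if sentence boundaries don't matter for the downstream pipeline.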

jravenel commented 1 year ago

This will be useful for a data pipeline I want to create: Google Meet > YouTube.

mukhtarmid commented 1 year ago

@jravenel @Dr0p42 @FlorentLvr, can I pick up this issue? I'm familiar with spaCy. I'll try my best!

jravenel commented 1 year ago

Of course man, go ahead! @mukhtarmid

jravenel commented 1 year ago

I just assigned it to you for this iteration, @mukhtarmid.