superfly / fly

Deploy app servers close to your users. Package your app as a Docker image, and launch it in 17 cities with one simple CLI.
https://fly.io
985 stars, 48 forks

Text summarizer (based on BERT) as a service/API on fly #292

Open geshan opened 4 years ago

geshan commented 4 years ago

I was playing around with an open source summarizer for creating an executive summary of a given text. It is a generalization of a solution based on a paper that uses BERT (by Google) to summarize lectures. Basically, give it a text of around 2k words and ask for a 20% summary, and it will come back with the sentences it thinks are important, which comes to around 400+ words (roughly 20%, depending on sentence lengths).
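To make the ratio idea concrete, here is a minimal sketch of extractive summarization by ratio. It is not the project's code: the function names are hypothetical, and the scoring uses plain word frequency as a stand-in for the BERT-based model the real summarizer relies on. It only illustrates the "keep roughly 20% of the sentences, in their original order" behaviour described above.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize_by_ratio(text, ratio=0.2):
    """Keep roughly `ratio` of the sentences, preserving their original order.

    Assumption: sentences are scored by average word frequency, which is only
    a placeholder for the BERT-based scoring used by the actual project.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""

    # Count how often each lowercase word appears in the whole text.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Always keep at least one sentence, even for very short inputs.
    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Because whole sentences are selected, the word count of the result only approximates the requested ratio, which is why a 2k-word input comes back at "400+" words rather than exactly 400.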

It has multiple use cases, like creating a news summary service (something to summarize all the Corona news, for instance) or summarizing any long text you need to read with an ML algorithm.

As the project ships a Docker container that exposes a REST API with Flask, I can quickly build that and make it work on fly.io, with Fly-specific instructions on how to do it. I think this will be a good addition to the examples.

From my previous experience, this 3.5 GB container needs a lot of resources (given the ML model it uses). It needs about 2 GB of RAM to run, just a heads up. On the bright side, this translates into docs on how to scale up services for a heavy but useful application. Thanks!


PS: I am not a Machine Learning enthusiast. I had to solve a problem for a side project, and some basic googling led me to this project. I even evaluated the Meaning Cloud API, but this repo was better at summarizing and cheaper, with virtually no limit on the number of calls :).

mrkurt commented 4 years ago

I like this a lot. We have a lot of people who've done TensorFlow apps for doing quick predictions to detect things like bots. That stuff is always heavy on the CPU, so it sounds like an interesting thing to show people.

We can give you some credits for experimenting + running this on Fly while you build the example out, if it's helpful.

geshan commented 4 years ago

Hey @mrkurt, let me know what you think of this: https://github.com/geshan/bert-extractive-summarizer . You can play around with it at this URL: https://summarizer.fly.dev/ , and here is a quick curl to try out: https://gist.github.com/geshan/0aba03355dc987892b3aa16f87f6eb0b . Let me know your thoughts, thanks!

mrkurt commented 4 years ago

@geshan Sorry for the slow response here, @codepope or I will work through this one this week.

geshan commented 4 years ago

@mrkurt it's totally fine. I am working on the js-renderer-fly, where @codepope's comments have been very helpful, thanks!

rizqventures commented 4 years ago

Hey All,

I would love to try out the generalizer summarizing tool mrkurt made...


codepope commented 4 years ago

Hi @geshan,

Looking over the readme, I think you might get better flow by opening with ...

I am not a Machine Learning (ML) enthusiast yet, but I was digging into the subject when I discovered the .... open source summariser. I wanted to do a quick deploy with it onto a service and start experimenting with text summarisation. For the service I have chosen Fly.io, which deploys apps closer to the user so that they respond much faster.

Let's look at the summariser we'll be deploying....

etc....

That should smooth the flow at the start.

geshan commented 4 years ago

Hello @codepope ,

Thanks for pointing me to a good opener. Let's discuss the sections first this time, as it will be easier to go section by section. This is what I have in mind:


Bert Extractive Summarizer on Fly.io

--> the opener goes here.

Running the summarizer API on Fly.io

--> One line here

Prerequisites

--> 2 steps similar to puppeteer-js-renderer

Steps

--> Similar to puppeteer-js-renderer. I will remove the images and recheck it. The main difference here will be the scaling to a cpu2mem2 VM and the curl example. If you have a suggestion for the curl example, I am open to it, like a sample text from a wiki page or something of that sort.

Endless possibilities

--> I am not sure about this section either, so let me know your views.


If I need to add any new sections, please let me know about that too. I would like to be more structured this time :) as we have already worked on the puppeteer-js-renderer. Hope to hear from you soon, thanks!

geshan commented 4 years ago

Maybe a summary of the first 3 paragraphs of https://en.wikipedia.org/wiki/Wiki , one of the most viewed pages on Wikipedia.

codepope commented 4 years ago

Looks reasonable as an outline.

I'd switch "Endless Possibilities" for "Possible Applications" and loosely suggest some ideas for things that would benefit from summarizing (news feeds, instructions, blog articles...) ...

Possible bonus section: leverage puppeteer-js to extract some text from a URL and feed it to the summarizer, and refer people to the guide for that too.

I'd avoid summarizing Wikipedia articles, as encyclopedias tend to be short statements of fact which are either overly easy to summarize or terribly complex. For the example paragraphs, how about summarizing a recent blog post at Fly, like the Sandbox and Isolation one (or at least see how it comes out)?

mrkurt commented 4 years ago

@geshan We are migrating content discussions to our new community site. Will you please copy your topic over to the new forum: https://community.fly.io/c/write/writers-room/8

Just copying and pasting the original issue is fine, with a link back here.