ohmeow / ohmeow_website

Apache License 2.0
25 stars 21 forks

Summarization with blurr | ohmeow #18

Closed utterances-bot closed 3 years ago

utterances-bot commented 3 years ago

Summarization with blurr | ohmeow

blurr is a library I started that integrates huggingface transformers with the world of fastai v2, giving fastai devs everything they need to train, evaluate, and deploy transformer-specific models. In this article, I provide a simple example of how to use blurr's new summarization capabilities to train, evaluate, and deploy a BART summarization model.

https://ohmeow.com/posts/2020/05/23/text-generation-with-blurr.html
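For context on what blurr wraps, here is a minimal sketch of the same task in plain Hugging Face transformers (this is not blurr's API, and facebook/bart-large-cnn is a standard summarization checkpoint assumed for illustration, not necessarily the one used in the post):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Load a pretrained BART summarization checkpoint (assumed model name).
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "blurr integrates huggingface transformers with fastai v2, giving fastai devs ..."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)

# Beam search decoding; generation settings here are illustrative defaults.
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```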

Ghani-25 commented 3 years ago

Hello,

I get an error on this line:

```python
learn.fit_one_cycle(1, lr_max=3e-5, cbs=fit_cbs)
```

The error I got is:

```
/usr/local/lib/python3.7/dist-packages/blurr/modeling/seq2seq/core.py in after_validate(self)
    138             for score_key, score in res.items():
    139                 if (f'{metric_name}{score_key}' not in self.custom_metric_vals): continue
--> 140                 self.custom_metric_vals[f'{metric_name}{score_key}'] = score.mean().item()
    141         elif (is_listy(return_val)):

AttributeError: 'list' object has no attribute 'mean'
```
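The failure mode is visible in line 140 above: the metric hands back a plain Python list, which has no .mean(). A minimal repro, plus a defensive workaround (the np.mean fallback is an illustration, not the actual fix that landed in blurr):

```python
import numpy as np

score = [0.41, 0.38, 0.45]   # a plain list, as some datasets metrics return
# score.mean()               # AttributeError: 'list' object has no attribute 'mean'

# Defensive handling: accept both tensors (which have .mean()) and lists.
value = score.mean().item() if hasattr(score, "mean") else float(np.mean(score))
print(value)
```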

Can you help me please?

ohmeow commented 3 years ago

This will be fixed in the next version of blurr (just waiting for fastai 2.3.1 to be made available). Stay tuned :)

ohmeow commented 3 years ago

Fixed. Check it out.

iriswang1 commented 2 years ago

Hi, I keep getting this error from "learn.fit_one_cycle(1, lr_max=3e-5, cbs=fit_cbs)"; can you take a look at this? Thanks!

TypeError: get_hash() missing 1 required positional argument: 'use_fast_tokenizer'

The whole error message is below:

```
TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 learn.fit_one_cycle(3, lr_max=3e-4, cbs=fit_cbs)

16 frames
/usr/local/lib/python3.7/dist-packages/fastai/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
--> 113     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
--> 221     self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
--> 163     try: self(f'before_{event_type}'); f()

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _do_fit(self)
--> 212         self._with_events(self._do_epoch, 'epoch', CancelEpochException)

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
--> 163     try: self(f'before_{event_type}'); f()

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _do_epoch(self)
--> 207         self._do_epoch_validate()

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _do_epoch_validate(self, ds_idx, dl)
--> 203         with torch.no_grad(): self._with_events(self.all_batches, 'validate', CancelValidException)

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _with_events(self, f, event_type, ex, final)
--> 165     self(f'after_{event_type}'); final()

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in __call__(self, event_name)
--> 141     def __call__(self, event_name): L(event_name).map(self._call_one)

/usr/local/lib/python3.7/dist-packages/fastcore/foundation.py in map(self, f, gen, *args, **kwargs)
--> 154     def map(self, f, *args, gen=False, **kwargs): return self._new(map_ex(self, f, *args, gen=gen, **kwargs))

/usr/local/lib/python3.7/dist-packages/fastcore/basics.py in map_ex(iterable, f, gen, *args, **kwargs)
--> 666     return list(res)

/usr/local/lib/python3.7/dist-packages/fastcore/basics.py in __call__(self, *args, **kwargs)
--> 651     return self.func(*fargs, **kwargs)

/usr/local/lib/python3.7/dist-packages/fastai/learner.py in _call_one(self, event_name)
--> 145     for cb in self.cbs.sorted('order'): cb(event_name)

/usr/local/lib/python3.7/dist-packages/fastai/callback/core.py in __call__(self, event_name)
---> 45     if self.run and _run: res = getattr(self, event_name, noop)()

/usr/local/lib/python3.7/dist-packages/blurr/modeling/seq2seq/core.py in after_validate(self)
--> 169     res = compute_func(predictions=predictions, references=references)

/usr/local/lib/python3.7/dist-packages/datasets/metric.py in compute(self, predictions, references, **kwargs)
--> 402     output = self._compute(predictions=predictions, references=references, **kwargs)

/root/.cache/huggingface/modules/datasets_modules/metrics/bertscore/acd7f806e3c6996af65006355eeb46c0a8a6ac0009344c2f3224f66d483cf70a/bertscore.py in _compute(self, predictions, references, lang, model_type, num_layers, verbose, idf, device, batch_size, nthreads, all_layers, rescale_with_baseline, baseline_path)
--> 127     use_custom_baseline=baseline_path is not None,

TypeError: get_hash() missing 1 required positional argument: 'use_fast_tokenizer'
```
ohmeow commented 2 years ago

I just ran it on Colab without any problems. You may want to update the pip installs to ensure you're using the latest Hugging Face versions, etc., e.g. via pip install -Uqq transformers.
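The get_hash() error in the traceback above looks like the datasets bertscore wrapper calling into a mismatched bert-score release; that diagnosis is an inference from the traceback, not confirmed in the thread, but upgrading the stack is the generic fix:

```python
# Colab/Jupyter cells; drop the leading "!" in a terminal.
!pip install -Uqq transformers datasets
!pip install -Uqq bert-score
!pip install -Uqq ohmeow-blurr
```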

waisyousofi commented 2 years ago

Thanks, it is helping me a lot with labeled data. But can you tell me how to fine-tune BART with unlabeled data?

kanianna commented 2 years ago

```
ImportError: cannot import name 'PreCalculatedLoss' from 'blurr.utils' (conda/envs/BTSUM/lib/python3.7/site-packages/blurr/utils.py)
```

Can you help me with this? If I try to fix the above error through a local installation of ohmeow-blurr, I get another error:

```
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_35280/1278113591.py in <module>
----> 1 learn.fit_one_cycle(3, lr_max=3e-5, cbs=fit_cbs)
```

And when I try to reinstall with the command pip install transformers-Uqq, I get the error below:

```
ERROR: Could not find a version that satisfies the requirement transformers-Uqq (from versions: none)
ERROR: No matching distribution found for transformers-Uqq
```

I'm not running my code on Colab; I'm running it on a GPU machine.
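The pip failure above is a parsing issue: the flags were fused onto the package name, so pip looked for a package literally called transformers-Uqq. -U means upgrade and -qq means very quiet; separated out, the presumably intended command is:

```python
# Notebook cell form; in a plain shell, omit the "!".
!pip install -Uqq transformers
```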

ohmeow commented 2 years ago

Looks like you're using an old version of the library.

See the available loss functions here
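A quick way to confirm whether an old release is the culprit; pip show works on any Python version, and ohmeow-blurr is the distribution name on PyPI:

```python
# Notebook cells; run in a terminal without the "!".
!pip show ohmeow-blurr          # prints the installed version
!pip install -Uqq ohmeow-blurr  # upgrade to the latest release
```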

datha29 commented 1 year ago

This is Datha here. First of all, I would like to thank you for the awesome library you've developed; it's really helpful and wonderful. I just wanted a small bit of help: I am trying to fine-tune my model on my own data, using the code below as a base, but when I try to run it with the GPU (CUDA), the kernel dies and gets restarted. I would love to know the solution for this. In Colab the code runs faster, but it is slow in Vertex Workbench.
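A kernel that dies the moment training hits the GPU is very often an out-of-memory kill. A generic PyTorch check, independent of blurr, can show how much headroom the Vertex GPU actually has (the mitigations in the comments are standard practice, not a confirmed fix for this case):

```python
import torch

print(torch.cuda.get_device_name(0))
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")

# If memory is the issue, the usual levers are a smaller DataLoader batch
# size, shorter max sequence lengths, or mixed precision via learn.to_fp16().
```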

ohmeow commented 1 year ago

Looks like you forgot to copy/paste your code. Please share a gist and I'll find some time to take a look to see if I notice any issues.

datha29 commented 1 year ago

Error (1).docx (https://github.com/ohmeow/ohmeow_website/files/10487032/Error.1.docx)

Hi, I have attached the code and error in a doc, FYR. The code runs well in Colab but gives an error when I run it in Vertex Workbench.

ohmeow commented 1 year ago

I don't feel comfortable opening word docs from outside sources ... can you please put this into a github gist/notebook?

Thanks much - wg


AnasDS1 commented 1 year ago

Hi, I am facing a memory error in Google Colab. Is there any workaround? It's not allowing me to train the model, even on 5 articles.
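Two standard fastai memory levers, sketched against the post's variable names (learn and fit_cbs come from the article; the n_acc value is an illustrative assumption):

```python
from fastai.callback.training import GradientAccumulation

learn = learn.to_fp16()  # mixed precision roughly halves activation memory

# Accumulate gradients over 8 small batches to simulate one large batch:
learn.fit_one_cycle(1, lr_max=3e-5, cbs=fit_cbs + [GradientAccumulation(n_acc=8)])

# Rebuilding the DataLoaders with a smaller bs / shorter max_length helps too.
```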

bagaspranawa commented 10 months ago

Hi. I am trying to summarize many sentences using blurr_summarize, but I found that this is not done on the GPU, so the time required is very long. Is there another way to shorten this time? For the record, I have set num_return_sequences to 1.
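Inference running on CPU usually just means the model never got moved to the GPU. A generic PyTorch check/fix, assuming a fastai learn object as in the post (how blurr_summarize itself dispatches devices is not shown here):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
learn.model.to(device)  # move the underlying model onto the GPU

# Verify where the weights actually live; should print "cuda:0":
print(next(learn.model.parameters()).device)
```

Beyond device placement, batching many sentences through a DataLoader rather than generating one sentence at a time is usually the bigger throughput win.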