megagonlabs / coop

☘️ Code for Convex Aggregation for Opinion Summarization (Iso et al; Findings of EMNLP 2021)
https://aclanthology.org/2021.findings-emnlp.328v2.pdf
BSD 3-Clause "New" or "Revised" License

Training COOP on Space #3

Closed tomhosking closed 1 year ago

tomhosking commented 1 year ago

Hi,

I'd like to train COOP on other datasets, e.g. SPACE - what format does the dataset need to be in?

Thanks!

isomap commented 1 year ago

Hi Tom,

Sorry for the late response.

You can find the format for training the VAE model here: https://github.com/megagonlabs/coop/blob/main/scripts/preprocess.py#L31-L34 and also refer to the following script to check the format for dev/test sets. https://github.com/megagonlabs/coop/blob/main/scripts/get_summ.py#L40-L45

Thanks!

tomhosking commented 1 year ago

Thanks for this - I'll try to submit a PR with the preprocessing and configs needed to train on SPACE.

I did notice that Yelp and Amazon contain only 8 input reviews for evaluation, whereas SPACE can contain hundreds - this leads to an OOM problem when calculating all possible combinations of inputs at inference time. Is there a preferred way to "prune" the inputs, or is it OK to just limit the input to a random selection of 8 reviews?
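Concretely, the naive pruning I have in mind would be something like this (a hypothetical helper, not part of coop; review objects are treated as opaque items):

```python
import random

def prune_reviews(reviews, k=8, seed=0):
    # Work around the 2^n subset explosion at inference time by
    # sampling at most k reviews up front, before running Coop.
    if len(reviews) <= k:
        return reviews
    rng = random.Random(seed)  # fixed seed for reproducible evaluation
    return rng.sample(reviews, k)
```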

isomap commented 1 year ago

Awesome! Thanks a lot!

OOM problem

Yes, 2^100 combinations would be far too large a search space to explore exhaustively, so I have two suggestions for applying COOP to larger numbers of input reviews:

  1. Using approximate search (e.g., beam search) to reduce the search space, as discussed in §6.2 (https://aclanthology.org/2021.findings-emnlp.328v2.pdf#page=7)
  2. Finding small sets of representative reviews (e.g., 8 out of 100) in some way and directly applying the COOP algorithm.

Based on the findings in the paper, approach 1 would ideally be better, but implementation-wise approach 2 is much easier, I think. (Sorry, I don't think I can share the beam search code used in the paper, and it might not be optimized for this situation anyway.)
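To make approach 1 concrete, here is a minimal sketch of beam search over review subsets in place of exhaustive enumeration. This is not the authors' implementation; the `score` function is a stand-in for however you rate a candidate subset (e.g., Coop's input-output word overlap between the decoded summary and the inputs):

```python
from typing import Callable, FrozenSet


def beam_search_subsets(
    n_reviews: int,
    score: Callable[[FrozenSet[int]], float],  # hypothetical subset scorer
    beam_size: int = 8,
    max_size: int = 8,
) -> FrozenSet[int]:
    """Approximate the search over all 2^n review subsets.

    Starts from singleton subsets, then repeatedly grows each subset
    in the beam by one review, keeping only the top-scoring candidates.
    Returns the best-scoring subset seen at any size.
    """
    beam = sorted(
        (frozenset({i}) for i in range(n_reviews)), key=score, reverse=True
    )[:beam_size]
    best = beam[0]
    for _ in range(max_size - 1):
        # Expand every beam entry by one not-yet-included review.
        candidates = {s | {i} for s in beam for i in range(n_reviews) if i not in s}
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_size]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best
```

With beam size b, this scores O(b * n) subsets per size step instead of enumerating all 2^n combinations.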

Although I cannot provide exact numbers or checkpoints right now, the BiMeanVAE model without Coop (i.e., SimpleAvg) showed good summarization performance in terms of ROUGE scores on the SPACE corpus when I tested it internally. So we may not need to apply Coop to the SPACE corpus to get a reasonable opinion summarizer, but I haven't tested the performance with Coop in this setting.
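For reference, SimpleAvg just decodes the plain mean of all review latents, skipping the subset search entirely. A minimal sketch, assuming the reviews have already been encoded into an (n_reviews, dim) array (the actual decoding step would use the trained VAE decoder):

```python
import numpy as np


def simple_avg_latent(latents: np.ndarray) -> np.ndarray:
    # SimpleAvg baseline: the summary latent is the unweighted mean
    # of all input-review latent vectors; no combination search.
    return latents.mean(axis=0)
```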