dennlinger opened this issue 2 years ago
Hi Dennis,
Thank you for your interest in our research work! I will add some information about the JSON format so that its schema becomes clearer. For data access you can directly use the `load_verdict` function, which loads a list of sentences for each of the judgment sections. We worked directly with the JSON files in our research.
Considering your questions:
I hope this answers your questions. If you have any further questions, please do not hesitate to ask.
Best regards, Sebastian
Hi Sebastian, thanks for your detailed response! I have two minor follow-up questions, but for the most part my questions about the processing have been fully answered.
One clarifying question on the guiding principle: did you investigate the distribution of available official/third-party writings? It would be interesting to see whether they differ in their textual structure or in other aspects that might be important for summary generation, but I'm not familiar enough with the legal context to judge the differences here.
Otherwise, you mention that you used different aggregation methods (separate encodings vs. concatenation) for the facts & reasoning texts. I assume you did not find any meaningful differences? This is particularly interesting because many other domains present in summarization datasets exhibit strong positional biases (e.g., news articles), which would be obscured in concatenated texts.
Thanks again! Best, Dennis
Hi Dennis, no, we did not further investigate the distribution of official/third-party summaries. I would assume there is roughly a 50/50 split between them. I have to note, though, that the level of detail may differ between the author types: some official summaries tend to be rather short and more high-level, but it is hard to quantify this observation and generalize it to the entire dataset. As I said, we did not look too much into this.
We primarily used the separate encoding strategy for abstractive summarization, to reduce the amount of information a model needs to keep track of in one embedding. For extractive summarization we only used the concatenation, so we did not directly compare both. One thing I want to note here: the language of the facts and of the reasoning is different, and there are often certain language cues that indicate the beginning of the reasoning part. As a consequence, a model should be able to differentiate them.
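The two aggregation strategies described above can be sketched as follows. The function name and the input format (each section as a list of sentences) are my assumptions for illustration, not the paper's actual code.

```python
def build_inputs(facts, reasoning, strategy="separate"):
    """Sketch of the two aggregation strategies discussed above.
    'separate' keeps facts and reasoning as two encoder inputs
    (two embeddings); 'concat' joins them into a single sequence.
    Names and structure are assumptions, not the authors' code."""
    if strategy == "separate":
        return [" ".join(facts), " ".join(reasoning)]
    elif strategy == "concat":
        return [" ".join(facts + reasoning)]
    raise ValueError(f"unknown strategy: {strategy}")
```

With `separate`, a model sees each part on its own; with `concat`, it must rely on language cues (as noted above) to locate the boundary between facts and reasoning.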
Considering biases: there does not seem to be an obvious bias towards selecting sentences from the beginning of either text part, as seen in the comparison between the random sentence selection and the lead baselines. We could also not find any clear indicator of positional bias when investigating the labels used for extractive summarization. But this may also be due to the level of abstractiveness/novelty of the summaries.
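A simple way to inspect the extractive labels for positional bias, as described above, is to bin each selected sentence by its relative position in the document and count selections per bin. The 0/1 label format per document is an assumed representation for illustration.

```python
from collections import Counter

def position_profile(labels, n_bins=4):
    """Rough positional-bias check for extractive labels:
    bin each selected sentence by its relative position in its
    document and count selections per bin. `labels` is a list of
    0/1 lists, one per document (an assumed format). A strong lead
    bias would show up as a spike in the first bin."""
    counts = Counter()
    for doc in labels:
        n = len(doc)
        for i, selected in enumerate(doc):
            if selected:
                counts[min(int(i / n * n_bins), n_bins - 1)] += 1
    return [counts[b] for b in range(n_bins)]
```

A roughly flat profile across bins would match the finding above that no clear positional bias appears in the labels.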
Best regards, Sebastian
Perfect, thanks a lot again for the detailed explanations! I'll close this for now, since all of my points have been addressed.
Hi Sebastian, sorry to reopen the issue, but when going through the samples, I noticed an inconsistency with the above statement:
- There are two types of guiding principles: from the official judgment or from third parties. In this case we did not differentiate between them as generation targets. Each verdict only has one of both.
I was assuming that this refers to the first, respectively second, element in the `guiding_principle` segment of the JSON documents. When looking at samples, however, I noticed that some documents have content in both segments, which seems confusing to me. Again, `load_verdict` does not help here, since it naively concatenates both fields, irrespective of the actual content.
As an example of the encountered points, I looked at the file `Y-300-Z-BECKRS-B-2016-N-67530.json`.
EDIT: FWIW, I have just checked: having both at the same time affects about 1200-1300 samples in total (~1.25%).
Also, some further questions:
Best, Dennis
Hi Dennis,
Sorry, I misremembered. The splitting of the guiding principle was based on whether it was annotated as "redaktionell" (editorial) or "amtlich" (official), and it was then assigned accordingly. So "redaktionell" corresponds to the third-party statements in this case. As both always come from the same judgment, we concatenated them when using them as a target summary. Considering the source of the "redaktionell"/third-party summaries: I cannot make a statement for all of them, but many are provided by a legal publisher.
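The assignment and concatenation described above can be sketched like this. The `(label, sentence)` input format is my assumption for illustration; the actual annotation format in the dataset may differ.

```python
def split_guiding_principles(annotated):
    """Sketch of the assignment described above: sort guiding
    principles by their annotation into official ("amtlich") and
    third-party ("redaktionell") segments, then concatenate both
    segments for use as the target summary. The (label, sentence)
    input format is an assumption for illustration."""
    official = [s for label, s in annotated if label == "amtlich"]
    third_party = [s for label, s in annotated if label == "redaktionell"]
    target = official + third_party  # concatenated target summary
    return official, third_party, target
```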
Considering the Tenor: it can be seen as the summary of the legal consequences, i.e., who is in the right, who has to pay, etc. But this segment is not so interesting to study for summarization, as in most cases it can be generated by a template.
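To illustrate why the Tenor is largely templatic, here is a toy sketch: the outcome and the cost allocation slot into fixed phrases. The phrasing below is a simplified invention for illustration, not taken from the dataset.

```python
def tenor_template(claim_dismissed, costs_bearer):
    """Toy illustration of why the Tenor is largely templatic:
    outcome and cost allocation slot into fixed phrases. The
    phrasing is a simplified invention, not from the dataset."""
    outcome = ("Die Klage wird abgewiesen."
               if claim_dismissed else "Der Klage wird stattgegeben.")
    costs = f"Die Kosten des Verfahrens trägt {costs_bearer}."
    return f"{outcome} {costs}"
```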
Additionally, I added some information about the JSON files.
Best regards, Sebastian
Hi, first of all, I'm very grateful to see more German-centric datasets, especially for the legal domain! I'm currently trying to have a look at the dataset; however, I'm struggling to follow some of the implicit assumptions about the data and how to interpret the `.json` files that are in the downloadable Dropbox file. To be precise, the main questions are the following:
- In `load_verdict()`, it is indicated that there are (at most) two paragraphs in the guiding principle. Is there any explanation for this particular processing step that I'm missing?
- `./data/filter_files` implies that there are additional filter steps in place. Are these filters already applied to the data available in the Dropbox download, or would I have to apply those filters myself?
- The split files in `./model/` contain fewer than the 100k samples mentioned in the paper. In fact, they contain 79937 (train), 9992 (validation) and 9993 references, which sums to 99922 files. The downloaded Dropbox file contains 100018 samples, so there is a discrepancy of some hundred files. I haven't (yet) checked which ones exactly are missing, but it would be great to hear your take on this.
- […] facts & reasoning (again, based on the values returned in `load_verdict`)?
Many thanks in advance for taking the time to answer these questions! Best, Dennis