Boundary smoothing: what happens when two entities are directly adjacent?

terenceau2 commented 4 months ago

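Hello, I have a theoretical question.

Suppose there is an entity of type A at the 3rd token of the sentence, so its span position is (3, 3). Right next to it is another entity, of type B, at position (4, 4).

Now I apply boundary smoothing (with distance 2 and epsilon = 0.2). The probability of entity A at (3, 3) becomes 1 - 0.2 = 0.8, and each surrounding span, for example (4, 4), receives a small share of epsilon / num_of_surrounding_spans. That share collides with entity B at (4, 4). How is this case handled? (Likewise, when entity B is smoothed, its share lands on entity A at (3, 3).)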

terenceau2 commented 4 months ago

In other words, in the case above, (4, 4) would have 0.8 for entity B plus epsilon / num_of_surrounding_spans for entity A. Is that right? This is my understanding based on this piece of code:

    # Each chunk is (label, start, end); `end` appears to be exclusive, hence the `end-1` index.
    for label, start, end in self.chunks:
        label_id = config.label2idx[label]
        # The annotated span keeps 1 - epsilon of the probability mass.
        self.label_ids[start, end-1, label_id] += (1 - config.sb_epsilon)

        for dist in range(1, config.sb_size+1):
            # Equal share for each of the (dist * 4) spans at this distance.
            eps_per_span = config.sb_epsilon / (config.sb_size * dist * 4)
            sur_spans = list(_spans_from_surrounding((start, end), dist, self.num_tokens))
            for sur_start, sur_end in sur_spans:
                self.label_ids[sur_start, sur_end-1, label_id] += (eps_per_span*config.sb_adj_factor)
            # Absorb the probabilities assigned to illegal positions
            self.label_ids[start, end-1, label_id] += eps_per_span * (dist * 4 - len(sur_spans))

I ask because I did not fully understand this sentence in your paper: "After such entity probability re-allocation, any remaining probability of a span is assigned to be “non-entity”."
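To make the notion of "surrounding spans" concrete, here is a minimal sketch that enumerates the spans at a given Manhattan distance from an entity span. It is not the library's `_spans_from_surrounding`; the function name `surrounding_spans` is hypothetical, and it assumes inclusive `(start, end)` indices to match the (3, 3) / (4, 4) notation used in this thread (the quoted code seems to use an exclusive `end`).

```python
def surrounding_spans(span, dist, num_tokens):
    """Return the valid spans at Manhattan distance `dist` from `span`.

    Hypothetical helper for illustration only. Spans are (start, end)
    with inclusive indices, and the distance between two spans is
    |start1 - start2| + |end1 - end2|.
    """
    start, end = span
    found = []
    for d_start in range(-dist, dist + 1):
        remaining = dist - abs(d_start)
        # A set, so that remaining == 0 yields a single end offset.
        for d_end in {remaining, -remaining}:
            s, e = start + d_start, end + d_end
            # Keep only legal positions inside the sentence.
            if 0 <= s <= e < num_tokens:
                found.append((s, e))
    return sorted(found)


# Span (4, 4) is at distance 2 from (3, 3), not distance 1.
print(surrounding_spans((3, 3), 1, num_tokens=10))
print(surrounding_spans((3, 3), 2, num_tokens=10))
```

Away from the sentence boundaries and for a long enough entity, there are `dist * 4` such spans, which is the denominator that appears in `eps_per_span`. For a one-token entity some of the candidates are illegal (start after end), which is the case the "absorb" line in the quoted code handles.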

syuoni commented 4 months ago

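Hello,

According to the formula, for span (3, 3), with sb_size = 2 and dist = 2:

The probability of Entity A is `1 - eps = 1 - 0.2 = 0.8`
The probability of Entity B is `eps / (sb_size * dist * 4) = 0.2 / (2 * 2 * 4) = 0.0125`
The probability of non-entity is `1 - 0.8 - 0.0125 = 0.1875`

Similarly, for span (4, 4):

The probability of Entity B is 0.8
The probability of Entity A is 0.0125
The probability of non-entity is 0.1875

As you can see, the ground truth is a probability distribution over all entity classes. It is enough to fit the predicted distribution to this ground-truth distribution with soft-label cross entropy.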
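The numbers in this reply can be reproduced with a small sketch. It follows only the formula as stated above: it leaves out the `sb_adj_factor` scaling and the absorption of probability from illegal positions that appear in the quoted code, so it is not the eznlp implementation. All names (`smoothed_targets`, `soft_cross_entropy`, the `"O"` non-entity label) are hypothetical, and spans use inclusive indices as in the thread.

```python
import math
from collections import defaultdict

NON_ENTITY = "O"  # hypothetical name for the non-entity class


def spans_at_distance(span, dist, num_tokens):
    """Yield valid spans at Manhattan distance `dist` (inclusive indices)."""
    start, end = span
    for d_start in range(-dist, dist + 1):
        remaining = dist - abs(d_start)
        for d_end in {remaining, -remaining}:
            s, e = start + d_start, end + d_end
            if 0 <= s <= e < num_tokens:
                yield (s, e)


def smoothed_targets(entities, num_tokens, eps=0.2, size=2):
    """Build soft labels per span; leftover mass goes to non-entity."""
    probs = defaultdict(lambda: defaultdict(float))
    for label, start, end in entities:
        probs[(start, end)][label] += 1.0 - eps
        for dist in range(1, size + 1):
            eps_per_span = eps / (size * dist * 4)
            for span in spans_at_distance((start, end), dist, num_tokens):
                probs[span][label] += eps_per_span
    targets = {}
    for span, by_label in probs.items():
        target = dict(by_label)
        # "Any remaining probability of a span is assigned to non-entity."
        target[NON_ENTITY] = 1.0 - sum(target.values())
        targets[span] = target
    return targets


def soft_cross_entropy(target, predicted):
    """Cross entropy between a soft target and a predicted distribution."""
    return -sum(p * math.log(predicted[k]) for k, p in target.items() if p > 0)


targets = smoothed_targets([("A", 3, 3), ("B", 4, 4)], num_tokens=10)
for span in [(3, 3), (4, 4)]:
    print(span, {k: round(v, 4) for k, v in targets[span].items()})
```

Each span keeps one distribution over all classes, so the two adjacent entities do not overwrite each other: their contributions are added on different class indices of the same span.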

houyuchao commented 1 month ago

eps_per_span

Hello, which file is this code in? I could not find it.

syuoni commented 1 month ago

It is here: https://github.com/syuoni/eznlp/blob/master/eznlp/model/decoder/boundaries.py#L173-L179

houyuchao commented 1 month ago

Thank you very much.

houyuchao commented 1 month ago

Hello, have you finished reproducing this paper? I cannot figure out how the boundary smoothing method is invoked from the main script, entity_recognition.

dpj135 commented 1 month ago

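Hello, if it is computed as in the reply above, wouldn't the probabilities of the surrounding non-entity spans sum to more than 1? Does the "non-entity" class probability of those non-entity spans also have to be reduced accordingly?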

syuoni commented 1 month ago

The call to boundary_smoothing in entity_recognition.py is here: https://github.com/syuoni/eznlp/blob/master/scripts/entity_recognition.py#L244-L259

syuoni commented 1 month ago

Yes. The "non-entity" probability of those surrounding spans is reduced accordingly.
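As a worked example under the same setting (epsilon = 0.2, smoothing size 2), consider the unannotated span (3, 4), which is at Manhattan distance 1 from both entity A at (3, 3) and entity B at (4, 4). The notation below is an assumption for illustration: $D$ stands for `sb_size` and $d$ for `dist`, and the `sb_adj_factor` scaling from the quoted code is left out.

```latex
\begin{aligned}
p(\text{A} \mid (3,4)) &= \frac{\epsilon}{D \cdot d \cdot 4} = \frac{0.2}{2 \cdot 1 \cdot 4} = 0.025 \\
p(\text{B} \mid (3,4)) &= \frac{\epsilon}{D \cdot d \cdot 4} = \frac{0.2}{2 \cdot 1 \cdot 4} = 0.025 \\
p(\text{non-entity} \mid (3,4)) &= 1 - 0.025 - 0.025 = 0.95
\end{aligned}
```

The three probabilities sum to 1 because the non-entity class only receives what is left after the entity classes have been assigned.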

syuoni / eznlp

Boundary smoothing: what happens when two entities are directly adjacent? #50