Open utterances-bot opened 1 year ago
Hi, thank you for the great article. I'm having a problem running the assemble command with the ruler.cfg file you provided. The error I'm getting is as follows:
✘ Error parsing config section. Perhaps a section name is wrong?
initialize -> components -> span_ruler Section 'components' is not defined
{'nlp': {'pipeline': ['tok2vec', 'ner', 'span_ruler']}, 'components': {'ner': {'source': '/content/drive/MyDrive/output_spacy/model-best'}, 'span_ruler': {'factory': 'span_ruler', 'spans_key': None, 'annotate_ents': True, 'ents_filter': {'@misc': 'spacy.prioritize_new_ents_filter.v1'}, 'validate': True, 'overwrite': False}, 'tok2vec': {'source': '/content/drive/MyDrive/output_spacy/model-best'}}, 'initialize': {}}
Can you please help?
Hi, sorry about that. I neglected to mention that the ruler.cfg is just an excerpt; I will update the post shortly. In the meantime, I suggest looking at the example project to see the full config (it is from a forked PR, which we'll merge into the main projects repository very soon).
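For anyone hitting the same error: the message appears to mean that a nested section such as [initialize.components.span_ruler] is present while its parent section [initialize.components] was never declared, which can happen when only an excerpt of a config is used. Below is a minimal sketch, using only Python's standard library rather than spaCy's own config loader, that lists parent sections which are referenced but not declared. The helper name `find_missing_parents` and the sample excerpt (including its `patterns` value) are hypothetical and only illustrate the idea.

```python
import configparser
from typing import List


def find_missing_parents(cfg_text: str) -> List[str]:
    """Return dotted parent sections that are referenced but never declared."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(cfg_text)
    declared = set(parser.sections())
    missing = set()
    for section in declared:
        parts = section.split(".")
        # Check every ancestor of the section, e.g. "a" and "a.b" for "a.b.c".
        for depth in range(1, len(parts)):
            parent = ".".join(parts[:depth])
            if parent not in declared:
                missing.add(parent)
    return sorted(missing)


# Hypothetical excerpt: [initialize.components] itself is never declared.
EXCERPT = """
[nlp]
pipeline = ["tok2vec", "ner", "span_ruler"]

[components]

[components.span_ruler]
factory = "span_ruler"

[initialize]

[initialize.components.span_ruler]
patterns = "patterns.jsonl"
"""

print(find_missing_parents(EXCERPT))  # -> ['initialize.components']
```

If the list is non-empty, declaring the missing parent (here, an empty [initialize.components] section) or starting from the full config in the example project is the likely fix.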
Hi, thanks for clarifying. Much appreciated :)
Hi :)
Many thanks for this post, as it made the use of span_ruler a bit clearer. I do, however, have some trouble understanding the pipeline architecture when using a span_ruler together with spancat.
I have used simple TEXT/lower patterns that match whole sentences, with sentencizer both as an annotating component and as a component in the pipeline (["sentencizer","tok2vec","spancat"], in this order). This worked even though I had no [components.span_ruler] in my training config.
I have now used a pattern similar to the one you posted, with an additional ENT_TYPE pattern, and training returns 0.00 on all scoring metrics. Do I need to pass any component to annotating_components = []?
Currently, my pipeline components are: ["tok2vec", "spancat", "span_ruler"]
and the span_ruler and spancat components are:
[components.span_ruler]
factory = "span_ruler"
spans_key = "ruler"
validate = true
overwrite = false
[components.spancat]
factory = "spancat"
max_positive = null
scorer = {"@scorers":"spacy.spancat_scorer.v1"}
spans_key = "ruler"
threshold = 0.5
Since debug data finds no issues with my training data, I assume the issue must be with either 1) the order in which my components are initialized or 2) the parameters in the config itself.
Thanks a lot for any help, and apologies for reaching out here instead of on GitHub.
To add to that, the config.cfg of the trained model (which scored 0.00, so not really trained) looks like this:
[components.span_ruler]
factory = "span_ruler"
annotate_ents = false
ents_filter = {"@misc":"spacy.first_longest_spans_filter.v1"}
matcher_fuzzy_compare = {"@misc":"spacy.levenshtein_compare.v1"}
overwrite = false
phrase_matcher_attr = null
spans_filter = null
spans_key = "ruler"
validate = true
[components.span_ruler.scorer]
@scorers = "spacy.overlapping_labeled_spans_scorer.v1"
spans_key = "ruler"
[components.spancat]
factory = "spancat"
max_positive = null
scorer = {"@scorers":"spacy.spancat_scorer.v1"}
spans_key = "ruler"
threshold = 0.5
spaCy Internals: Rules-based rules!
spaCy has a comprehensive way to define rules for matching tokens, phrases, entities (and more!) to enhance statistical models. In this blog post, I'll share...
https://ljvmiranda921.github.io/notebook/2022/12/25/rules-based-rules/