ioBroker / ioBroker.parser

Parse web-site or file and extract data from it.
MIT License

Match of specific item (of multiple matches) fails #63

Open th4git opened 2 years ago

th4git commented 2 years ago

General assumption
The matched portions of the input text should be selectable via the item counter, so that each matched portion can be addressed by its match count. See https://regex101.com/r/1nRZmi/1 for an example with the regex

This (domain) is for use in (\w+) examples

and input from the website www.example.com:

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>

which leads to the matches

Match 1 | This domain is for use in illustrative examples
Group 1 | domain
Group 2 | illustrative
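
The regex101 result above can be reproduced outside the adapter. The following is a minimal sketch using Python's `re` module (not the adapter's own code; the variable names are illustrative, and the input is an abbreviated copy of the page body):

```python
import re

# Abbreviated copy of the www.example.com body quoted in the report.
TEXT = (
    "<h1>Example Domain</h1>\n"
    "<p>This domain is for use in illustrative examples in documents. "
    "You may use this\n"
    "domain in literature without prior coordination or asking for permission.</p>"
)

PATTERN = re.compile(r"This (domain) is for use in (\w+) examples")

match = PATTERN.search(TEXT)
assert match is not None

full_match = match.group(0)  # the complete match ("Match 1" on regex101)
groups = match.groups()      # the capture groups ("Group 1", "Group 2")

print(full_match)
print(groups)
```

One match with two capture groups gives three addressable strings in total, which is what the expected behaviour in this report is based on.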

Describe the bug
The parser adapter only allows selecting the first matched group from the example above, via item == 0. Additional group matches (addressed with item >= 1) deliver an empty match. The complete match is not addressable either.

By the way: even negative values can be entered for the item number, which makes no sense here and should be prevented.

To Reproduce
(two screenshots; images not available)

Expected behavior
All matches of a regexp should be addressable via the item counter. If the behaviour were aligned with that of regex101.com, the example above would yield three items:

Item 0 | Match 1 | This domain is for use in illustrative examples
Item 1 | Group 1 | domain
Item 2 | Group 2 | illustrative

But at least all group matches should be addressable (not only the first one).
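
The requested item-to-match mapping could look roughly like the sketch below. It is written in Python for illustration only; `select_item` is a hypothetical helper, not a function of the parser adapter, and rejecting negative item numbers with an exception is just one possible way to handle the invalid input mentioned above.

```python
import re
from typing import Optional

TEXT = "<p>This domain is for use in illustrative examples in documents.</p>"
PATTERN = r"This (domain) is for use in (\w+) examples"


def select_item(pattern: str, text: str, item: int) -> Optional[str]:
    """Hypothetical helper: item 0 is the full match, item n is capture group n."""
    if item < 0:
        # Negative item numbers have no meaning and are rejected up front.
        raise ValueError("item must be >= 0")
    match = re.search(pattern, text)
    if match is None:
        return None
    if item > match.re.groups:
        # The pattern has fewer capture groups than the requested item.
        return None
    return match.group(item)


for item in range(4):
    print(item, select_item(PATTERN, TEXT, item))
```

With this mapping, items 0 to 2 return the three strings from the table above, and item 3 returns nothing because the pattern has only two groups.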

Maybe there is an overlap with bug #42 and a relation to feature request #40 (@Apollon77).

Versions:

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs within the next 7 days. Please check if the issue is still relevant in the most current version of the adapter and tell us. Also check that all relevant details, logs and reproduction steps are included and update them if needed. Thank you for your contributions.

GermanBluefox commented 1 year ago

Why not just use This domain is for use in (\w+) examples as the regex if you only want the second group anyway?
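
The suggested workaround leaves the wanted word as the only capture group, so it becomes the first group of the match. A minimal sketch in Python (again not adapter code); the non-capturing `(?:...)` variant is an additional assumption, useful only if the parentheses around "domain" are needed for grouping:

```python
import re

TEXT = "<p>This domain is for use in illustrative examples in documents.</p>"

# Variant 1: drop the parentheses around "domain", as suggested above.
single = re.search(r"This domain is for use in (\w+) examples", TEXT)

# Variant 2 (assumption): keep the grouping but make it non-capturing.
non_capturing = re.search(r"This (?:domain) is for use in (\w+) examples", TEXT)

assert single is not None and non_capturing is not None

first_group_single = single.group(1)
first_group_non_capturing = non_capturing.group(1)

print(first_group_single, first_group_non_capturing)
```

In both variants the first capture group is "illustrative", which, according to the report, is the value the adapter returns for item == 0. This works around the limitation but does not make the complete match or any further groups addressable.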