temoto / robotstxt

The robots.txt exclusion protocol implementation for Go language
MIT License

Initialize field 'Agent' of struct Group #34

Closed JVMerkle closed 2 years ago

JVMerkle commented 2 years ago

This field was unused throughout the project and was never initialized at instantiation during parsing.

temoto commented 2 years ago

@JVMerkle could you show a robots.txt file that is parsed incorrectly without this patch? I'd like to save it as a test case.

JVMerkle commented 2 years ago

Parsing is fine, but when I used FindGroup I expected that I could use the Agent field of the returned Group to determine which agent my group query was matched against. That is not the case right now, because Agent is never set anywhere in the project. I assume my patch implements what was originally intended for the Agent field; correct me if I am wrong.

temoto commented 2 years ago

Just to make it clear.

group := robots.FindGroup("webproxy")
group.Test("/")
// you also want to read "webproxy" via `group` here?

Is it related to user-agent wildcards?

JVMerkle commented 2 years ago

I'd expect that:

User-Agent: *
Disallow: /
User-Agent: some
Allow: /

g := robots.FindGroup("foo")
ASSERT(g.Agent, "*")

g = robots.FindGroup("some")
ASSERT(g.Agent, "some")

Is that expectation wrong?

temoto commented 2 years ago

It's not wrong, just a new use case. Basically, you want to know which wildcard matched your agent. I can't imagine what it's good for, but yeah, it should work and it doesn't.
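
For illustration, the expected behaviour can be sketched in a standalone program. This is not the library's code: `Group`, `parseGroups`, and `findGroup` below are hypothetical stand-ins, the parser only understands `User-Agent`, `Allow: /`, and `Disallow: /`, and the longest-prefix matching rule is an assumption made for the example. The point is only that the agent name is stored on the group when it is created, so the caller can read it back after the lookup.

```go
package main

import (
	"fmt"
	"strings"
)

// Group is a hypothetical stand-in for the library's Group type.
type Group struct {
	Agent     string // user-agent token the group was declared for
	AllowRoot bool   // simplified rule set: is "/" allowed?
}

// parseGroups is a deliberately tiny parser (an assumption for this
// sketch): it handles only "User-Agent", "Allow: /" and "Disallow: /".
func parseGroups(body string) map[string]*Group {
	groups := make(map[string]*Group)
	var current *Group
	for _, line := range strings.Split(body, "\n") {
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		key := strings.ToLower(strings.TrimSpace(parts[0]))
		value := strings.TrimSpace(parts[1])
		switch key {
		case "user-agent":
			if value == "" {
				continue
			}
			agent := strings.ToLower(value)
			// Set Agent at instantiation, so it is never left empty.
			current = &Group{Agent: agent, AllowRoot: true}
			groups[agent] = current
		case "allow":
			if current != nil && value == "/" {
				current.AllowRoot = true
			}
		case "disallow":
			if current != nil && value == "/" {
				current.AllowRoot = false
			}
		}
	}
	return groups
}

// findGroup returns the group whose agent name is the longest prefix of
// the queried agent, falling back to the "*" group. It returns nil when
// nothing matches and no "*" group exists.
func findGroup(groups map[string]*Group, agent string) *Group {
	agent = strings.ToLower(agent)
	var best *Group
	for name, g := range groups {
		if name == "*" {
			continue
		}
		if strings.HasPrefix(agent, name) && (best == nil || len(name) > len(best.Agent)) {
			best = g
		}
	}
	if best == nil {
		best = groups["*"]
	}
	return best
}

func main() {
	robots := "User-Agent: *\nDisallow: /\nUser-Agent: some\nAllow: /\n"
	groups := parseGroups(robots)

	fmt.Println(findGroup(groups, "foo").Agent)  // prints "*"
	fmt.Println(findGroup(groups, "some").Agent) // prints "some"
}
```

With the agent recorded on the group, a caller can tell whether a query was answered by a specific group or by the `*` fallback.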

JVMerkle commented 2 years ago

Is it okay as it is?

temoto commented 2 years ago

Thanks for the test. I've removed the comment commit and rebased it on GitHub CI.

JVMerkle commented 2 years ago

Thanks 👍