Closed: thewizardplusplus closed this pull request 4 months ago.
You can't just make a variable public, because that allows accidental or malicious mutation outside of the package.
Current unit tests check that some agent is allowed on some path. I'm not sure that extracting (dis)allowAll is beneficial: you would repeat part of the group.Test function in your own code to check against the new value. How is that better?
Other arguments seem to want marshalling, to save the result of parsing and reuse it later. But what's wrong with just reparsing the robots file again? It's pretty fast.
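For concreteness, a minimal sketch of this reparsing workaround, using the library's public FromString and TestAgent; the canonical allow-all and disallow-all bodies below are an assumption, not something the library prescribes:

```go
package main

import (
	"fmt"

	"github.com/temoto/robotstxt"
)

func main() {
	// Reparse canonical bodies whenever the special values are needed;
	// parsing is cheap, per the comment above.
	allowAll, _ := robotstxt.FromString("") // an empty robots.txt allows everything
	disallowAll, _ := robotstxt.FromString("User-agent: *\nDisallow: /")

	fmt.Println(allowAll.TestAgent("/some/path", "MyBot"))    // true
	fmt.Println(disallowAll.TestAgent("/some/path", "MyBot")) // false
}
```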
You can't just make a variable public, because that allows accidental or malicious mutation outside of the package.
This is common practice in Go; recall io.EOF or http.DefaultClient. The latter is the full equivalent of our case.
But you are right. I can suggest using functions that return these constants, such as NewAllowAll() and NewDisallowAll() (the names are up to you).
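A sketch of what such constructors might look like inside the package; NewAllowAll and NewDisallowAll are only the names suggested here, not an existing API, and errors are ignored because the inputs are fixed and known to parse:

```go
package robotstxt

// NewAllowAll returns a fresh allow-all value on every call, so callers
// cannot mutate shared package state. An empty robots.txt allows everything.
func NewAllowAll() *RobotsData {
	d, _ := FromString("")
	return d
}

// NewDisallowAll returns a fresh disallow-all value on every call.
func NewDisallowAll() *RobotsData {
	d, _ := FromString("User-agent: *\nDisallow: /")
	return d
}
```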
Current unit tests check that some agent is allowed on some path. I'm not sure that extracting (dis)allowAll is beneficial: you would repeat part of the group.Test function in your own code to check against the new value. How is that better?
Other arguments seem to want marshalling, to save the result of parsing and reuse it later. But what's wrong with just reparsing the robots file again? It's pretty fast.
You are right, I can parse prepared strings every time I need those special cases. But I find it non-idiomatic: I need the special value, the library has that value, but it is private, and I have to parse the same string over and over again.
After all, why does the library have these constants? You could also parse prepared strings each time to get AllowAll and DisallowAll. That's fast. But you chose to use constants, because it's obviously more comfortable (and faster).
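If reparsing on every use feels wasteful, a caller can also parse the prepared strings once and reuse the results, which is effectively what the private constants do. A sketch:

```go
package mycrawler

import "github.com/temoto/robotstxt"

// Parsed once at package initialization and reused everywhere afterwards,
// mirroring the library's private constants from the caller's side.
// Errors are discarded because the inputs are fixed and known to parse.
var (
	allowAll, _    = robotstxt.FromString("")
	disallowAll, _ = robotstxt.FromString("User-agent: *\nDisallow: /")
)
```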
After all, why does the library have these constants?
To save memory.
The library can't save CPU time, because it would need to prove that the incoming text is equivalent to a popular constant.
These constants have no special meaning. They're just a cache of common inputs.
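To illustrate the point, here is a generic sketch of such a cache of common inputs (illustrative names only, not the library's actual code): the shared instance saves an allocation, but recognizing a non-trivial input as equivalent to a cached one would cost about as much as parsing it.

```go
package main

import "fmt"

// result stands in for a parsed robots.txt file (illustrative only).
type result struct{ allowAll bool }

// emptyResult is the pre-built value for the common empty input.
var emptyResult = &result{allowAll: true}

// parse returns the shared pre-built instance when the input is trivially
// recognizable; any other input needs a full parse, which is why such a
// cache saves memory but not CPU time in the general case.
func parse(body string) *result {
	if body == "" {
		return emptyResult
	}
	return &result{} // placeholder for real parsing
}

func main() {
	fmt.Println(parse("") == parse("")) // true: both are the one shared instance
}
```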
I can parse prepared strings every time I need those special cases. But I find it non-idiomatic: I need the special value
This is where we don't agree yet.
The library doesn't use the (dis)allowAll constants as special values. Maybe RobotsData.(Dis)AllowAll bool should be exported, maybe via a method RobotsData.SetAll(bool).
Let's imagine RobotsData supports the Marshaler interface and the result is stable between Go versions, like protobuf. Will that be OK, or would you still want SetAll?
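For reference, a sketch of how that setter could look; the method and the field names are hypothetical, inferred from this discussion rather than taken from the library's source:

```go
package robotstxt

// SetAll forces already-parsed data into an allow-all (true) or
// disallow-all (false) state without reparsing. The field names here are
// assumptions for the sketch, not the library's actual internals.
func (r *RobotsData) SetAll(allow bool) {
	r.allowAll = allow
	r.disallowAll = !allow
}
```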
Merging #36 (60a6ab8) into master (a68aeca) will not change coverage. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #36 +/- ##
=======================================
Coverage 92.59% 92.59%
=======================================
Files 3 3
Lines 405 405
=======================================
Hits 375 375
Misses 19 19
Partials 11 11
Impacted Files | Coverage Δ |
---|---|
robotstxt.go | 92.85% <100.00%> (ø) |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a68aeca...60a6ab8.
It may be useful in the following cases:
- Unit tests. Let's assume we are testing a function that filters a list of links with a previously parsed robots.txt file. How do we check the case when a robots.txt file disallows all links? Now it is possible only by actually parsing the corresponding robots.txt file (see the test sketch after this list).
- Errors not related to the network. Let's assume we are collecting a cache of parsed robots.txt files. What should we put in it in case of an error? Yes, the library has useful functions to analyze the status of a response, but not all errors are due to network problems (for example, suppose the cache accepts links as strings and parses them to construct the link to the robots.txt file; this parsing may result in an error).
- Support for temporarily ignoring robots.txt files. Let's assume we want to allow a user to ignore robots.txt files through a special option, that is, to allow all links. Again, this is now possible only by actually parsing the corresponding robots.txt file.

I suppose it would be more comfortable to use prepared public instances for allowing and disallowing all links.
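A sketch of that unit-test case; filterLinks and the test are hypothetical, and only FromString and TestAgent below come from the library:

```go
package mycrawler

import (
	"testing"

	"github.com/temoto/robotstxt"
)

// filterLinks is the hypothetical function under test: it keeps only the
// links that the given robots.txt data allows for the agent.
func filterLinks(robots *robotstxt.RobotsData, agent string, links []string) []string {
	var allowed []string
	for _, link := range links {
		if robots.TestAgent(link, agent) {
			allowed = append(allowed, link)
		}
	}
	return allowed
}

// Today the disallow-all case can only be produced by actual parsing; a
// public DisallowAll instance would replace the FromString call below.
func TestFilterLinks_DisallowAll(t *testing.T) {
	robots, err := robotstxt.FromString("User-agent: *\nDisallow: /")
	if err != nil {
		t.Fatal(err)
	}
	if got := filterLinks(robots, "MyBot", []string{"/a", "/b"}); len(got) != 0 {
		t.Errorf("expected no allowed links, got %v", got)
	}
}
```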