Closed yachang closed 7 years ago
Reviewed 2 of 4 files at r1, 1 of 1 files at r2. Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions.
unembargo.go, line 44 at r3 (raw file):
import ( "fmt"
Please use log
for reporting errors (or Greg found glog
recently https://github.com/golang/glog which looks even better).
unembargo.go, line 61 at r3 (raw file):
current_time := time.Now().UTC().Format("2006-01-02") current_year, _ := strconv.Atoi(current_time[0:4]) cutoff_date, _ := strconv.Atoi(strconv.Itoa(int(current_year-1)) + current_time[5:7] + current_time[8:10])
Since we want an int
in the end, what do you think of using the time.Year
, time.Month
, and time.Day
functions directly rather than converting to / from ascii strings?
unembargo.go, line 69 at r3 (raw file):
// The date is used as prefixFileName in format sidestream/yyyy/mm/dd func UnEmbargoOneDayLegacyFiles(sourceBucket string, destBucket string, prefixFileName string) bool {
Please return actual error
values. Returning an actual err
value gives the caller more flexibility. If all they want is success / fail, then the caller can check err != nil
. And, if the caller wants to know more they have the option of inspecting the error further.
unembargo.go, line 81 at r3 (raw file):
for { destinationFiles := unembargoService.Objects.List(destBucket) if destPageToken != "" {
Is this condition necessary?
unembargo.go, line 101 at r3 (raw file):
// Copy files. pageToken := "" for {
This function is pretty long. Is there a way to divide it into parts?
unembargo.go, line 118 at r3 (raw file):
result := unembargoService.Objects.Delete(destBucket, oneItem.Name).Do() if result != nil { fmt.Printf("Objects deletion from public bucket failed: %v\n", err)
What err
value does this belong to?
unembargo.go, line 125 at r3 (raw file):
// Insert the object into destination bucket. object := &storage.Object{Name: oneItem.Name} _, err := unembargoService.Objects.Insert(destBucket, object).Media(fileContent.Body).Do()
Is ObjectsService.Copy
an option rather than Download
& Insert
?
https://godoc.org/google.golang.org/api/storage/v1#ObjectsService.Copy
unembargo.go, line 148 at r3 (raw file):
// The input date is integer in format yyyymmdd func Unembargo(date int) bool {
What will happen if I pass this function a value of zero?
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions.
unembargo.go, line 44 at r3 (raw file):
Please use `log` for reporting errors (or Greg found `glog` recently https://github.com/golang/glog which looks even better).
Done.
unembargo.go, line 61 at r3 (raw file):
Since we want an `int` in the end, what do you think of using the `time.Year`, `time.Month`, and `time.Day` functions directly rather than converting to / from ascii strings?
It is possible, but won't make code more simply:
time.Year 10000 + time.Month 100 + time.Day
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions.
unembargo.go, line 69 at r3 (raw file):
Please return actual `error` values. Returning an actual `err` value gives the caller more flexibility. If all they want is success / fail, then the caller can check `err != nil`. And, if the caller wants to know more they have the option of inspecting the error further.
Done.
unembargo.go, line 81 at r3 (raw file):
Is this condition necessary?
Given there are < 1000 files for sidestream everyday, we do not need this.
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions.
unembargo.go, line 118 at r3 (raw file):
What `err` value does this belong to?
Done.
unembargo.go, line 148 at r3 (raw file):
What will happen if I pass this function a value of zero?
I will add a TODO to it.
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 5 unresolved discussions.
unembargo.go, line 101 at r3 (raw file):
This function is pretty long. Is there a way to divide it into parts?
Done.
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 5 unresolved discussions.
unembargo.go, line 125 at r3 (raw file):
Is `ObjectsService.Copy` an option rather than `Download` & `Insert`? https://godoc.org/google.golang.org/api/storage/v1#ObjectsService.Copy
I tried:
if oneItem.Name != "" { object := &storage.Object{Name: oneItem.Name} _, err := unembargoService.Objects.Copy(sourceBucket, oneItem.Name, destBucket, oneItem.Name, object).Do() if err != nil { log.Printf("Objects copy failed: %v\n", err) return err } }
But keep get errors like:
Objects copy failed: googleapi: Error 400: Required, required
Not sure the exact reason, but I will stick with Download & Insert ways.
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.
unembargo.go, line 61 at r3 (raw file):
It is possible, but won't make code more simply: time.Year * 10000 + time.Month * 100 + time.Day
The time.Year * 10000 + time.Month * 100 + time.Day
expression is a single line and does not require converting from int to string back to int. Please use int
s to calculate the cutoff_date
.
unembargo.go, line 81 at r3 (raw file):
Given there are < 1000 files for sidestream everyday, we do not need this.
Setting the PageToken
when the initial value is empty should be a no-op. So, I believe this check is not necessary and removing the extra condition will simplify the logic. Please remove the condition unless this breaks functionality.
unembargo.go, line 125 at r3 (raw file):
I tried: if oneItem.Name != "" { object := &storage.Object{Name: oneItem.Name} _, err := unembargoService.Objects.Copy(sourceBucket, oneItem.Name, destBucket, oneItem.Name, object).Do() if err != nil { log.Printf("Objects copy failed: %v\n", err) return err } } But keep get errors like: Objects copy failed: googleapi: Error 400: Required, required Not sure the exact reason, but I will stick with Download & Insert ways.
Hmm, that's too bad. The 400 error gives hope that it may be an issue that we can figure out. A Copy
seems much more preferable to a Download and Upload, for performance and IO reasons.
Is there time to investigate? If not, please create a github issue to investigate this. (We will be running this code continuously in the future right?)
unembargo.go, line 148 at r3 (raw file):
I will add a TODO to it.
Please associate a name with the TODO.
unembargo_test.go, line 31 at r4 (raw file):
UploadFile(privateBucket, "testdata/20160102T000000Z-mlab3-sin01-sidestream-0000.tgz", "sidestream/2016/01/02/") DeleteFiles(publicBucket, "") if Unembargo(20160102) == nil {
Rather than a positive condition here, use a negative condition. This way if it fails, the corresponding error message is right next to the condition that failed. And, the logic below does not need to be indented as far.
unembargo_test.go, line 33 at r4 (raw file):
if Unembargo(20160102) == nil { // Check the privateBucket does not have that file any more fileNames := GetFileNamesFromBucket(privateBucket)
Rather than fileNames
and fileNames2
, what about privateNames
and publicNames
?
unembargo_test.go, line 34 at r4 (raw file):
// Check the privateBucket does not have that file any more fileNames := GetFileNamesFromBucket(privateBucket) for _, fileName := range fileNames {
At this point, the private bucket should contain no files. And, at most could contain only one file.
It would be more concise to check the fileNames
map for the explicit name expected. e.g.
if _, ok := fileNames["sidestream/2016/01/02/20160102T000000Z-mlab3-sin01-sidestream-0000.tgz"]; !ok {
t.Errorf("The embargoed copy should be deleted after the process.\n")
}
Or, simply asserting that the length of len(fileNames) != 0
should be sufficient.
unembargo_test.go, line 41 at r4 (raw file):
// Check the publicBucket has that file fileNames2 := GetFileNamesFromBucket(publicBucket) for _, fileName2 := range fileNames2 {
Here too; there should be exactly one file, and we know exactly what it's name should be.
Also, I suggest using a negative condition, so the error is right next to the condition.
It's unusual to return from a test function before the end of the function. A successful test should fall through to the end.
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.
unembargo.go, line 61 at r3 (raw file):
The `time.Year * 10000 + time.Month * 100 + time.Day` expression is a single line and does not require converting from int to string back to int. Please use `int`s to calculate the `cutoff_date`.
Done.
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.
unembargo.go, line 81 at r3 (raw file):
Setting the `PageToken` when the initial value is empty should be a no-op. So, I believe this check is not necessary and removing the extra condition will simplify the logic. Please remove the condition unless this breaks functionality.
Done.
unembargo.go, line 125 at r3 (raw file):
Hmm, that's too bad. The 400 error gives hope that it may be an issue that we can figure out. A `Copy` seems much more preferable to a Download and Upload, for performance and IO reasons. Is there time to investigate? If not, please create a github issue to investigate this. (We will be running this code continuously in the future right?)
Add a TODO for it.
Comments from Reviewable
Review status: 3 of 5 files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.
unembargo_test.go, line 31 at r4 (raw file):
Rather than a positive condition here, use a negative condition. This way if it fails, the corresponding error message is right next to the condition that failed. And, the logic below does not need to be indented as far.
Done.
unembargo_test.go, line 33 at r4 (raw file):
Rather than `fileNames` and `fileNames2`, what about `privateNames` and `publicNames`?
Done.
unembargo_test.go, line 34 at r4 (raw file):
At this point, the private bucket should contain no files. And, at most could contain only one file. It would be more concise to check the `fileNames` map for the explicit name expected. e.g. ``` if _, ok := fileNames["sidestream/2016/01/02/20160102T000000Z-mlab3-sin01-sidestream-0000.tgz"]; !ok { t.Errorf("The embargoed copy should be deleted after the process.\n") } ``` Or, simply asserting that the length of `len(fileNames) != 0` should be sufficient.
Done.
unembargo_test.go, line 41 at r4 (raw file):
Here too; there should be exactly one file, and we know exactly what it's name should be. Also, I suggest using a negative condition, so the error is right next to the condition. It's unusual to return from a test function before the end of the function. A successful test should fall through to the end.
Done.
Comments from Reviewable
Hi, Ya,
I see that there are new files in testdata, but none of the current tests use them. Is that expected?
Best, Stephen
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
unembargo.go, line 81 at r3 (raw file):
Done.
I think there has been a misunderstanding.
Using NextPageToken
to iterate through all files is necessary anytime we call List
, since we may have days with more than 1000 files, or for special test cases, or maybe GCS changes how many entries they return per request, right?
The "condition" I meant was, rather than:
if destPageToken != "" {
destinationFiles.PageToken(destPageToken)
}
It should be sufficient to say (without the if
):
destinationFiles.PageToken(destPageToken)
unembargo.go, line 125 at r3 (raw file):
Add a TODO for it.
Who is assigned to the TODO?
Will we be running this code continuously in the future? The efficiency of a Copy will be much higher than a Download and Upload.
unembargo.go, line 148 at r3 (raw file):
Please associate a name with the TODO.
Who is assigned to the TODO?
unembargo.go, line 55 at r6 (raw file):
var ( privateBucket = "" publicBucket = ""
To make the unembargo interface more "object oriented" and rather than using global variables to pass parameters to the Unembargo
function, what do you think of defining a simple struct to hold these values and defining Unembargo
as a receiver on that struct type?
Client code might look like:
config := embargo.NewConfig(privateBucketName, publicBucketName)
config.Unembargo(date)
unembargo.go, line 62 at r6 (raw file):
func CheckWhetherUnembargo(date int) bool { current_time := time.Now() cutoff_date := (int(current_time.Year())-1)*10000 + int(current_time.Month())*100 + int(current_time.Day())
Please only use type casts when necessary. Like this: https://play.golang.org/p/W1dPo1IBvX
Also, I'm sorry to have missed it before; please use use mixed case variable names rather than underscores. This is the convention in Go https://golang.org/doc/effective_go.html#mixed-caps
unembargo.go, line 108 at r6 (raw file):
sourceFiles := unembargoService.Objects.List(sourceBucket) sourceFiles.Prefix(prefixFileName) if pageToken != "" {
Here too, if pageToken != "" {
should not be needed.
unembargo_test.go, line 24 at r6 (raw file):
// This end to end test require anthentication. func TestUnembargoLegacy(t *testing.T) { privateBucket = "mlab-embargoed-data"
Since these are real buckets, I think it would be helpful and maybe safer if they included strings that made it clear that they're for "unit test" data and not production data.
These names look pretty generic and I can easily imagine them being confused for buckets for real data.
Comments from Reviewable
Which file? I see all test files used at least once.
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
Comments from Reviewable
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
unembargo.go, line 148 at r3 (raw file):
Who is assigned to the TODO?
Done.
unembargo_test.go, line 24 at r6 (raw file):
Since these are real buckets, I think it would be helpful and maybe safer if they included strings that made it clear that they're for "unit test" data and not production data. These names look pretty generic and I can easily imagine them being confused for buckets for real data.
Done.
Comments from Reviewable
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
unembargo.go, line 108 at r6 (raw file):
Here too, `if pageToken != "" {` should not be needed.
Done.
Comments from Reviewable
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
unembargo.go, line 81 at r3 (raw file):
I think there has been a misunderstanding. Using `NextPageToken` to iterate through all files is necessary anytime we call `List`, since we may have days with more than 1000 files, or for special test cases, or maybe GCS changes how many entries they return per request, right? The "condition" I meant was, rather than: ``` if destPageToken != "" { destinationFiles.PageToken(destPageToken) } ``` It should be sufficient to say (without the `if`): ``` destinationFiles.PageToken(destPageToken) ```
Done.
unembargo.go, line 62 at r6 (raw file):
Please only use type casts when necessary. Like this: https://play.golang.org/p/W1dPo1IBvX Also, I'm sorry to have missed it before; please use use mixed case variable names rather than underscores. This is the convention in Go https://golang.org/doc/effective_go.html#mixed-caps
If I do not use type casts, there will be errors like:
mismatched types int and time.Month
Comments from Reviewable
This PR has been messed up by rebase. I will address all comments in a new branch. Then close this one.
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
Comments from Reviewable
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
unembargo.go, line 55 at r6 (raw file):
To make the unembargo interface more "object oriented" and rather than using global variables to pass parameters to the `Unembargo` function, what do you think of defining a simple struct to hold these values and defining `Unembargo` as a receiver on that struct type? Client code might look like: ``` config := embargo.NewConfig(privateBucketName, publicBucketName) config.Unembargo(date) ```
Done.
Comments from Reviewable
Review status: 1 of 13 files reviewed at latest revision, 7 unresolved discussions.
unembargo.go, line 125 at r3 (raw file):
Who is assigned to the TODO? Will we be running this code continuously in the future? The efficiency of a Copy will be much higher than a Download and Upload.
Done.
Comments from Reviewable
This change is