Closed yachang closed 7 years ago
Review status: 0 of 8 files reviewed at latest revision, 13 unresolved discussions.
embargo.go, line 44 at r4 (raw file):
// Split one tar files into 2 buffers. func SplitFile(content io.Reader) (bool, bytes.Buffer, bytes.Buffer) {
Return an Error instead of a bool, and make it the last return value, consistent with other Go functions?
embargo.go, line 45 at r4 (raw file):
// Split one tar files into 2 buffers. func SplitFile(content io.Reader) (bool, bytes.Buffer, bytes.Buffer) { var privateBuf bytes.Buffer
Maybe call this "embargoBuf"?
embargo.go, line 54 at r4 (raw file):
} defer zipReader.Close() unzippedImage, err := ioutil.ReadAll(zipReader)
Maybe unzippedBytes ?
embargo.go, line 67 at r4 (raw file):
publicTw := tar.NewWriter(publicGzw) // Handle one tar file
This comment is confusing. Is it handling one inner file from the tar file, or a top level tar file, or something else?
embargo.go, line 88 at r4 (raw file):
// put this file to a private buffer if err := privateTw.WriteHeader(hdr); err != nil { fmt.Println(err)
This prints to the console. Is that what you want here? Or should it go to a log?
embargo.go, line 109 at r4 (raw file):
if err := publicTw.Close(); err != nil { fmt.Println(err) return false, privateBuf, publicBuf
In general, I think it is much better to return the error, and decide what to do with it in the caller. This makes testing easier and more robust, and gives the caller more control.
embargo.go, line 140 at r4 (raw file):
// Embargo do embargo ckecking to all files in the sourceBucket. func Embargo() bool {
I think we need a TODO here. If the embargo job crashes half way through, what should happen when it starts up again?
embargo_test.go, line 5 at r4 (raw file):
import ( "bytes" "fmt"
Run gofmt?
embargo_test.go, line 12 at r4 (raw file):
/* // End to end test, requires authentication.
Please add a clear TODO() here.
embargo_test.go, line 25 at r4 (raw file):
*/ func TestSplitTarFile(t *testing.T) {
I can figure out from context what is in the test files, but it would be better to state it explicitly in some combination of comment describing the test, and inline comments. If you define const strings for the filenames, the comments will be easy to read. E.g. "combined_file contains ... and ... . The test verifies that SplitFile correctly separates this into ... and ..., which ..."
An alternative (I think cleaner) way to test this would be to create a tar file on the fly, with lists of inner files, call SplitFile on it, then verify that the pub and private buffers contain the correct filenames.
embargo_test.go, line 39 at r4 (raw file):
t.Error("Did not perform embargo ocrrectly.\n") } publicGolden, err := os.Open("testdata/20170315T000000Z-mlab3-sea03-sidestream-0000-p.tgz")
Does this need
embargo_test.go, line 44 at r4 (raw file):
} defer publicGolden.Close() publicContent, err := ioutil.ReadAll(publicGolden)
Either check err value, or use _
embargo_test.go, line 56 at r4 (raw file):
} defer privateGolden.Close() privateContent, err := ioutil.ReadAll(privateGolden)
Check err or use _
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 21 unresolved discussions.
embargo.go, line 45 at r4 (raw file):
embargoBuf Oops. I meant s/privateBuf/embargoBuf/
However, the misunderstanding raises another question - which functions need to be exported, and which can be local to this file?
embargo.go, line 67 at r4 (raw file):
This comment is confusing. Is it handling one inner file from the tar file, or a top level tar file, or something else?
Still need a comment, it just should be clearer. 8-)
embargo.go, line 109 at r4 (raw file):
In general, I think it is much better to return the error, and decide what to do with it in the caller. This makes testing easier and more robust, and gives the caller more control.
Reminder - please "Done" each comment that you have completed.
embargo.go, line 10 at r5 (raw file):
"io" "io/ioutil" "log"
FYI, I just discovered glog, which we will probably want to move to in the very near future, but this is fine for now.
embargo.go, line 27 at r5 (raw file):
// Write results to GCS. func WriteResults(tarfileName string, service *storage.Service, privateBuf, publicBuf bytes.Buffer) bool {
Here and elsewhere, we should return err instead of bool to indicate whether something was successful.
embargo.go, line 30 at r5 (raw file):
privateTarfileName := strings.Replace(tarfileName, ".tgz", "-e.tgz", -1) publicObject := &storage.Object{Name: tarfileName} privateObject := &storage.Object{Name: privateTarfileName}
s/privateObject/embargoObject/
embargo.go, line 62 at r5 (raw file):
tarReader := tar.NewReader(unzippedReader) privateGzw := gzip.NewWriter(&privateBuf)
embargoGzw ?
embargo.go, line 64 at r5 (raw file):
privateGzw := gzip.NewWriter(&privateBuf) publicGzw := gzip.NewWriter(&publicBuf) privateTw := tar.NewWriter(privateGzw)
embargoTw
embargo.go, line 125 at r5 (raw file):
} // EmbargoOneTar process one tar file, split it to 2 files, the embargoed files
s/process/processes/ s/split/splits
embargo.go, line 127 at r5 (raw file):
// EmbargoOneTar process one tar file, split it to 2 files, the embargoed files // will be saved in a private dir, and the unembargoed part will be save in a // public dir.
s/dir/bucket/ ???
embargo.go, line 142 at r5 (raw file):
// TODO: handle midway crash. Since the source bucket is unchanged, if it failed // in the middle, we just rerun it for that specific day. func EmbargoOneDayData(date string) bool {
Prefer to return an error, not a bool.
embargo_test.go, line 12 at r5 (raw file):
// End to end test, requires authentication. // TODO: enbale it on Travis.
typo.
gcs_operations.go, line 49 at r5 (raw file):
// Create a new bucket. Return true if it already exsits or is created successfully. func CreateBucket(projectID string, bucketName string) bool { service := CreateService()
It is highly likely that creating a client and storage.New() are both heavyweight and perform requests to GCS backends. Unless you know otherwise, you should create an object that caches the client and the service, and reuses them for all GCS requests. Once.Do might be useful. https://golang.org/pkg/sync/#Once
I would cache the client and service in a Singleton object, then use these to get the buckets you need in
BTW, if you have some Go embargo code already running (e.g. to transform the old embargo data), you could pretty easily profile it to see how time is being spent. I would expect these to show up as places where the code is blocked waiting for the remote requests to complete. But I would just pro-actively change this to cache the client and service.
Since (IIUC) you are just calling CreateService once from embargo.go for each call to EmbargoOneDayData, I think that is fine for now. But you should add TODOs to each of these go_operations functions that is calling CreateService, as they will (likely) be quite inefficient.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 21 unresolved discussions.
embargo.go, line 10 at r5 (raw file):
FYI, I just discovered glog, which we will probably want to move to in the very near future, but this is fine for now.
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 20 unresolved discussions.
embargo.go, line 27 at r5 (raw file):
Here and elsewhere, we should return err instead of bool to indicate whether something was successful.
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 20 unresolved discussions.
embargo.go, line 44 at r4 (raw file):
Return an Error instead of a bool, and make it the last return value, consistent with other Go functions?
Done.
embargo.go, line 45 at r4 (raw file):
> embargoBuf Oops. I meant s/privateBuf/embargoBuf/ However, the misunderstanding raises another question - which functions need to be exported, and which can be local to this file?
I am not sure. Some use the gcs service defined in other files in this package.
embargo.go, line 30 at r5 (raw file):
s/privateObject/embargoObject/
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 21 unresolved discussions.
embargo.go, line 54 at r4 (raw file):
Maybe unzippedBytes ?
Done.
embargo.go, line 67 at r4 (raw file):
Still need a comment, it just should be clearer. 8-)
Done.
embargo.go, line 88 at r4 (raw file):
This prints to the console. Is that what you want here? Or should it go to a log?
Done.
embargo.go, line 109 at r4 (raw file):
Reminder - please "Done" each comment that you have completed.
Done.
embargo.go, line 140 at r4 (raw file):
I think we need a TODO here. If the embargo job crashes half way through, what should happen when it starts up again?
Done.
embargo.go, line 62 at r5 (raw file):
embargoGzw ?
Done.
embargo.go, line 64 at r5 (raw file):
embargoTw
Done.
embargo.go, line 125 at r5 (raw file):
s/process/processes/ s/split/splits
Done.
embargo.go, line 127 at r5 (raw file):
s/dir/bucket/ ???
Done.
embargo.go, line 142 at r5 (raw file):
Prefer to return an error, not a bool.
Done.
embargo_test.go, line 5 at r4 (raw file):
Run gofmt?
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 20 unresolved discussions.
embargo_test.go, line 12 at r4 (raw file):
Please add a clear TODO() here.
Done.
embargo_test.go, line 25 at r4 (raw file):
I can figure out from context what is in the test files, but it would be better to state it explicitly in some combination of comment describing the test, and inline comments. If you define const strings for the filenames, the comments will be easy to read. E.g. "combined_file contains ... and ... . The test verifies that SplitFile correctly separates this into ... and ..., which ..." An alternative (I think cleaner) way to test this would be to create a tar file on the fly, with lists of inner files, call SplitFile on it, then verify that the pub and private buffers contain the correct filenames.
Comments added. Add a TODO for the better way.
embargo_test.go, line 44 at r4 (raw file):
Either check err value, or use _
Done.
embargo_test.go, line 56 at r4 (raw file):
Check err or use _
Done.
embargo_test.go, line 12 at r5 (raw file):
typo.
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 20 unresolved discussions.
gcs_operations.go, line 49 at r5 (raw file):
It is highly likely that creating a client and storage.New() are both heavyweight and perform requests to GCS backends. Unless you know otherwise, you should create an object that caches the client and the service, and reuses them for all GCS requests. Once.Do might be useful. https://golang.org/pkg/sync/#Once I would cache the client and service in a Singleton object, then use these to get the buckets you need in BTW, if you have some Go embargo code already running (e.g. to transform the old embargo data), you could pretty easily profile it to see how time is being spent. I would expect these to show up as places where the code is blocked waiting for the remote requests to complete. But I would just pro-actively change this to cache the client and service. Since (IIUC) you are just calling CreateService once from embargo.go for each call to EmbargoOneDayData, I think that is fine for now. But you should add TODOs to each of these go_operations functions that is calling CreateService, as they will (likely) be quite inefficient.
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 10 unresolved discussions.
embargo.go, line 46 at r6 (raw file):
// Split one tar files into 2 buffers. func embargoBuf(content io.Reader) (bytes.Buffer, bytes.Buffer, error) {
SplitFile() was fine. Please change back?
embargo.go, line 133 at r6 (raw file):
// The private file will have a different name, so it can be copied to public // bucket directly when it becomes one year old. func EmbargoOneTar(content io.Reader, tarfileName string, service *storage.Service) bool {
return error ?
embargo.go, line 134 at r6 (raw file):
// bucket directly when it becomes one year old. func EmbargoOneTar(content io.Reader, tarfileName string, service *storage.Service) bool { embargoBuf, publicBuf, err := embargoBuf(content)
Use the standard golang error idiom here:
if embargoBuf, publicBuf, err := embargoBuf(content); err != nil { return err } if WriteResults(...) != nil { return err }
return nil
embargo.go, line 155 at r6 (raw file):
log.SetOutput(f) // TODO: Create service in a Singleton object, and reuses them for all GCS requests.
s/resuses/reuse/
embargo_check.go, line 98 at r6 (raw file):
} fn := FileName{name: fileName} localIP := fn.GetLocalIP()
Consider having GetLocalIP return an error if the IP address is missing or unparseable.
embargo_test.go, line 26 at r6 (raw file):
} // This test verified that func embargoBuf() correctly split the input tar
s/verified/verifies/
Make all test comments active - describing what the test does anytime you run it. Past tense implies that you ran the test once but might never run it again.
gcs_operations.go, line 49 at r5 (raw file):
Done.
Maybe some misunderstanding... embargo.go is fine now, but you should also add the same TODO for every function here that does service := CreateService().
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 10 unresolved discussions.
embargo.go, line 46 at r6 (raw file):
SplitFile() was fine. Please change back?
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 10 unresolved discussions.
embargo.go, line 133 at r6 (raw file):
return error ?
Done.
embargo.go, line 134 at r6 (raw file):
Use the standard golang error idiom here: if embargoBuf, publicBuf, err := embargoBuf(content); err != nil { return err } if WriteResults(...) != nil { return err } return nil
If this way, embargoBuf, publicBuf will be local var, and cannot be used later.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 10 unresolved discussions.
embargo.go, line 155 at r6 (raw file):
s/resuses/reuse/
Done.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 10 unresolved discussions.
embargo_check.go, line 98 at r6 (raw file):
Consider having GetLocalIP return an error if the IP address is missing or unparseable.
Currently it return empty string when something is wrong.
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 10 unresolved discussions.
embargo_test.go, line 26 at r6 (raw file):
s/verified/verifies/ Make all test comments active - describing what the test does anytime you run it. Past tense implies that you ran the test once but might never run it again.
Done.
gcs_operations.go, line 49 at r5 (raw file):
Maybe some misunderstanding... embargo.go is fine now, but you should also add the same TODO for every function here that does service := CreateService().
Done.
Comments from Reviewable
embargo.go, line 134 at r6 (raw file):
If this way, embargoBuf, publicBuf will be local var, and cannot be used later.
Ah. In that case:
embargoBuf, publicBuf, err := embargoBuf(content) if err != nil { return err } if err = WriteResults(...); err != nil { return err }
return nil
The main thing is to prefer early return, on error, and reaching the end of the function should indicate success. https://blog.golang.org/error-handling-and-go
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 4 unresolved discussions.
embargo.go, line 155 at r6 (raw file):
Done.
Still see "reuses"
embargo_check.go, line 98 at r6 (raw file):
Currently it return empty string when something is wrong.
Yes. An error is just more explicit and clear. Ok to leave as is though.
gcs_operations.go, line 49 at r5 (raw file):
Done.
Hmm. Not showing up in the PR. Did you commit it?
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions.
embargo.go, line 134 at r6 (raw file):
Ah. In that case: embargoBuf, publicBuf, err := embargoBuf(content) if err != nil { return err } if err = WriteResults(...); err != nil { return err } return nil The main thing is to prefer early return, on error, and reaching the end of the function should indicate success. https://blog.golang.org/error-handling-and-go
Done.
embargo.go, line 155 at r6 (raw file):
Still see "reuses"
Done.
embargo_check.go, line 98 at r6 (raw file):
Yes. An error is just more explicit and clear. Ok to leave as is though.
For old filename format w/o IP address, it is not an error, just we cannot get anything.
gcs_operations.go, line 49 at r5 (raw file):
Hmm. Not showing up in the PR. Did you commit it?
Yes. In gcs_operations.go and embargo_check.go
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 1 unresolved discussion.
embargo.go, line 185 at r7 (raw file):
return err } result := EmbargoOneTar(fileContent.Body, oneItem.Name, embargoService)
s/result/err/
Comments from Reviewable
Review status: 0 of 9 files reviewed at latest revision, 1 unresolved discussion.
embargo.go, line 185 at r7 (raw file):
s/result/err/
Done.
Comments from Reviewable
LGTM once you resolve new comments.
A high level comment for future - think about whether the functions you are writing would fit logically into one or two "classes" - i.e. structs to hold the state with functions operating on the structs. I find it very helpful to start by writing all the function signatures, then group them, and see if there is some implicit or explicit state that would form a coherence class. Then fill in the function bodies after you see how everything fits together.
Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions.
embargo.go, line 71 at r8 (raw file):
// Handle the small files inside one tar file. for { header, err := tarReader.Next()
Probably should be checking header.Typeflag to ensure it is a normal file.
embargo.go, line 80 at r8 (raw file):
} basename := filepath.Base(header.Name) info := header.FileInfo()
Could some of these assignments be done through tar.FileInfoHeader()? Or through struct assignment?
Comments from Reviewable
One more comment - please add a TODO in the travis file to set up coverage, and add badges to the readme file.
travis TODO added, and README updated.
Review status: 0 of 9 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.
embargo.go, line 71 at r8 (raw file):
Probably should be checking header.Typeflag to ensure it is a normal file.
Done.
embargo.go, line 80 at r8 (raw file):
Could some of these assignments be done through tar.FileInfoHeader()? Or through struct assignment?
Yes, we can construct hdr in fly during using like this:
&tar.Header{ Name: header.Name, Mode:int64(info.Mode()), ModTime: info.ModTime(), Size: info.Size(), }
, but hdr will be used twice below, so I would prefer to construct it earlier to avoid duplication.
Comments from Reviewable
embargo.go, line 87 at r8 (raw file):
hdr.ModTime = info.ModTime() output, err := ioutil.ReadAll(tarReader) if strings.Contains(basename, "web100") && embargoCheck.ShouldEmbargo(basename) {
Incorporate the "web100" check within the ShouldEmbargo function.
Comments from Reviewable
embargo.go, line 86 at r8 (raw file):
hdr.Mode = int64(info.Mode()) hdr.ModTime = info.ModTime() output, err := ioutil.ReadAll(tarReader)
This read causes a memory allocation and copy, which is wasteful if you then don't use it. So I would suggest you move this after the ShouldEmbargo check.
Comments from Reviewable
embargo.go, line 86 at r8 (raw file):
This read causes a memory allocation and copy, which is wasteful if you then don't use it. So I would suggest you move this after the ShouldEmbargo check.
D'oh. Never mind.
Comments from Reviewable
Review status: 0 of 10 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.
embargo.go, line 87 at r8 (raw file):
Incorporate the "web100" check within the ShouldEmbargo function.
Done.
Comments from Reviewable
embargo.go, line 80 at r8 (raw file):
Yes, we can construct hdr in fly during using like this: &tar.Header{ Name: header.Name, Mode:int64(info.Mode()), ModTime: info.ModTime(), Size: info.Size(), } , but hdr will be used twice below, so I would prefer to construct it earlier to avoid duplication.
Discussed offline.
Comments from Reviewable
Comments from Reviewable
Embargo cls, the third: do the tar file processing, split into 2 and save at different places.
This change is