Closed jose-sherpa closed 12 months ago
If accepted I will add the tests and refactor
All modified lines are covered by tests :white_check_mark:
Comparison is base (
79a540b
) 100.00% compared to head (b26cb6d
) 100.00%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Oh and just curious @jf-tech , have you thought about updating the go version? From my experience it's completely backwards compatible and has had some good improvements on the performance side and new methods
@jose-sherpa
1) if the desired effect is just to change the output from the command line tool, not the actual output format (which is currently standard JSON) from the Transform.Read()
interface, then I would be against putting the setting into the schema. It seems to be more of a command line option that would be the better place for the switch. What do you think?
Yes, maybe in the future, we actually want to introduce different output formats for the transform interface, offering callers the choice of the current JSON, and maybe protobuf, or XML, or something else. At that time, we need to introduce the proper `ParserSettings` option and an extensible and pluggable mechanism for hooking up various writers. But doesn't seem to be the case in your request.
2) As for go version, we pinned our go version to 1.14 against the underlying goja (the javascript engine version) at that time because that was the version we thoroughly tested against. Now most likely upgrading to a higher version is fine, but that might unknowingly break other dependent customers who might still be running on a lower go version. So our philosophy is unless we start to use some go features only available in a higher version, we will stick with the current one, well, until sometime in the future it's too antiquated or Google simply stops supporting it.
Thoughts? And again, thanks for chiming in and contributing!! Love it!
@jose-sherpa
- if the desired effect is just to change the output from the command line tool, not the actual output format (which is currently standard JSON) from the
Transform.Read()
interface, then I would be against putting the setting into the schema. It seems to be more of a command line option that would be the better place for the switch. What do you think? Yes, maybe in the future, we actually want to introduce different output formats for the transform interface, offering callers the choice of the current JSON, and maybe protobuf, or XML, or something else. At that time, we need to introduce the properParserSettings
option and an extensible and pluggable mechanism for hooking up various writers. But doesn't seem to be the case in your request.- As for go version, we pinned our go version to 1.14 against the underlying goja (the javascript engine version) at that time because that was the version we thoroughly tested against. Now most likely upgrading to a higher version is fine, but that might unknowingly break other dependent customers who might still be running on a lower go version. So our philosophy is unless we start to use some go features only available in a higher version, we will stick with the current one, well, until sometime in the future it's too antiquated or Google simply stops supporting it.
Thoughts? And again, thanks for chiming in and contributing!! Love it!
Thanks for the prompt reply! I think it makes sense to have it as a command line option, I'll make that change!
@jf-tech I'll make the changes and add tests, thank you!
@jf-tech I'm looking through the tests and I think this should be in extensions/omniv21/samples
however when looking in there it seems to use SampleTestCommon
which does it's own logic for making the json output. Is there a different test that tests through the command? Or should I make it so the output logic are in different functions that SampleTestCommon
can use?
Maybe something like this will allow multiple places to use the same writing that the transform command uses:
package writer
import (
"github.com/jf-tech/go-corelib/jsons"
"github.com/jf-tech/go-corelib/strs"
"github.com/jf-tech/omniparser"
"io"
"strings"
)
type Writer interface {
Write(printf func(string, ...any) (int, error), println func(...any) (int, error)) error
}
func NewJSON(transform omniparser.Transform) Writer {
return &writer{
LParen: "[\n%s",
Delim: ",\n%s",
RParen: "\n]",
Empty: "[]",
DoOne: func() (string, error) {
b, err := transform.Read()
if err != nil {
return "", err
}
return strings.Join(
strs.NoErrMapSlice(
strings.Split(jsons.BPJ(string(b)), "\n"),
func(s string) string { return "\t" + s }),
"\n"), nil
},
}
}
func NewNDJSON(transform omniparser.Transform) Writer {
return &writer{
LParen: "%s",
Delim: "\n%s",
RParen: "",
Empty: "",
DoOne: func() (string, error) {
b, err := transform.Read()
if err != nil {
return "", err
}
return string(b), nil
},
}
}
type writer struct {
LParen string
Delim string
RParen string
Empty string
DoOne func() (string, error)
}
func (w *writer) Write(printf func(string, ...any) (int, error), println func(...any) (int, error)) error {
record, err := w.DoOne()
if err == io.EOF {
_, _ = println(w.Empty)
return nil
}
if err != nil {
return err
}
_, _ = printf(w.LParen, record)
for {
record, err = w.DoOne()
if err == io.EOF {
break
}
if err != nil {
return err
}
_, _ = printf(w.Delim, record)
}
_, _ = println(w.RParen)
return nil
}
which reduces doTransform
to:
func doTransform() error {
schemaName := filepath.Base(schema)
schemaReadCloser, err := openFile("schema", schema)
if err != nil {
return err
}
defer schemaReadCloser.Close()
inputReadCloser := io.ReadCloser(nil)
inputName := ""
if strs.IsStrNonBlank(input) {
inputName = filepath.Base(input)
inputReadCloser, err = openFile("input", input)
if err != nil {
return err
}
defer inputReadCloser.Close()
} else {
inputName = "(stdin)"
inputReadCloser = os.Stdin
// Note we don't defer Close() on this since os/golang runtime owns it.
}
schema, err := omniparser.NewSchema(schemaName, schemaReadCloser)
if err != nil {
return err
}
transform, err := schema.NewTransform(inputName, inputReadCloser, &transformctx.Ctx{})
if err != nil {
return err
}
var w writer.Writer
if ndjson {
w = writer.NewNDJSON(transform)
} else {
w = writer.NewJSON(transform)
}
return w.Write(fmt.Printf, fmt.Println)
}
@jose-sherpa sorry for the delay. We were out/offline for several days.
First I'm not seeing the updated PR. Did you push your updates (based on my feedbacks) to origin?
Second, I just checked, the dir cli
actually has no tests, because originally the purpose of it to create a quick cli tool for interactive schema authoring and a HTTP endpoint for playground (which retired last year).
So I would say don't worry about the tests for changes in this directory. Just push your latest updates to the PR and we'll take a look again and approve.
@jf-tech hey sorry I was out of town, yeah I made the changes you suggested unless you see I missed something. The only commit I did not push was the commit containing the changes I outlined in my previous comment. If you like those changes let me know and I can push that commit
While the omniparser tool outputs JSON format currently, you will often need another tool or package to stream the JSON output. While I am aware this tool will only be used for JSON output, there is a type of JSON called NDJSON which stands for new line delimited JSON. This makes it easy to stream parse and process a JSON array with no added packages or complexity since you just read each line and parse them one by one. Since a strength of omniparser is to stream parse large files, we think it makes sense to make the output easily streamable without violating the output of JSON. It also results in a smaller file size.
http://ndjson.org/