Open mlr0p opened 4 years ago
Expanding on this: the reason this is necessary is that a lot of code is duplicated across the importers, which makes it quite daunting to write a new importer or fix issues. Having some sort of generic importer/importing method should both reduce the duplication and lower the barrier to entry for newcomers.
Go has interfaces, plus struct embedding for composition rather than classical inheritance (https://golang.org/doc/effective_go.html#embedding), so we could use those to structure an importer. On a technical level, here's what I propose:
importers/util/common.go
This file should be something of the form:
package util

// LineParser is implemented once per dump format.
type LineParser interface {
	// ParseLine turns one line of the dump into zero or more records.
	ParseLine(line string) ([]interface{}, error)
	// EstimateCount estimates how many records a line holds (for the progress bar).
	EstimateCount(line string) (int64, error)
}

type Importer struct {
	parser     LineParser
	bar        *pb.ProgressBar
	numThreads int
	threader   chan string
	doner      chan bool
	mongo      *mgo.Session
	verbose    bool
	fileName   string
	// ... other variables that it needs
}

func MakeImporter(parser LineParser, verbose bool /*, other variables... */) *Importer {
	// basically just do what the main funcs of the current importers do, but in here
	// should just initialise everything, but not run the main loop (yet)
	// should set up the progress bar too (if verbose enabled)
	// creating the progress bar should call parser.EstimateCount(line) to get an estimate of how many creds are on that line
	return &Importer{parser: parser, verbose: verbose /* , other fields ... */}
}

func (i *Importer) Run() {
	// this part should have the threader <- r.ReadLine() loop and the <- doner loop
}

func (i *Importer) importLine() {
	// basically just copy-paste the current importLine() functionality, but call i.parser.ParseLine(line) instead for the parsing (and handle any errors)
}
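To make the shape of `Run()` and `importLine()` more concrete, here is a minimal, stdlib-only sketch of the fan-out loop. It is not the existing importer code: the `Sink` callback is a hypothetical stand-in for the MongoDB insert, the progress bar is left out, a `sync.WaitGroup` replaces the `doner` channel, and `colonParser` and `importString` are made-up names used only to exercise it.

```go
package main

import (
	"bufio"
	"errors"
	"io"
	"strings"
	"sync"
)

// LineParser mirrors the interface proposed above.
type LineParser interface {
	ParseLine(line string) ([]interface{}, error)
	EstimateCount(line string) (int64, error)
}

// Sink is a hypothetical stand-in for the MongoDB insert step.
// Workers call it concurrently, so it must be safe for concurrent use.
type Sink func(records []interface{}) error

// Importer owns the format-independent plumbing.
type Importer struct {
	parser     LineParser
	numThreads int
	sink       Sink
}

// MakeImporter initialises everything but does not start the main loop.
func MakeImporter(parser LineParser, numThreads int, sink Sink) *Importer {
	if numThreads < 1 {
		numThreads = 1
	}
	return &Importer{parser: parser, numThreads: numThreads, sink: sink}
}

// Run feeds every line of r to the workers and returns how many lines
// could not be parsed or stored.
func (i *Importer) Run(r io.Reader) (int64, error) {
	lines := make(chan string)
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int64
	)
	for n := 0; n < i.numThreads; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range lines {
				if !i.importLine(line) {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		lines <- scanner.Text()
	}
	close(lines) // lets the workers drain the channel and exit
	wg.Wait()
	return failed, scanner.Err()
}

// importLine parses one line and hands the records to the sink.
func (i *Importer) importLine(line string) bool {
	records, err := i.parser.ParseLine(line)
	if err != nil {
		return false
	}
	return i.sink(records) == nil
}

// colonParser is a hypothetical parser for "email:password" lines.
type colonParser struct{}

func (colonParser) ParseLine(line string) ([]interface{}, error) {
	parts := strings.SplitN(line, ":", 2)
	if len(parts) != 2 || parts[0] == "" {
		return nil, errors.New("expected email:password")
	}
	return []interface{}{parts}, nil
}

func (colonParser) EstimateCount(line string) (int64, error) { return 1, nil }

// importString runs the importer over an in-memory dump and reports how
// many records were stored and how many lines were rejected.
func importString(dump string, workers int) (stored int, failed int64, err error) {
	var mu sync.Mutex
	sink := func(records []interface{}) error {
		mu.Lock()
		defer mu.Unlock()
		stored += len(records)
		return nil
	}
	failed, err = MakeImporter(colonParser{}, workers, sink).Run(strings.NewReader(dump))
	return stored, failed, err
}
```

Calling `importString(dump, 4)` from a `main` function is enough to try it out. The point of the design is that everything except `colonParser` would live in `util` and never need to be touched when a new dump format comes along.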
importers/importer-(sql-)template.go
These two files should show how easy the new system will be. They should look roughly like this:
package main

import (
	"github.com/zxsecurity/steamer/importers/util"
	"gopkg.in/mgo.v2/bson" // assumed to be the bson package the existing importers use
)

type GenericData struct {
	Id           bson.ObjectId `json:"id" bson:"_id,omitempty"`
	MemberID     int           `bson:"memberid"`
	Email        string        `bson:"email"`
	Liame        string        `bson:"liame"`
	PasswordHash string        `bson:"passwordhash"`
	Password     string        `bson:"password"`
	Breach       string        `bson:"breach"`
}

type TemplateLineParser struct{}

func (t TemplateLineParser) ParseLine(line string) ([]interface{}, error) {
	// []interface{} rather than []GenericData: Go does not convert between the two implicitly
	data := make([]interface{}, 0)
	// code to parse a line into its data blobs, appending one GenericData per entry
	return data, nil
}

func (t TemplateLineParser) EstimateCount(line string) (int64, error) {
	// code to estimate how many pieces of data are in a line (for the progress bar)
	return 1, nil // placeholder: assume one record per line
}

func main() {
	parser := TemplateLineParser{}
	// other setup code ...
	importer := util.MakeImporter(parser /* , other args ... */)
	importer.Run()
}
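As an illustration of what filling in the template might look like, here is a sketch of a parser for a made-up format in which one line carries several `memberid,email,passwordhash` records separated by `;`. The format, the `semicolonParser` name and the `Record` type (a trimmed-down `GenericData` without the bson tags) are all assumptions for the example, as is the idea that `Liame` holds the reversed email, which the field name suggests.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Record is a hypothetical, tag-free stand-in for GenericData.
type Record struct {
	MemberID     int
	Email        string
	Liame        string // assumed: the email reversed
	PasswordHash string
	Breach       string
}

// semicolonParser handles the made-up "id,email,hash;id,email,hash" format.
type semicolonParser struct {
	breach string
}

// reverse returns s with its runes in reverse order.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// ParseLine returns one Record per ';'-separated chunk on the line.
func (p semicolonParser) ParseLine(line string) ([]interface{}, error) {
	line = strings.TrimSpace(line)
	if line == "" {
		return nil, nil
	}
	chunks := strings.Split(line, ";")
	out := make([]interface{}, 0, len(chunks))
	for _, chunk := range chunks {
		fields := strings.Split(chunk, ",")
		if len(fields) != 3 {
			return nil, fmt.Errorf("expected 3 fields, got %d in %q", len(fields), chunk)
		}
		id, err := strconv.Atoi(fields[0])
		if err != nil {
			return nil, fmt.Errorf("bad member id %q: %v", fields[0], err)
		}
		out = append(out, Record{
			MemberID:     id,
			Email:        fields[1],
			Liame:        reverse(fields[1]),
			PasswordHash: fields[2],
			Breach:       p.breach,
		})
	}
	return out, nil
}

// EstimateCount counts separators instead of parsing, so it stays cheap
// enough to run over the whole file before the import starts.
func (p semicolonParser) EstimateCount(line string) (int64, error) {
	if strings.TrimSpace(line) == "" {
		return 0, nil
	}
	return int64(strings.Count(line, ";") + 1), nil
}
```

Keeping `EstimateCount` separate from `ParseLine` matters for formats like SQL dumps, where a single `INSERT` line can hold many rows and a full parse just to size the progress bar would be wasteful.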
The plan to close this issue:
One pull request should be made for 1. and 2. and a separate one for 3.
It would be beneficial to have a generic importer (or at least something close to one) that can parse and import arbitrary dumps in any format.