gordonwatts / BDTTrainingAnalysisLanguage

Pull from ATLAS EXOT 15 Derivation, columnar data, and flat ROOT tuples with RDF to scikit-learn in one nice fast swoop

DataSet needs to be normalized #56

Open gordonwatts opened 5 years ago

gordonwatts commented 5 years ago

Think through how we will specify dataset types. Here is a first pass at the different places we might source data from:

There may also be new sources eventually. Perhaps there are two classes of dataset source:

File-based sources require special handling: the interface should be uniform across all file types. For example:

    data = DataSet('name')

Here the name says whether the dataset is on the grid, whether it should be run there, whether it is local, and so on. In the past, I've used a URI to specify this: it is an existing standard, parameters can be added in a well-understood way, and libraries to parse it are already written.
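As a rough sketch of the URI idea, the snippet below uses Python's standard `urllib.parse` to split a dataset name into a source type, a name, and parameters. The `DataSetSpec` class, the `parse_dataset_uri` helper, the `gridds` scheme, and the `nfiles`/`run` parameters are all hypothetical names chosen for illustration; none of them are part of this repository.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlparse


@dataclass
class DataSetSpec:
    """Hypothetical container for a parsed dataset URI (illustration only)."""
    scheme: str                     # where the data lives, e.g. 'gridds' or 'file'
    name: str                       # dataset name or file path
    params: dict = field(default_factory=dict)  # extra options from the query string


def parse_dataset_uri(uri: str) -> DataSetSpec:
    """Split a dataset URI into scheme, name, and parameters."""
    parts = urlparse(uri)
    if not parts.scheme:
        raise ValueError(f"dataset URI needs a scheme: {uri!r}")
    # For 'scheme://name' the name lands in netloc; for 'file:///path' it lands in path.
    name = parts.netloc + parts.path
    # parse_qs returns a list per key; keep the last value given for each.
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return DataSetSpec(parts.scheme, name, params)


# Assumed scheme names: 'gridds' for a grid dataset, 'file' for a local file.
grid = parse_dataset_uri("gridds://mc16.signal.EXOT15?nfiles=10&run=grid")
local = parse_dataset_uri("file:///data/ntuples/signal.root")
```

With something like this, `DataSet('name')` could dispatch on `scheme` to pick the right backend while keeping a single constructor for every source type.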