kundajelab / basepairmodels

MIT License

[suggestion] Remove --stranded and --has-control flags; infer these from input_data.json #14

Open mmtrebuchet opened 3 years ago

mmtrebuchet commented 3 years ago

This is just a suggestion, but I wanted to log it somewhere we have a formal channel to discuss it. Currently, in the input_data.json file, the user must provide control tracks and must also indicate the strandedness of the data. Then she must also pass --stranded and --has-control to a whole slew of scripts. I propose that we rely on input_data.json alone to infer whether the data are stranded and whether they have controls. This would also give us a clean way to support datasets that mix stranded and unstranded tasks, with and without controls. For example,

`{ "task_nanog_plus": {"strand": 0, "task_id": 0, "signal": [...], "peaks": [...], "control": [...]}, "task_nanog_minus": {"strand": 1, "task_id": 0, "signal": [...], "peaks": [...], "control": [...]}, "task_mnase": {"strand": 0, "task_id": 1, "signal": [...], "peaks": [...]} }`

would be a valid input, with the control omitted for task_mnase. In the case of mixed strandedness, the code would construct a model with the appropriate number of outputs (2*n_stranded_inputs + n_unstranded_inputs), and then either (1) the model would expect only the control tracks listed in the input, or (2) the model would expect a control track for every output, and the code would supply bias tracks full of zeroes wherever the user has not provided one.
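
As a rough sketch of what the inference could look like (this is not code from basepairmodels; the function names `summarize_tasks` and `control_values`, the `loader` callable, and the file names are all hypothetical), grouping entries by task_id and checking for a "control" key seems to be enough to recover both flags and the output count:

```python
import json
from collections import defaultdict

# Hypothetical input mirroring the layout proposed above.
# The file names are placeholders, not real paths.
EXAMPLE = """
{
  "task_nanog_plus":  {"strand": 0, "task_id": 0, "signal": ["nanog_plus.bw"],
                       "peaks": ["nanog.bed"], "control": ["ctl_plus.bw"]},
  "task_nanog_minus": {"strand": 1, "task_id": 0, "signal": ["nanog_minus.bw"],
                       "peaks": ["nanog.bed"], "control": ["ctl_minus.bw"]},
  "task_mnase":       {"strand": 0, "task_id": 1, "signal": ["mnase.bw"],
                       "peaks": ["mnase.bed"]}
}
"""


def summarize_tasks(tasks):
    """Infer strandedness and control status from the task dictionary."""
    strands_by_task = defaultdict(set)
    has_control = {}
    for name, entry in tasks.items():
        strands_by_task[entry["task_id"]].add(entry["strand"])
        # A missing or empty "control" list both count as "no control".
        has_control[name] = bool(entry.get("control"))

    # Assumption: a task is stranded if its task_id appears with
    # more than one distinct strand value.
    stranded = {tid: len(s) > 1 for tid, s in strands_by_task.items()}
    n_stranded = sum(stranded.values())
    n_unstranded = len(stranded) - n_stranded
    return {
        "stranded": stranded,
        "has_control": has_control,
        "n_outputs": 2 * n_stranded + n_unstranded,
    }


def control_values(entry, length, loader):
    """Option (2): return control values, or zeros if no control is listed.

    `loader` is a hypothetical callable (path, length) -> list of floats.
    """
    paths = entry.get("control") or []
    if not paths:
        return [0.0] * length
    return loader(paths[0], length)


tasks = json.loads(EXAMPLE)
summary = summarize_tasks(tasks)
print(summary["n_outputs"])  # 3 for the example above
```

With something like this in place, the scripts would not need --stranded or --has-control at all, and option (1) versus option (2) would come down to whether the zero-filling in `control_values` is used.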

I need this sort of functionality because I'm mixing and matching all sorts of data types, some stranded, some controlled, and some both stranded and controlled.

Your thoughts on this proposal?
