The current idea is to use only one type of ChIP-seq data to analyze "enhancer activity", but it would be useful if multiple datasets could be used in the same way. The challenge would be ensuring a unique, yet easily identifiable, INFO field for each dataset when printing to output.
Computationally, the most efficient way to do this would be to allow users to provide multiple datasets at the command line for a common argument:
-d dataset1.bed dataset2.bed dataset3.bed, or -d dataset1 -d dataset2, or similar. Not sure which is more straightforward to implement in Python. The backend processing would be the same for each dataset, but the INFO fields would need to be named dynamically, perhaps based on a fixed piece of the filename. For example:
-d K27AC.data.bed FAIRE.data.bed: each filename would be split on '.' and the first element used as a prefix for the INFO field, giving K27ACZ (z-scores for each sample with the variant) and FAIREZ (z-scores for each sample with the variant).
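A rough sketch of how this might look in Python, to help judge the implementation effort. It assumes argparse with Python 3.8+ (for action="extend"), which happens to accept both command-line forms at once. The long option name --dataset, the function names, and the duplicate-prefix check are all hypothetical and not part of any existing code.

```python
import argparse
import os


def build_parser():
    """Build a parser whose -d option accepts one or more dataset files."""
    parser = argparse.ArgumentParser(description="Hypothetical multi-dataset CLI")
    # nargs="+" allows `-d a.bed b.bed`; action="extend" also allows
    # repeating the flag (`-d a.bed -d b.bed`) and merges everything
    # into a single list. Requires Python 3.8 or newer.
    parser.add_argument(
        "-d", "--dataset",  # --dataset is an assumed long name
        nargs="+",
        action="extend",
        metavar="BED",
        help="one or more dataset files, e.g. K27AC.data.bed",
    )
    return parser


def info_field_name(path, suffix="Z"):
    """Derive an INFO field name from a filename.

    The basename is split on '.' and the first element is used as the
    prefix, so 'K27AC.data.bed' becomes 'K27ACZ'.
    """
    prefix = os.path.basename(path).split(".")[0]
    return prefix + suffix


def name_info_fields(paths):
    """Map each derived INFO field name to its dataset path.

    Raises ValueError if two files would produce the same field name,
    since the INFO fields have to be unique in the output.
    """
    fields = {}
    for path in paths:
        name = info_field_name(path)
        if name in fields:
            raise ValueError(
                f"{path} and {fields[name]} both map to INFO field {name}"
            )
        fields[name] = path
    return fields
```

With this sketch, parsing -d K27AC.data.bed FAIRE.data.bed and passing the resulting list to name_info_fields would yield the fields K27ACZ and FAIREZ. Two files sharing a first filename element (say K27AC.rep1.bed and K27AC.rep2.bed) would be rejected rather than silently overwriting each other.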
I'm inclined to shelve this for the time being, as I don't feel it's as necessary/helpful as I originally thought. Potentially worth coming back to in the future.